The Wayback Machine - https://web.archive.org/web/20220125172147/https://github.com/gatsbyjs/gatsby/discussions/34205

Integrated handling of trailing slashes in Gatsby #34205

pragmaticpat started this conversation in RFC
Dec 6, 2021 · 7 comments · 9 replies

Testing this feature

2022-JAN-07: The current state of #34268 is now published as a canary under the alpha-trailing-slash tag, so you can try it by installing gatsby@alpha-trailing-slash.

The goal

Introduce trailing / non-trailing slash management as a baked-in feature of Gatsby ( no plugins required ).

What's the pain our users face?

  • SEO: If the same content for a site is available at https://my.domain.io/about, https://my.domain.io/about/, and https://my.domain.io/about/index.html (which a Gatsby site can produce today), Gatsby users suffer reduced SEO, because the search engine crawler will pick only one of those URLs as the canonical URL. This can weaken the search ranking of the related content, since the ranking is limited to the traffic that goes to one of the possible URLs.
  • Reporting: If the same content is available at multiple URLs, content editors and marketing teams are forced to perform elaborate URL aggregation in their reporting tools to get a clear understanding of the traffic for each piece of content on their sites.
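Here is a minimal JavaScript sketch of the aggregation a reporting tool ends up needing when one page is reachable at several URLs. It assumes that only the three variants listed above matter (bare path, trailing slash, `index.html`). `reportingKey` and `aggregateViews` are hypothetical names and are not part of any analytics product.

```javascript
// Hypothetical helper: collapse the URL variants of one page into a single
// reporting key, e.g. "/about", "/about/" and "/about/index.html" -> "/about".
function reportingKey(url) {
  // Parse against a dummy base so relative paths work too; ignore query/hash.
  const { pathname } = new URL(url, "https://example.org");
  let path = pathname.replace(/\/index\.html$/, "/"); // drop a trailing index.html
  if (path.length > 1) path = path.replace(/\/+$/, ""); // drop trailing slashes, keep root "/"
  return path;
}

// Sum page views per key, which is what a reporting tool would have to do.
function aggregateViews(rows) {
  const totals = new Map();
  for (const { url, views } of rows) {
    const key = reportingKey(url);
    totals.set(key, (totals.get(key) || 0) + views);
  }
  return totals;
}

const totals = aggregateViews([
  { url: "https://my.domain.io/about", views: 10 },
  { url: "https://my.domain.io/about/", views: 5 },
  { url: "https://my.domain.io/about/index.html", views: 2 },
]);
console.log(totals.get("/about")); // prints 17
```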

How do users work around this today?

  1. Redirects: Gatsby users can get around these problems via redirects, but redirects should be reserved for true path-level redirects ( e.g. redirect "/about/" to "/company/" ), not merely variations of the same path ( e.g. redirect "/about" to "/about/" ).
  2. Plugins: Users also attempt to use gatsby-plugin-force-trailing-slashes and gatsby-plugin-remove-trailing-slashes to get the desired behavior, but are often left confused about how to achieve it. See here for the depth of discussion that has occurred on the topic to date.

Gatsby users are looking for consistent treatment of URLs across their sites. As of November 2021, Gatsby only provides a way to force URLs into a particular format via plugins (which means users must jump through a few hurdles to get consistent URLs in their sites).


Okay, so what do we do about this?

The following flavors of treatment have been written about extensively in our open source project, by community members and Gatsby employees alike:

  • Force all URLs to include a trailing slash ( instead of "/about", "/about/" )
  • Force all URLs to exclude a trailing slash ( instead of "/about/", "/about" )

Redirects, and expected behavior from the hosting provider ( e.g. Gatsby Cloud )

added to RFC on 2022-Jan-07

To ensure consistency within a given Gatsby site, as well as across the various modes of running your Gatsby site (e.g. develop, serve, and production), hosting providers shall apply standard redirect behavior for sites that use the new trailingSlash option.

The following table not only specifies the redirect scenarios that will occur within gatsby develop, gatsby serve, and on Gatsby Cloud, but also specifies how other hosting providers should apply these redirects, given the settings specified for a given Gatsby site via gatsby-config.js.

Consider a page that exists at https://example.org/about/index.html

| Scenario | trailingSlash value | Requested URL | Redirect (hosting layer) | Rendered URL |
| --- | --- | --- | --- | --- |
| 1 | ignore / not specified | /about | none | /about (see note below) |
| 2 | ignore / not specified | /about/ | none | /about/ |
| 3 | ignore / not specified | /about/index.html | none | /about/ (see note below) |
| 4 | always | /about | 301 | /about/ |
| 5 | always | /about/ | none | /about/ |
| 6 | always | /about/index.html | none | /about/ |
| 7 | never | /about | none | /about |
| 8 | never | /about/ | 301 | /about |
| 9 | never | /about/index.html | none | /about |

Gatsby Cloud searches for an appropriate path, and will render the corresponding page if found. It’s highly unlikely that the desired experience is to 404 a visitor when they attempt to access a resource with a different semantic than is configured ( e.g. attempting to access /about instead of /about/ should not 404 the visitor! ). Therefore, Gatsby Cloud makes additional attempts to get the visitor to the desired resource.

Why use a 301 redirect? Since the primary use case is that the website owners want all pages to follow a consistent semantic, this is not truly a temporary redirect. Therefore, a 301 seems most appropriate as the applied redirect when the website owners specify trailingSlash as either always or never.

While the techniques of implementing the Redirect and Rendered url may vary among providers, the outcome for the site owners and site visitors shall be consistent with the specification above.
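As a cross-check, the table can be read as a small decision function. This is only a sketch. `resolveRequest` and its return shape are hypothetical, and it covers only a page stored as `/about/index.html`, as in the table. Real hosting providers implement this in their own routing layers.

```javascript
// Sketch of the RFC table for a page stored at /about/index.html.
// `resolveRequest` and its return shape are hypothetical.
function resolveRequest(trailingSlash, requestedPath) {
  const option = trailingSlash ?? "ignore"; // "not specified" behaves like ignore
  const hasSlash = requestedPath.endsWith("/");

  // Scenarios 3, 6, 9: a direct index.html request is never redirected.
  if (requestedPath.endsWith("/index.html")) {
    const dir = requestedPath.slice(0, -"index.html".length); // "/about/"
    const rendered = option === "never" ? dir.slice(0, -1) || "/" : dir;
    return { redirect: null, renderedUrl: rendered };
  }
  // Scenario 4: permanent redirect to the trailing-slash variant.
  if (option === "always" && !hasSlash) {
    return { redirect: 301, renderedUrl: `${requestedPath}/` };
  }
  // Scenario 8: permanent redirect to the variant without a slash.
  if (option === "never" && hasSlash && requestedPath !== "/") {
    return { redirect: 301, renderedUrl: requestedPath.slice(0, -1) };
  }
  // Scenarios 1, 2, 5, 7: serve the URL as requested.
  return { redirect: null, renderedUrl: requestedPath };
}
```

The `index.html` branch comes first because the table treats direct file requests as "serve, never redirect" regardless of the setting.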

What are we not including?

  • Unfurl pages ( instead of "/about/" or "/about", "/about/index.html")

Approach

This feature will be purely additive, and the default behavior will be as Gatsby behaves today ( no opinion on whether you use trailing / non-trailing slashes ).

  1. Introduce an option to control trailing slashes for all routes (including client routes), likely through a config setting in gatsby-config.js. API example:

    // gatsby-config.js
    module.exports = {
    	trailingSlash: "always"
    }

    The trailingSlash option would take three values: never, always, and ignore. never removes all trailing slashes, always adds one, and ignore doesn't automatically change anything and it's in user hands to keep things consistent. (Credit: SvelteKit has the same API)

  2. Once there is sufficient adoption and critical bugs are addressed, we remove the flag and make this part of standard Gatsby behavior

  3. Users can remove the corresponding force / remove trailing slash plugins from their Gatsby implementations
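Here is a minimal sketch of what the three values would mean for a single page path. `applyTrailingSlash` is a hypothetical helper, not Gatsby's implementation. It assumes plain pathnames (no query string or hash), defaults to `ignore` to mirror today's behavior, and passes the root path `/` through unchanged, as described later in this thread.

```javascript
// Hypothetical helper: apply one of the proposed trailingSlash values to a path.
const TRAILING_SLASH_OPTIONS = ["always", "never", "ignore"];

function applyTrailingSlash(path, option = "ignore") {
  if (!TRAILING_SLASH_OPTIONS.includes(option)) {
    throw new Error(`Invalid trailingSlash value: ${option}`);
  }
  if (path === "/") return path; // the root path always keeps its slash
  switch (option) {
    case "always":
      return path.endsWith("/") ? path : `${path}/`;
    case "never":
      return path.replace(/\/+$/, "");
    default:
      // "ignore": leave the path exactly as the user wrote it.
      return path;
  }
}

console.log(applyTrailingSlash("/about", "always")); // prints /about/
console.log(applyTrailingSlash("/about/", "never")); // prints /about
```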

Replies


Ah great! Love to see this. Thanks for writing this up @pragmaticpat! I enjoyed our discussion on it some months ago and think this is a great framework to go off of. It's a hard but important issue to tackle given how much it touches!

and ignore doesn't automatically change anything and it's in user hands to keep things consistent

I love this third option in the API, and I think it's exactly what's needed for sites that come from WordPress migrations, sites that have a mix of trailing and non-trailing page variants, and other out-of-the-box builds. Great call out.

Would love to get my hands dirty on this one once it completes the RFC phase. Thanks again!


pragmaticpat
Dec 14, 2021
Maintainer Author

As mentioned in the initial RFC post above, we drew some inspiration from SvelteKit's API for how we'd present the option to users. A nuance here is that we don't explicitly call out the presence of automatic redirects when using the trailingSlash setting.

Saying it another way, given the RFC in its current state, it might be possible for a visitor to 404 when requesting /about/ when trailingSlash: never unless the user has also added the corresponding redirects. I question whether this is a desirable outcome.

For trailingSlash handling to be more "feature complete," perhaps we need to include some form of automated redirects so visitors don't 404 when they directly access a route as I described above ( understanding that some of this really would be best served in the chosen hosting layer ).

Prompt to the followers of this RFC

  1. If we automatically handled/generated redirects in some way, do you think that's the right amount / too much "magic"?
  2. Does it feel like some level of redirect automation is essential for this to be more "feature complete" from the user's perspective?

Thanks all!


You can follow the progress here.


Hey team 👋

I read through everything a few times and wanted to add my 2c.

but redirects should be reserved for true path-level redirects ( e.g. redirect "/about/" to "/company/" ), not merely variations of the same path ( e.g. redirect "/about" to "/about/" ).

Totally agree with this and love the sentiment.

Saying it another way, given the RFC in its current state, it might be possible for a visitor to 404 when requesting /about/ when trailingSlash: never unless the user has also added the corresponding redirects. I question whether this is a desirable outcome.

For trailingSlash handling to be more "feature complete," perhaps we need to include some form of automated redirects so visitors don't 404 when they directly access a route as I described above ( understanding that some of this really would be best served in the chosen hosting layer ).

I share your suspicion around how the outcome should feel for the developer: using 'Always' or 'Never' should be a pleasant experience. Classically, the "ah, it just works" moment would be the best, right? That said, I fervently agree that this should be handled in the hosting / server layer. I think setting up some kind of application-layer redirects to compensate for hosting-layer misconfigurations is a rabbit hole full of confusion and bad times 😅. Instead of that sort of magic, I'd suggest we increase the amount of documentation around "how your selected web-server should behave". And that's not necessarily a super complex thing:

Your web-server should serve a file at path ~/public/foo-bar/index.html at example.com/foo-bar/ and should (server-level) redirect (301/302) a request for example.com/foo-bar to example.com/foo-bar/. That's easily checked with a cURL or HTTPie request in your terminal.

If your web-server is serving the page content at both example.com/foo-bar and example.com/foo-bar/, you need to adjust your hosting preferences such that named directories that contain an index.html file are served as a trailing-slash-path for that directory and requests to that path without the trailing slash are redirected to the trailing slash variant.

For what it's worth, to my knowledge this is actually the default setup for every web-server I've seen (directories vs. documents is the history of the web) but for whatever reason these defaults get changed or adjusted by vendors or individuals etc. etc.
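Here is a minimal sketch of that directory-vs-document behavior as a pure function. It assumes the contents of `~/public` are given as a set of file paths. `serveStatic` is a hypothetical name. A real server (nginx, Apache, a CDN) implements this in its own configuration.

```javascript
// Sketch of the "directories vs. documents" default described above.
// `files` stands in for the published contents of ~/public.
function serveStatic(files, pathname) {
  if (pathname.endsWith("/")) {
    // Directory URL: serve its index.html if present.
    const index = `${pathname}index.html`;
    return files.has(index) ? { status: 200, file: index } : { status: 404 };
  }
  // Document URL: serve the file itself if it exists (e.g. /404.html).
  if (files.has(pathname)) return { status: 200, file: pathname };
  // A directory requested without its slash: redirect instead of serving.
  if (files.has(`${pathname}/index.html`)) {
    return { status: 301, location: `${pathname}/` };
  }
  return { status: 404 };
}
```

Against a real deployment, `curl -I https://example.com/foo-bar` should show a 301 status with a `Location` header pointing at the trailing-slash URL. A 200 response means the content is being served on both variants.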


That all said, I think that particular thing (server trailing-slash / directory-document configuration), while outside of what Gatsby can actually control or deal with, is really important to making this whole project work.

As an example, if a Gatsby user sets up Gatsby (one way or another) so that Gatsby produces 'named index directories' (e.g. the resulting file is ~/public/my-path-name/index.html rather than ~/public/my-path-name.html), ensures that all calls to createPage() include a trailing slash in the path string to properly prep the Reach Router for trailing-slash-hydrations, and always passes a trailing-slash path to any usages of <Link> or other PWA-navigations (this 3-step setup is probably what this RFC's implementation would be for the always directive) the user could still run into problems if the web-server serves that file (~/public/my-path-name/index.html) at both example.com/my-path-name/ and example.com/my-path-name.

In the case where an end-user hits example.com/my-path-name/, great! All is well. In the case where the end-user hits example.com/my-path-name the web-server would serve the file and we'd see the weird flashing action so many have referenced. The file is statically rendered so should visually show the page to the user but once the Reach Router hydrates it realizes that it's not on the path it was configured for (that path had a trailing slash) so it flashes the page and moves the user over to the trailing slash variant. The result is both annoying (page flashing when cold-loading any page on the site [though subsequent PWA navigations should be okay]) and still wrong: the web-server is still serving (Gatsby-created) content on both the trailing and non-trailing URL path variants.

Obviously that's not Gatsby's fault: the 1, 2, 3 step controls are all Gatsby can do... but that doesn't fix the issue or help the dev 😕.


Outside of adding docs on how the web-server should be set up in conjunction with the always/never directives in this new RFC, the only other thing I could think of is adding HTML canonical tags to each page to represent the correct URL (pursuant to this RFC's flag), to compensate for possibly bad web-server setups. In those cases (bad web-server setup) the flashing / URL-jumping could still happen, but at least the SEO ought to be fully okay. Not a perfect sell, not a fix, but maybe a good second line of defense.
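Here is a sketch of how such a canonical URL could be computed from the flag. `canonicalHref` and its `siteUrl` argument are hypothetical. In a real site, the result could be handed to Gatsby's `onRenderBody` API in `gatsby-ssr.js` via `setHeadComponents`, as a `<link rel="canonical">` element.

```javascript
// Hypothetical helper: canonical URL for a page under the trailingSlash flag.
// `siteUrl` is an assumed setting such as "https://example.com".
function canonicalHref(siteUrl, pathname, trailingSlash) {
  const origin = siteUrl.replace(/\/+$/, ""); // avoid "//" when joining
  let path = pathname;
  if (path !== "/") {
    if (trailingSlash === "always" && !path.endsWith("/")) path = `${path}/`;
    if (trailingSlash === "never") path = path.replace(/\/+$/, "");
  }
  return `${origin}${path}`;
}

// Even if a misconfigured server also serves /my-path-name, the page
// still declares the trailing-slash variant as canonical:
console.log(canonicalHref("https://example.com", "/my-path-name", "always"));
// prints https://example.com/my-path-name/
```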


Finally, I think I know the answer, but I just want to call out some behavior that may result from this RFC. If the developer has the always directive set but in their gatsby-node.js (or elsewhere) calls createPage() with a path of /static/example.html (explicitly passing in the .html filename), will that sort of bonk out and actually end up creating a file like ~/public/static/example.html/index.html? Meaning that the developer's intention of having a non-trailing-slash page, even though the RFC directive was set to always, is not possible, correct? I don't think it should be possible (things would start getting pretty complex), but I just wanted to call it out. When it's always or never, attempting to override that on specific pages is not supported and would produce weird results.

E.g. the 404 page is basically the only page that lives outside of the trailing slash requirements (which is true today too) since it needs to be a document at ~/public/404.html

@pragmaticpat

@jon-sully - as you'll see in the RFC, we've updated the spec to include expected behavior from hosting providers. This means that we'll be ensuring consistency from the setting ( trailingSlash ) specified by the developer all the way up through to the hosting layer.

Also, you'll see a couple of other details in the updated RFC:

  1. We don't attempt to perform redirects when index.html is directly requested. The page will simply be served.
  2. We are proposing the redirect defaults to a 301, since this use case is not for temporary redirection.

We'd love to hear your thoughts (and those of other followers of this RFC, of course!) on this as our default behavior.

@jon-sully

Very cool! All of the changes as of today (Jan-7-22) look awesome. Looks like a great opportunity for Gatsby Cloud to sneak in a couple of great 'it just works' features with the bits there. Nice!

Really looking forward to playing with this as it progresses through its development phases. Thank you for all the hard work and reading my excessively long posts @pragmaticpat (and @LekoArts for some great dev!)! You guys rock.

I'd like to share that we have our first canary available for testing! 🚀 Thank you @LekoArts for leading the charge on this one.


pragmaticpat
Jan 10, 2022
Maintainer Author

Putting the canary through its paces. A couple of items to note (some of which we've already discussed internally, but in the spirit of transparency!):

  1. gatsby build && gatsby serve applies the desired trailing / non-trailing settings
  2. gatsby develop doesn't yet operate according to the trailingSlash configuration ( known, in process )
  3. 🤔 The root path still seems to get a trailing slash, both in the sitemap (using gatsby-plugin-sitemap) and in the browser, even when specifying trailingSlash: never

I've asked @LekoArts about the third item above.

Any other findings from folks that have tried this feature yet?

@LekoArts

That the root path has a trailing slash is perfectly fine :) If you go to e.g. https://github.com and log document.location.href in the console, you'll get a URL with a trailing slash.

Google says in their post https://developers.google.com/search/blog/2010/04/to-slash-or-not-to-slash

Rest assured that for your root URL specifically, http://example.com is equivalent to http://example.com/ and can't be redirected even if you're Chuck Norris.
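The WHATWG `URL` parser shared by browsers and Node.js shows the same behavior. An origin with no path is serialized with `/` as its pathname, so the root URL always carries a slash, while other paths are kept as written.

```javascript
// The WHATWG URL parser gives a host-only URL the pathname "/".
const root = new URL("https://example.com");
console.log(root.pathname); // prints /
console.log(root.href); // prints https://example.com/

// Other paths are kept exactly as written, with or without a slash.
console.log(new URL("https://example.com/about").href); // prints https://example.com/about
console.log(new URL("https://example.com/about/").href); // prints https://example.com/about/
```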

@LekoArts

We check if a path is / and then just pass it through; at those stages we don't know the final URL yet and thus can't change those root URLs. But having this trailing slash is luckily not an issue 👍

Hey, I'm probably late to the party, but you'll probably find this resource helpful :)

https://github.com/slorber/trailing-slash-guide

For what it's worth, we also have 3 values in Docusaurus, and so far they seem to suit everyone's needs for all hosting providers: https://docusaurus.io/docs/deployment#trailing-slashes

Note: for trailingSlash: 'never', you'd want to output about.html rather than /about/index.html.
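Here is a sketch of the file layout being suggested. `outputFileFor` is a hypothetical helper, not something Gatsby or Docusaurus exports. It only encodes the mapping named in this comment: `never` produces `about.html`, and any other value produces `about/index.html`.

```javascript
// Hypothetical mapping from a page path to the HTML file to emit.
function outputFileFor(pagePath, trailingSlash) {
  const bare = pagePath.replace(/^\/+|\/+$/g, ""); // "/about/" -> "about"
  if (bare === "") return "index.html"; // the root page
  return trailingSlash === "never" ? `${bare}.html` : `${bare}/index.html`;
}

console.log(outputFileFor("/about/", "never")); // prints about.html
console.log(outputFileFor("/about", "always")); // prints about/index.html
```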

@pragmaticpat

@slorber - to be specific - you're referencing the following scenario in the RFC, is that correct?

| Scenario | trailingSlash value | Requested URL | Redirect (hosting layer) | Rendered URL |
| --- | --- | --- | --- | --- |
| 9 | never | /about/index.html | none | /about |

What information leads you to suggest the behavior you recommended in your comment above?

Are you referring to the content here?

@pragmaticpat

For each page in a Gatsby site, Gatsby generates a directory and an index.html within it, so the recommendation in the RFC will remain for scenario 9 at this point.

@slorber - If I missed your point (possible!) - please further explain. Thanks!

@slorber

@pragmaticpat please look at the matrix on this page: https://github.com/slorber/trailing-slash-guide


If you still output /xyz/index.html for trailingSlash: 'never', a host like GitHub Pages will redirect from /xyz to /xyz/, and there's no config option on GitHub Pages to prevent that.

See it for yourself live: https://slorber.github.io/trailing-slash-guide/folder

For sure you can still fix the URL client-side, but you will get URL flickering from /xyz to /xyz/ and back to /xyz, and I don't think the redirect would be good for SEO.

In this case, I believe it is better to output /xyz.html. This is what we do with Docusaurus, and I believe it is the only way to reliably support GitHub Pages without trailing slashes.
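To show why the file layout matters, here is a toy model of GitHub Pages as described in this comment and the linked guide. It is a simplification, not GitHub's documented algorithm. `githubPages` is hypothetical, and it only models the two layouts discussed here: a folder with `index.html`, or a single `.html` file.

```javascript
// Toy model of the GitHub Pages behavior described in this thread.
// `files` is a hypothetical set of published file paths.
function githubPages(files, pathname) {
  if (pathname.endsWith("/")) {
    // /xyz/ is served from /xyz/index.html when that file exists.
    const index = `${pathname}index.html`;
    return files.has(index) ? { status: 200, file: index } : { status: 404 };
  }
  if (files.has(`${pathname}.html`)) {
    // /xyz is served from /xyz.html directly: no redirect, no flicker.
    return { status: 200, file: `${pathname}.html` };
  }
  if (files.has(`${pathname}/index.html`)) {
    // Only a folder exists: the host redirects /xyz to /xyz/ (no opt-out).
    return { status: 301, location: `${pathname}/` };
  }
  return { status: 404 };
}

const folderLayout = new Set(["/xyz/index.html"]); // /xyz/index.html output
const fileLayout = new Set(["/xyz.html"]); // Docusaurus-style output for 'never'
console.log(githubPages(folderLayout, "/xyz")); // { status: 301, location: '/xyz/' }
console.log(githubPages(fileLayout, "/xyz")); // { status: 200, file: '/xyz.html' }
```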


Keep in mind that each host serves HTML files differently (cf. my repo above).

I believe the goal of Gatsby is still to avoid vendor lock-in to Gatsby Cloud, and it should allow deploying a static website to GitHub Pages without trailing slashes in a reliable way.

You can't expect users to update the gatsby setting and things work everywhere by default unfortunately: the user must select a Gatsby setting that works fine for the selected deployment target and its current configuration.

We do so in the Docusaurus docs:



https://docusaurus.io/docs/deployment

@pragmaticpat

Great context again @slorber - we'll think through how best to proceed, including the info you shared above. Much appreciated!

@jon-sully

Thanks for posting this @slorber! I think this helps illustrate the tension I was trying to tease out but couldn't quite draw a picture of — the inherent dependency of Gatsby on the hosting provider's (/ web server's) serving behavior and the necessary thought that needs to go into how/where Gatsby generates the specific named files to help accommodate various providers. Great chart!

Category
RFC
Labels
None yet
5 participants