by untitaker_ 1330 days ago
What is really annoying is that Cloudflare Pages strips the file extension from your HTML pages and permanently redirects to the new URLs. If you then move to a hosting service that doesn't do that (e.g. cgi-bin hosting), all Google search results for your site will 404.

GitHub Pages, and likely others, also support this format, so moving between those two services doesn't exhibit this problem. Moving to cgi-bin will.

I'd suggest Cloudflare shouldn't try to establish their scheme as the canonical URL and should rather implement GitHub Pages' behavior, but what do I know... I'm just hosting an old-fashioned blog, not a JAMstack/SPA/whatever thing.
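
For anyone hitting this during a migration, here is a minimal sketch of how a new host could keep the extensionless URLs working by mapping them back to .html files on disk. The function name `resolve_extensionless` and the lookup order are my own assumptions for illustration, not how any particular host does it.

```python
from pathlib import Path
from typing import Optional


def resolve_extensionless(site_root: Path, url_path: str) -> Optional[Path]:
    """Map a URL path such as /about to a file under site_root, or None."""
    root = site_root.resolve()
    relative = url_path.strip("/")

    if not relative:
        candidates = [root / "index.html"]
    else:
        # Assumed lookup order: exact file, then file + ".html",
        # then a directory containing index.html.
        candidates = [
            root / relative,
            root / (relative + ".html"),
            root / relative / "index.html",
        ]

    for candidate in candidates:
        resolved = candidate.resolve()
        # Skip anything that escapes the site root (e.g. "../" tricks).
        if root not in resolved.parents:
            continue
        if resolved.is_file():
            return resolved
    return None
```

With something like this in front of the static files, a search result pointing at /about would still find about.html instead of returning a 404.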

5 comments

Having .html file extensions is very old school, and its removal is one of the popular/default redirects on all http servers that support it. Site generators also hide it through various methods (e.g. having a folder with the path name with just an index.html where a dynamic redirect isn't possible).
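
The folder-with-index.html trick can be sketched roughly like this. `pretty_output_path` is a hypothetical helper name, and the rule of leaving index.html and non-HTML files alone is an assumption, not any specific generator's behavior.

```python
from pathlib import PurePosixPath


def pretty_output_path(source: str) -> str:
    """Turn about.html into about/index.html so /about/ works without rewrites."""
    path = PurePosixPath(source)
    # Leave non-HTML assets and existing index pages where they are.
    if path.suffix != ".html" or path.name == "index.html":
        return str(path)
    # Drop the extension and nest an index.html inside a folder of that name.
    return str(path.with_suffix("") / "index.html")
```

For example, `pretty_output_path("blog/post.html")` gives `blog/post/index.html`, which any static file server that serves directory indexes can answer at /blog/post/.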
It was removed explicitly for SEO and having a single canonical URL for a resource _for all of time forever_. Right now we use HTML for web pages but who knows what the internet 20, 50, 100, etc. years from now will use--maybe .Super-Awesome-Mega-HTML is all the rage. If over time you are changing your site and its URLs are changing then you're breaking that canonical URL and search indexes, caches, the Wayback Machine, etc. all suffer. So the intent is: don't make the format of the page (HTML) part of its canonical URL.
In case anyone hasn't read it, not including the extension is covered in more detail in "Cool URIs don't change" https://www.w3.org/Provider/Style/URI
The benefits aren’t just SEO. I’d much rather have /about than /about.{htm,html,php,asp,etc}. I don’t see how the latter is preferable for routing to pages.
I second this. That's what mime-type headers are for. The URL should locate a resource, as the name suggests, not necessarily convey metadata about what the resource is.
Site generators hiding it is not a problem for portability. Supporting that redirect is also not a problem! Permanent redirects are.

Also, old school is still the dominant way small sites are hosted, in terms of the number of hosting providers that offer CGI hosting.

Yes. New school's defining feature is needless layering of complexity to hide the simple truth of the file system from users.
Maybe this is just me, but I like that I can decouple the way I organize my file system from the way users access my site. And with that comes the ability to get rid of file extensions and make urls more human-friendly. Remember, not every person has file extensions turned on by default on their Windows File Explorer!
Is there a reason why someone visiting your website should know or care about file extensions?
The user presumably wants to know what kind of content they're going to receive from a given endpoint - foo.html, foo.pdf, foo.jpg, foo.mp3 and foo.avi suggest quite different experiences, and it's nice to include that hint in the URL (where it's visible on mouseover) rather than the user having to go in blind. I also like being able to reassure the user that they're receiving the same piece of content however they access a given resource, rather than the possibility of invisible content negotiation changing the site's behaviour.
Such hints can be unreliable at best and misleading at worst. There is nothing to guarantee that the file extension and the Content-Type header will agree, nor is there anything to guarantee that the file name in the URL will match the download file name in the Content-Disposition header.
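
A quick way to see such a mismatch, as a rough sketch: compare what the extension suggests with what the server declares. `extension_matches_content_type` is a made-up helper name; it relies only on the standard library's `mimetypes.guess_type`.

```python
import mimetypes
from typing import Optional


def extension_matches_content_type(url_path: str, content_type: str) -> Optional[bool]:
    """Return True/False if the extension implies a type, None if it implies nothing."""
    guessed, _encoding = mimetypes.guess_type(url_path)
    if guessed is None:
        # Extensionless or unknown extension: the URL carries no hint at all.
        return None
    # Ignore parameters such as "; charset=utf-8" in the header value.
    declared = content_type.split(";", 1)[0].strip().lower()
    return guessed == declared
```

A URL ending in .pdf served with `Content-Type: text/html` would come back as `False`, which is exactly the kind of disagreement nothing in HTTP prevents.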
Well, sure, but you can just... not do that? The <title> tag can be misleading because there's no guarantee that the title matches the content of the page - but the answer to that is to use good titles for your pages, not to avoid using the <title> tag.
I guess no one should know whether a pdf is a pdf. Or even whether it's a .com or a .org domain - the browser should just strip all that confusing stuff away!
When you're serving someone an HTML file are you serving them the exact copy on your file system or do you ever use templates? Do you ever pull info from the database? If so, can you see why this is slightly different from directly serving a static pdf?

Also note that you'll often see a PDF generated on the fly with a long, difficult to parse URL.

Take up your second point with the W3 or whatever. To be honest, if TLDs weren't so important for phishing and whatnot, it would probably be fine. I think some browsers have started doing that anyway. You overestimate how tech-savvy the average user is, and by extension you overestimate how much of this complexity the average user can keep track of. Do you think most people have heard of .info or .xyz?

That's what mime-type headers are for. The URL should locate a resource, as the name suggests, not necessarily convey metadata about what the resource is.
My web server is not exposing file system paths to the user, it is deciding on content programmatically and it’s purely an aesthetic coincidence that the url path looks like a file system path.
"Old school vs. new school" arguments should be replaced with "pros vs. cons". Call me old school.
I ran into exactly this issue when migrating from a different provider to CF Pages.

I preferred to strip away the .html extensions anyway, so it was okay in my case. CF should automatically trigger an HTTP 308 from the older .html URLs to the new URLs.

You can use the more barebones Worker Sites and retain file extensions. Pages has nicer UX (automatic PR preview sites!) but the old school Worker Sites setup is dead simple and does exactly what you want - it's just a template setup to load assets from KV storage.

https://developers.cloudflare.com/workers/platform/sites/

It seems Workers Sites has different pricing from Pages. Might want to call that out.
Maybe I'm misunderstanding, but Apache httpd has performed URL rewriting such as adding a .html suffix after mapping URLs to file names where appropriate, and other transformations such as content negotiation for languages, spelling correction, etc. for ages.
Cloudflare automatically redirects any /foo.html URL to /foo with a permanent redirect. That is certainly not standard behavior, on either httpd or GitHub Pages.
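
To make the behavior concrete, here is a small sketch of the rule as described in this thread: any path ending in .html gets a permanent redirect to the same path without the extension. `strip_html_redirect` is a hypothetical name, the default status of 308 is an assumption (301 is also a permanent redirect), and special handling of index.html is not modeled.

```python
from typing import Optional, Tuple


def strip_html_redirect(path: str, status: int = 308) -> Optional[Tuple[int, str]]:
    """Return (status, location) if the path should be redirected, else None."""
    suffix = ".html"
    if path.endswith(suffix):
        # /foo.html -> /foo, sent as a permanent redirect that clients may cache.
        return status, path[: -len(suffix)]
    return None
```

Because the redirect is permanent, browsers and search engines are entitled to remember /foo as the real address, which is why a later host that only serves /foo.html ends up with 404s.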
Can't you use their new Bulk Redirects to fix it?