by sofixa 1331 days ago
Having .html file extensions is very old school, and removing them is one of the popular/default redirects on every HTTP server that supports it. Site generators also hide the extension through various methods (e.g. creating a folder named after the path that contains just an index.html, for hosts where a dynamic redirect isn't possible).
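For anyone who hasn't seen how the extension gets hidden, here is a minimal sketch of the lookup a server or site generator might do for a clean URL. The function name `candidate_files` and the lookup order are assumptions for illustration, not the behaviour of any particular server.

```python
from pathlib import PurePosixPath
from typing import List


def candidate_files(url_path: str) -> List[str]:
    """Return the relative files a server might try for a clean URL path.

    Hypothetical helper: real servers make this order configurable.
    """
    path = PurePosixPath(url_path.strip("/"))
    if str(path) == ".":
        # The site root maps to the top-level index page.
        return ["index.html"]
    if path.suffix:
        # The URL already names a file (e.g. /style.css); use it as-is.
        return [str(path)]
    # Extensionless URL: try "<path>.html", then the folder-with-index layout.
    return [f"{path}.html", str(path / "index.html")]


print(candidate_files("/about"))  # ['about.html', 'about/index.html']
```

The second candidate is the "folder with just an index.html" trick: it works on plain static hosting because nothing has to be rewritten at request time.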
4 comments

It was removed explicitly for SEO and having a single canonical URL for a resource _for all of time forever_. Right now we use HTML for web pages but who knows what the internet 20, 50, 100, etc. years from now will use--maybe .Super-Awesome-Mega-HTML is all the rage. If over time you are changing your site and its URLs are changing then you're breaking that canonical URL and search indexes, caches, the Wayback Machine, etc. all suffer. So the intent is don't make the format of the page (HTML) part of its canonical URL.
In case anyone hasn't read it, not including the extension is covered in more detail in "Cool URIs don't change" https://www.w3.org/Provider/Style/URI
The benefits aren’t just SEO. I’d much rather have /about than /about.{htm,html,php,asp,etc}. I don’t see how the latter is preferable for routing to pages.
I second this. That's what mime-type headers are for. The URL should locate a resource, as the name suggests, not necessarily convey metadata about what the resource is.
Site generators hiding it is not a problem for portability. Supporting that redirect is also not a problem! Permanent redirects are.

Also, old school is still the dominant way small sites are hosted, in terms of the number of hosting providers that offer CGI hosting.

Yes. New school's defining feature is needless layering of complexity to hide the simple truth of the file system from users.
Maybe this is just me, but I like that I can decouple the way I organize my file system from the way users access my site. And with that comes the ability to get rid of file extensions and make URLs more human-friendly. Remember, not every person has file extensions turned on by default in Windows File Explorer!
Is there a reason why someone visiting your website should know or care about file extensions?
The user presumably wants to know what kind of content they're going to receive from a given endpoint - foo.html, foo.pdf, foo.jpg, foo.mp3 and foo.avi suggest quite different experiences, and it's nice to include that hint in the URL (where it's visible on mouseover) rather than the user having to go in blind. I also like being able to reassure the user that they're receiving the same piece of content however they access a given resource, rather than the possibility of invisible content negotiation changing the site's behaviour.
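To make "invisible content negotiation" concrete, here is a rough sketch of how a server could choose between representations of the same URL based on the request's Accept header. The function name `pick_representation` is hypothetical, and the matching is simplified: it honours q-values and wildcards but ignores the specificity rules of full HTTP negotiation.

```python
from typing import List, Optional


def pick_representation(accept_header: str, available: List[str]) -> Optional[str]:
    """Pick one of the available media types for an Accept header (simplified)."""
    prefs = []
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip().lower()
        q = 1.0  # Default quality when no q parameter is given.
        for field in fields[1:]:
            name, _, value = field.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # Treat a malformed q-value as "not acceptable".
        if media:
            prefs.append((q, media))
    # Stable sort: highest q first, header order preserved among ties.
    prefs.sort(key=lambda pref: -pref[0])
    for q, media in prefs:
        if q <= 0:
            continue
        for candidate in available:
            if media in (candidate, "*/*"):
                return candidate
            if media.endswith("/*") and candidate.startswith(media[:-1]):
                return candidate
    return None


# Same URL, different clients, different content:
print(pick_representation("application/pdf, text/html;q=0.8",
                          ["text/html", "application/pdf"]))  # application/pdf
```

Nothing in the URL changes between those two outcomes, which is the behaviour being objected to here.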
Such hints can be unreliable at best and misleading at worst. There is nothing to guarantee that the file extension and the Content-Type header will agree, nor is there anything to guarantee that the file name in the URL will match the download file name in the Content-Disposition header.
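As a quick illustration of that mismatch, here is a small sketch that compares what a URL's extension suggests with what the Content-Type header actually says. It uses Python's standard `mimetypes` module; the helper name `extension_matches_header` and the example URLs are made up.

```python
import mimetypes
from typing import Optional
from urllib.parse import urlsplit


def extension_matches_header(url: str, content_type: str) -> Optional[bool]:
    """Compare the type implied by the URL's extension with a Content-Type value.

    Returns None when the URL gives no usable hint (no known extension).
    """
    path = urlsplit(url).path
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed is None:
        return None
    # Drop parameters such as "; charset=utf-8" before comparing.
    actual = content_type.split(";", 1)[0].strip().lower()
    return guessed == actual


# A URL ending in .pdf that is really served as an HTML page:
print(extension_matches_header("https://example.com/foo.pdf",
                               "text/html; charset=utf-8"))  # False
```

Only the header is authoritative for the browser; the extension is a hint that is as trustworthy as the site operator makes it.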
Well, sure, but you can just... not do that? The <title> tag can be misleading because there's no guarantee that the title matches the content of the page - but the answer to that is to use good titles for your pages, not to avoid using the <title> tag.
My point is that it isn’t really reassuring for the user because of invisible negotiations. If anything, I would lean more towards guessing that users are, more often than not, either ignorant or untrusting of URL contents, either because the URLs so frequently look like nonsense (arbitrary content IDs instead of meaningful names) or because they have already proven to be unreliable elsewhere on the web (deep links not as deep as expected when copied or shared).
I guess no one should know whether a pdf is a pdf. Or even whether it's a .com or a .org domain - the browser should just strip all that confusing stuff away!
When you're serving someone an HTML file are you serving them the exact copy on your file system or do you ever use templates? Do you ever pull info from the database? If so, can you see why this is slightly different from directly serving a static pdf?

Also note that you'll often see a PDF generated on the fly with a long, difficult-to-parse URL.

Take up your second point with the W3 or whatever. To be honest, if TLDs weren't so important for spotting phishing and whatnot, it would probably be fine. I think some browsers have started doing that anyway. You overestimate how tech savvy the average user is, and by extension you overestimate how much the average user can keep track of all this complexity. Do you think most people have heard of .info or .xyz?

That's what mime-type headers are for. The URL should locate a resource, as the name suggests, not necessarily convey metadata about what the resource is.
My web server is not exposing file system paths to the user, it is deciding on content programmatically and it’s purely an aesthetic coincidence that the url path looks like a file system path.
"Old school vs. new school" arguments should be replaced with "pros vs. cons". Call me old school.