by dschuessler 917 days ago
> <meta charset="UTF-8"> is real. Without this, text encoding sniffing takes place and some browsers just displayed rubbish.

I am curious. Which browsers do this? I thought this tag is unnecessary in HTML5 because the information that the page is UTF-8 encoded is already implicitly conveyed by the DOCTYPE. (UTF-8 is the only encoding allowed for HTML5.)

I tried rehosting this file on localhost, and removing the <meta charset="UTF-8"> breaks it in the latest versions of Safari, Firefox, and Chrome.

I tried to fix it by replacing the fake doctype with <!DOCTYPE html>, and it fails in the same way (but the page gets slightly more padding at the top of each page, which shows the doctype does switch rendering modes).

I cannot reproduce this. I used VSCode's live previewer to host the file on localhost (macOS) and tried with the browsers you listed. Removing the meta tag or changing the doctype did not make a difference.

Is it possible that you accidentally deleted the `<plaintext>` tag when making your changes? That breaks the page for me.

I live in Japan. Try changing your OS language from English to Japanese: in Japanese, a lot of things default to Shift-JIS instead of UTF-8 when nothing is specified.
Didn't need to. When hosting with `python -m http.server 9000`, I get the same result as you. Very interesting. Thank you for bringing this to my attention.
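
A possible explanation for the difference between the two local servers, offered as a guess rather than something verified in this thread: as far as I can tell, the stock `http.server` handler sends `Content-type: text/html` with no charset parameter, so the browser has to fall back on the meta tag or on sniffing. A minimal sketch of a handler that adds the charset itself (`Utf8Handler` and `serve` are made-up names for illustration, and it assumes the files on disk really are UTF-8):

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class Utf8Handler(SimpleHTTPRequestHandler):
    """Static file handler that declares UTF-8 for text/* responses."""

    def guess_type(self, path):
        # The parent class returns a bare MIME type such as "text/html".
        ctype = super().guess_type(path)
        # Assumption: every text file being served is UTF-8 encoded.
        if ctype.startswith("text/") and "charset=" not in ctype:
            ctype += "; charset=utf-8"
        return ctype


def serve(port=9000):
    """Serve the current directory on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), Utf8Handler).serve_forever()
```

Calling `serve()` from the directory containing the file should give the same setup as `python -m http.server 9000`, except that the HTTP header now names the encoding, so the page no longer depends on the meta tag.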
Can also try saving the file with a UTF-8 BOM. [1]

BOM > HTTP charset > HTML meta charset

[1] https://www.w3.org/International/questions/qa-html-encoding-...
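
The precedence above can be sketched roughly like this. It is a simplified illustration, not the browser algorithm: it only recognizes the UTF-8 BOM, it looks for the meta tag with a crude regex in the first 1024 bytes, and it returns None where a real browser would go on to sniff or use a locale default. All function names are made up for the example.

```python
import re
from email.message import Message

UTF8_BOM = b"\xef\xbb\xbf"

# Crude pattern: matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="...; charset=..."> form.
META_RE = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def charset_from_meta(body):
    """Return the charset declared in an early <meta> tag, or None."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def pick_encoding(body, content_type=None):
    """Apply BOM > HTTP charset > HTML meta charset."""
    if body.startswith(UTF8_BOM):
        return "utf-8"
    return charset_from_header(content_type) or charset_from_meta(body)
```

For example, `pick_encoding(b'<meta charset="UTF-8">', "text/html; charset=Shift_JIS")` picks the header's `shift_jis` over the meta tag, while the same body served as plain `text/html` falls through to the meta tag's `utf-8`.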

As you can see, I didn't use that doctype. I experimented with a few older browsers (and one of those online virtual testers) and a few displayed weird results without it.
Ok, thanks for the clarification! I took "Without this, text encoding sniffing takes place" to mean that this happens in general.