Hacker News
by slavik81 2508 days ago
 is the representation of the UTF-8 BOM byte sequence in Latin-1. If this comment were stored as Latin-1 and you assumed it was UTF-8 just because it began with that byte sequence, you would discard an important part of my message.
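A short Python sketch can confirm the byte-level claim. It uses only the standard library's `codecs.BOM_UTF8` constant and decodes the same three bytes (EF BB BF) two ways: as Latin-1, where each byte becomes its own visible character, and as UTF-8, where they form a single invisible U+FEFF.

```python
import codecs

# The UTF-8 byte order mark: U+FEFF encoded as UTF-8 is the three bytes EF BB BF.
bom = codecs.BOM_UTF8
assert bom == b"\xef\xbb\xbf"

# Read as Latin-1 (ISO-8859-1), every byte maps to exactly one character.
as_latin1 = bom.decode("latin-1")
print(as_latin1)  # 

# Read as UTF-8, the same bytes decode to one zero-width character.
as_utf8 = bom.decode("utf-8")
print(repr(as_utf8))  # '\ufeff'
```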

Indeed, Windows Notepad does exactly that (it ignores  and reads the rest as UTF-8).
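The failure mode can be reproduced outside Notepad. This is not Notepad's code; as an assumed stand-in, the sketch uses Python's `utf-8-sig` codec, which likewise strips a leading EF BB BF and decodes the remainder as UTF-8. The message is a hypothetical Latin-1 string that deliberately begins with "".

```python
# A hypothetical message stored as Latin-1 that intentionally starts with "".
message = "\u00ef\u00bb\u00bf is the UTF-8 BOM in Latin-1"
raw = message.encode("latin-1")  # begins with b"\xef\xbb\xbf"

# A BOM-sniffing reader treats those three bytes as a UTF-8 BOM and drops them.
misread = raw.decode("utf-8-sig")
print(repr(misread))  # ' is the UTF-8 BOM in Latin-1'

# Decoding with the encoding the text was actually stored in preserves it intact.
correct = raw.decode("latin-1")
assert correct == message
```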
I have sympathy for its authors. There is no reliable way to know what the right encoding is. Your options are to guess based on heuristics, allow the user to specify, or demand a particular format. Even the friendliest applications just guess and let the user override.
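The "guess, but let the user override" strategy might look roughly like the following. `guess_decode` is a hypothetical helper, not a library API, and the order of heuristics (explicit choice, then BOM sniffing, then strict UTF-8, then a Latin-1 fallback) is an assumption chosen for illustration; real applications use different and often more elaborate rules.

```python
import codecs
from typing import Optional, Tuple


def guess_decode(raw: bytes, override: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (text, encoding_used) for raw bytes."""
    # 1. An explicit user choice always wins over any guess.
    if override is not None:
        return raw.decode(override), override
    # 2. Heuristic: a leading EF BB BF suggests UTF-8 with a BOM (can be wrong).
    if raw.startswith(codecs.BOM_UTF8):
        try:
            return raw.decode("utf-8-sig"), "utf-8-sig"
        except UnicodeDecodeError:
            pass
    # 3. Heuristic: bytes that decode cleanly as strict UTF-8 are probably UTF-8.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # 4. Fallback: Latin-1 maps every byte to a character, so it never fails.
    return raw.decode("latin-1"), "latin-1"


raw = "\u00ef\u00bb\u00bf is the UTF-8 BOM in Latin-1".encode("latin-1")
guessed, guessed_enc = guess_decode(raw)                  # BOM heuristic misfires
fixed, fixed_enc = guess_decode(raw, override="latin-1")  # user corrects it
```

The guess on this input drops the leading characters, exactly as described above, and only the user override recovers the full text. No ordering of heuristics avoids that; it can only make the common case right.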