| HN Mirror



	by slavik81 2554 days ago
	ï»¿ is the representation of the UTF-8 BOM byte sequence in Latin-1. If this comment were stored as Latin-1 and you assumed it was UTF-8 just because it began with that byte sequence, you would discard an important part of my message.
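A minimal Python sketch of the point above. The `naive_read` function and the sample message are hypothetical, written only for illustration; the function assumes any input starting with the bytes EF BB BF must be UTF-8.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Interpreted as Latin-1, those same bytes are three printable characters.
as_latin1 = bom.decode("latin-1")

# A Latin-1 message that legitimately begins with those characters.
message = "ï»¿ is the representation of the UTF-8 BOM"
stored = message.encode("latin-1")


def naive_read(data: bytes) -> str:
    """Hypothetical reader: treats a leading EF BB BF as proof of UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        # Strip the presumed BOM and decode the rest as UTF-8.
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode("latin-1")


misread = naive_read(stored)
print(repr(as_latin1))
print(repr(misread))
```

Here the stored bytes really are Latin-1, yet the reader drops the first three characters of the message because they happen to match the BOM.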

1 comments

garaetjjte 2554 days ago

Indeed, Windows Notepad does exactly that: it ignores the ï»¿ and reads the rest as UTF-8.


slavik81 2554 days ago

I have sympathy for its authors. There is no way to really know what the right encoding is. Your options are to guess based on heuristics, allow the user to specify, or demand a particular format. Even the friendliest applications just guess and allow the user to override.
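The three options can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not how any particular editor works: `decode_text`, its `encoding` and `strict` parameters, and the choice of Latin-1 as the fallback are all hypothetical.

```python
from typing import Optional


def decode_text(data: bytes,
                encoding: Optional[str] = None,
                strict: bool = False) -> str:
    """Hypothetical decoder illustrating three ways to pick an encoding."""
    # Option: let the user specify. An explicit encoding always wins.
    if encoding is not None:
        return data.decode(encoding)

    # Option: demand a particular format. Here that is UTF-8, and
    # anything else raises UnicodeDecodeError.
    if strict:
        return data.decode("utf-8")

    # Option: guess with a heuristic. Try UTF-8 (dropping a leading BOM
    # if present), and fall back to Latin-1, which accepts any bytes.
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode("latin-1")


latin1_bytes = "ï»¿ hello".encode("latin-1")

guessed = decode_text(latin1_bytes)                         # heuristic guess
overridden = decode_text(latin1_bytes, encoding="latin-1")  # user override
fallback = decode_text("café".encode("latin-1"))            # not valid UTF-8
```

The guess silently loses the leading characters of the Latin-1 input, while the override preserves them, which is why a guess-then-override design is a reasonable compromise.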
