Hacker News
by tracker1 4265 days ago
But these are file formats that may well not be encoded in UTF-8. The formats already exist; it isn't like he's creating a new spreadsheet format here. Some of them may well use an encoding that converts cleanly to Unicode/UTF-8, others not so much.

So you write FooToUTF8() and UTF8ToFoo(), where Foo is whatever the encoding is in the external format. Done.
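
A minimal sketch of that pair in Python, assuming "Foo" is an encoding Python's built-in codecs already know (Windows-1252 here, picked purely for illustration). The names foo_to_utf8 and utf8_to_foo are hypothetical and not part of the project being discussed:

```python
def foo_to_utf8(data: bytes, foo_encoding: str = "cp1252") -> bytes:
    """Re-encode bytes from the external "Foo" encoding into UTF-8."""
    # Decode to Unicode text first, then encode as UTF-8.
    return data.decode(foo_encoding).encode("utf-8")


def utf8_to_foo(data: bytes, foo_encoding: str = "cp1252") -> bytes:
    """Re-encode UTF-8 bytes into the external "Foo" encoding.

    Raises UnicodeEncodeError if the text contains characters
    that the target encoding cannot represent.
    """
    return data.decode("utf-8").encode(foo_encoding)
```

The two directions are not symmetric: anything valid in Foo can go to UTF-8, but going back can fail for characters Foo has no slot for, which is the "others not so much" case mentioned above.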

As far as I know, UTF-8 will work 100% of the time, and is almost always the best internal representation for software you write due to how simple and uniform it is. If something is encoded in some other format, you can probably find a conversion function online.

Okay, so why don't you fork the project, create your simple Foo/UTF8 methods, and confirm that they are the correct Foo/UTF8 methods for each of the supported document formats?

I'm not saying that it's really all that hard, but there are multiple document formats, and multiple versions of those formats. The author obviously didn't need Unicode support, so didn't test for it. I'm sure test cases and a pull request would be welcome.
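
For what such test cases might look like, here is a rough sketch in Python of a round-trip check across candidate encodings. The sample strings and the helper names (SAMPLES, round_trips, report) are made up for illustration; which encodings each document format actually uses would have to be confirmed against the format specs:

```python
# Hypothetical sample strings: ASCII, Latin-1 range, and CJK text.
SAMPLES = ["plain ascii", "café", "日本語"]


def round_trips(text: str, encoding: str) -> bool:
    """Return True if text survives encode -> decode in the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for some character.
        return False


def report(encodings):
    """Map each encoding to the samples it fails to round-trip."""
    return {
        enc: [s for s in SAMPLES if not round_trips(s, enc)]
        for enc in encodings
    }
```

Calling report(["cp1252", "utf-8"]) lists "日本語" as a failure for cp1252 and nothing for utf-8, which is the kind of per-format evidence a pull request could carry.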