by sillysaurus3 4269 days ago
So you write FooToUTF8() and UTF8ToFoo(), where Foo is whatever encoding the external format uses. Done.
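A minimal sketch of that pair in Python, assuming "Foo" can be named by one of Python's built-in codecs (for example `cp1252` or `utf-16-le`). The snake_case names `foo_to_utf8` and `utf8_to_foo` are hypothetical stand-ins for the FooToUTF8()/UTF8ToFoo() functions described above.

```python
# Hypothetical helpers mirroring FooToUTF8() / UTF8ToFoo().
# foo_encoding is assumed to be a Python codec name such as "cp1252".

def foo_to_utf8(data: bytes, foo_encoding: str) -> bytes:
    """Decode bytes stored in the external encoding, then re-encode them as UTF-8."""
    text = data.decode(foo_encoding)  # raises UnicodeDecodeError on invalid input
    return text.encode("utf-8")


def utf8_to_foo(data: bytes, foo_encoding: str) -> bytes:
    """Decode UTF-8 bytes, then encode them in the external encoding."""
    text = data.decode("utf-8")
    # Raises UnicodeEncodeError if a character has no mapping in foo_encoding.
    return text.encode(foo_encoding)


if __name__ == "__main__":
    legacy = "Café".encode("cp1252")         # b'Caf\xe9'
    internal = foo_to_utf8(legacy, "cp1252")
    print(internal)                          # b'Caf\xc3\xa9'
    print(utf8_to_foo(internal, "cp1252") == legacy)  # True
```

In Python specifically, the decoded `str` would usually serve as the internal form; the sketch keeps UTF-8 bytes only to mirror the comment's suggestion. Note that the UTF8ToFoo direction can fail when the target encoding lacks a character, for example `€` in Latin-1.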

As far as I know, UTF-8 will work 100% of the time, since it can encode every Unicode character, and it is almost always the best internal representation for the software you write because it is so simple and uniform. If something is encoded in some other format, you can probably find a conversion function online.
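The "works every time" claim can be checked directly. This sketch (Python, standard library only) round-trips every Unicode scalar value through UTF-8 and shows the 1-to-4-byte variable width. It skips the surrogate range U+D800..U+DFFF, which is not valid text on its own and which Python's UTF-8 codec rejects by design.

```python
# Check that every Unicode scalar value survives a UTF-8 round trip,
# and show how many bytes UTF-8 uses for a few sample characters.

def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a single code point."""
    return len(chr(code_point).encode("utf-8"))


def all_scalar_values_round_trip() -> bool:
    """True if every code point U+0000..U+10FFFF (minus surrogates) round-trips."""
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not scalar values
        ch = chr(cp)
        if ch.encode("utf-8").decode("utf-8") != ch:
            return False
    return True


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
    print(all_scalar_values_round_trip())
```

ASCII text is also byte-for-byte identical in UTF-8 (`"A"` encodes to `b"A"`), which is part of why it is a convenient internal choice.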


Okay, so why don't you fork the project, write your simple Foo/UTF8 methods, and confirm that they are the correct Foo/UTF8 methods for each of the supported document formats?

I'm not saying that it's really all that hard, but there are multiple document formats, and multiple versions of those formats. The author obviously didn't need Unicode support, so they didn't test for it. I'm sure test cases and a pull request would be welcome.
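Test cases like the ones suggested could take the form of per-format fixtures: known raw bytes plus the text they should decode to. The sketch below is Python; the format names and their encodings are hypothetical placeholders, not taken from the project being discussed. It also shows why "confirm they are correct" matters: decoding Mac Roman bytes as cp1252 succeeds silently but yields the wrong text.

```python
# Hypothetical per-format fixtures: (encoding, raw bytes as stored, expected text).
# The format names and encodings below are illustrative assumptions only.
FIXTURES = {
    "example_format_v1": ("cp1252", b"Caf\xe9", "Café"),
    "example_format_v2": ("utf-16-le", b"C\x00a\x00f\x00\xe9\x00", "Café"),
    "example_format_mac": ("mac_roman", b"Caf\x8e", "Café"),
}


def check_format(encoding: str, raw: bytes, expected: str) -> bool:
    """True if raw -> UTF-8 gives the expected text and UTF-8 -> raw is lossless."""
    utf8 = raw.decode(encoding).encode("utf-8")  # the FooToUTF8 direction
    if utf8 != expected.encode("utf-8"):
        return False
    return utf8.decode("utf-8").encode(encoding) == raw  # the UTF8ToFoo direction


def failing_formats(fixtures: dict) -> list:
    """Names of formats whose fixture does not pass check_format."""
    return [name for name, (enc, raw, text) in fixtures.items()
            if not check_format(enc, raw, text)]


if __name__ == "__main__":
    print(failing_formats(FIXTURES))      # []
    # Guessing the wrong encoding for a format produces wrong text, not an error:
    print(b"Caf\x8e".decode("cp1252"))    # CafŽ
```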