by cjc1083 4968 days ago
By the same token, albeit a bit off the trail: does anyone have suggestions for effectively storing fields that can contain Big5 (i.e. non-UTF-8) characters but usually do not, e.g. email subject lines or senders?

JSON is picky in this regard, and I don't want to Base64-encode the whole string and decode it on the way in and out. I would like to keep regex search within Mongo, from my PHP front-end application, for the 99% of email titles and names that are not Chinese.

2 comments

If you need to store non-UTF8 data, MongoDB has a binary data type:

http://php.net/manual/en/class.mongobindata.php

You can't do things like regex searches on binary data, but since MongoDB supports different data types within the same "column", you can just store some as UTF8 and some as binary, depending on whether the string has non-UTF8 characters in it.
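
A minimal sketch of that "string if valid UTF-8, binary otherwise" decision, written in Python purely for illustration (the same check would port to PHP). The function name `storable_value` is made up for this example, and it assumes the field arrives as raw bytes:

```python
def storable_value(raw: bytes):
    """Return a str when raw is valid UTF-8, otherwise the bytes unchanged.

    Hypothetical helper: the str case is meant to be stored as a normal,
    regex-searchable string, the bytes case as a binary value.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 (for example Big5-encoded text): keep it as binary.
        return raw


ascii_subject = storable_value(b"Re: meeting notes")
big5_subject = storable_value("中文".encode("big5"))
```

One caveat: a few Big5 byte sequences also happen to be valid UTF-8, so this check can occasionally classify Big5 text as a (garbled) string. It is a heuristic, not a charset detector.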

Thank you for pointing me in this direction. I'll see if I can make this work in the application I'm building. Thanks again for the reply.
It may be better to encode everything to UTF-8 before you store it.
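
A rough sketch of what that could look like, in Python for illustration only. The helper name `to_utf8_text` and the choice of Big5 as the fallback are assumptions for this example; in practice the email's declared charset would be a better guide than a hard-coded fallback:

```python
def to_utf8_text(raw: bytes, fallback_encoding: str = "big5") -> str:
    """Decode raw bytes to text so every stored value is a plain string.

    Hypothetical helper: tries UTF-8 first, then the assumed fallback
    encoding. Bytes the fallback cannot map are replaced, not raised on.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback_encoding, errors="replace")


subject = to_utf8_text("中文".encode("big5"))
```

With everything normalized to text up front, regex search would work on the Chinese subjects as well, at the cost of no longer having the original bytes unless they are stored separately.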