Hacker News
by pdimitar 556 days ago
I agree with your take; it's just that many programmers want to jump easily from "byte array" to "string in XYZ encoding". I personally prefer keeping untrusted data as byte arrays and doing the deserialization in application code.
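A minimal Python sketch of keeping untrusted input as bytes and decoding it explicitly in application code, assuming UTF-8 is the expected encoding. `decode_untrusted` is a hypothetical helper, not part of any library; strict error handling makes malformed input fail loudly instead of being silently altered.

```python
def decode_untrusted(payload: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode raw bytes at the application boundary.

    errors="strict" means malformed input raises instead of being
    replaced or dropped, so bad data is noticed rather than stored.
    """
    try:
        return payload.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start / exc.end delimit the first offending byte range
        bad = payload[exc.start:exc.end]
        raise ValueError(
            f"payload is not valid {encoding} at byte {exc.start}: {bad!r}"
        ) from exc


print(decode_untrusted("hello".encode("utf-8")))  # hello

try:
    decode_untrusted(b"caf\xe9")  # Latin-1 "é" is not valid UTF-8 here
except ValueError as err:
    print(err)
```

The caller decides what to do with a `ValueError` (reject, quarantine, or keep the raw bytes for later), which is the point of not letting a driver or ORM decode implicitly.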

In other words, considering we are talking about strings and Unicode...

There are two types of programmers: those who are wrong and those who are very wrong.

lol. :)

Funny, but not entirely true. I've had cases where we had to urgently store a firehose of data and figure out the right string encoding later. Just dumping strings of uncertain encoding into `bytea` columns helped us there.
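A sketch of the "store first, decode later" approach. The comment mentions PostgreSQL `bytea`; to stay self-contained, this example uses Python's built-in `sqlite3` with a `BLOB` column as a stand-in (bytes in, identical bytes out). The table name `raw_events` and the candidate encoding list are hypothetical, and trial decoding is only a heuristic: Latin-1 accepts every byte sequence, so it must go last, and UTF-16 can "succeed" on even-length garbage.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; BLOB plays the role of bytea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")

# Ingest now, interpret later: the same text arriving in three encodings.
incoming = [
    "naïve".encode("utf-8"),
    "naïve".encode("latin-1"),
    "naïve".encode("utf-16"),  # Python prepends a byte-order mark
]
conn.executemany("INSERT INTO raw_events (body) VALUES (?)", [(b,) for b in incoming])

# Hypothetical guess order, chosen once more is known about the sources.
# latin-1 never fails, so it is the last resort.
CANDIDATES = ("utf-8", "utf-16", "latin-1")


def guess_decode(body: bytes) -> tuple[str, str]:
    """Return (encoding, text) for the first candidate that decodes cleanly."""
    for enc in CANDIDATES:
        try:
            return enc, body.decode(enc)
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding matched")


decoded = [
    guess_decode(body)
    for (body,) in conn.execute("SELECT body FROM raw_events ORDER BY id")
]
for enc, text in decoded:
    print(enc, text)
```

Because the original bytes never left the database, a wrong guess costs nothing: change `CANDIDATES` (or use per-source metadata) and decode again. Requires Python 3.9+ for the `tuple[str, str]` annotation.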

Plus, in some domains it helps with auditability. For example, when you receive raw binary-encoded telemetry from devices in the field, you should store both their raw payloads _and_ the data structures you parsed from them. Being this paranoid has saved my neck a few times.
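A minimal sketch of storing the raw payload next to its parsed form, so a parser bug can be fixed retroactively. The wire format (little-endian `uint16` device id, `int16` temperature in hundredths of °C, `uint32` Unix timestamp) and the names `Reading`, `StoredRecord`, `parse_v1`, and `parse_v2` are all hypothetical, chosen only for illustration.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire format: <uint16 device_id><int16 temp 0.01 °C><uint32 ts>
TELEMETRY = struct.Struct("<HhI")


@dataclass(frozen=True)
class Reading:
    device_id: int
    temperature_c: float
    timestamp: int


@dataclass(frozen=True)
class StoredRecord:
    raw: bytes       # exactly what the device sent, never modified
    parsed: Reading  # our current interpretation of those bytes


def parse_v1(raw: bytes) -> Reading:
    # Buggy first parser: reads temperature as unsigned, so negatives wrap.
    device_id, temp_u, ts = struct.unpack("<HHI", raw)
    return Reading(device_id, temp_u / 100, ts)


def parse_v2(raw: bytes) -> Reading:
    # Fixed parser: temperature is a signed 16-bit integer.
    device_id, temp, ts = TELEMETRY.unpack(raw)
    return Reading(device_id, temp / 100, ts)


payload = TELEMETRY.pack(7, -1250, 1_700_000_000)  # device 7 reports -12.50 °C
record = StoredRecord(raw=payload, parsed=parse_v1(payload))
print(record.parsed.temperature_c)  # 642.86 (wrapped, wrong)

# The raw bytes were kept, so the corrected parser can be re-run on history.
repaired = StoredRecord(raw=record.raw, parsed=parse_v2(record.raw))
print(repaired.parsed.temperature_c)  # -12.5
```

Had only `parsed` been stored, the wrapped 642.86 would be indistinguishable from a genuine (if absurd) reading; keeping `raw` turns the bug into a re-parse job rather than permanent data loss.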

The secret is to accept that you are not without fault and to take measures that let you correct yourself in the future.

Indeed, one system I dealt with used `char` instead of `blob`. The text as stored was riddled with U+FFFD (the Unicode replacement character).
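The comment doesn't say which database or driver was involved, so this is only an illustration of the mechanism in Python, assuming the bytes were decoded as UTF-8 with replacement somewhere on the way into a text column. Each invalid byte sequence becomes U+FFFD, which carries no trace of the original byte, so different inputs collapse to the same stored string.

```python
# Latin-1 bytes for "café" forced through a lenient UTF-8 decode.
original = "café".encode("latin-1")                  # b'caf\xe9'
stored = original.decode("utf-8", errors="replace")

print(ascii(stored))         # 'caf\ufffd'
print(hex(ord(stored[-1])))  # 0xfffd

# A different byte decodes to the very same string: information is gone.
other = b"caf\xe8".decode("utf-8", errors="replace")
print(other == stored)       # True

# Re-encoding cannot recover the original bytes.
print(stored.encode("utf-8") == original)  # False
```

A binary column (`blob`, `bytea`) sidesteps this because nothing is decoded on write; the decision about encoding stays with the application.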