Hacker News new | ask | show | jobs
by larsnystrom 551 days ago
There are two kinds of programmers: Those who think of strings as text, and those who think of strings as a sequence of bytes. The second group doesn’t care about the special case where a byte is all zeroes.
1 comments

In that second case the string is better represented as "bytea", which has most (but not all) of the features of the "text" type.
I agree with your take, it's just that many programmers want to easily jump from "byte array" to "string in XYZ encoding". I personally prefer byte arrays for unsafe data and to do deserialization in application code.
In other words, considering we are talking about string and unicode...

There are two types of programmers, those that are wrong and those that are very wrong

lol. :)

Funny but not entirely true. I had cases when we had to urgently store a firehose of data and figure out the right string encoding later. Just dumping the strings with uncertain encoding in `bytea` columns helped us there.

Plus for some fields it helps with auditability f.ex. when you get raw binary-encoded telemetry from devices in the field, you should store their raw payloads _and_ the parsed data structures that you got from them. Being this paranoid has saved my neck a few times.

The secret is to accept you are not without fault and take measures to be able to correct yourself in the future.

Indeed, one system I dealt with used char instead of blob. The text as stored was riddled with U+FFFE (unicode unknown character).