Hacker News
by davidism 3837 days ago
> Although that way may not be obvious at first unless you're Dutch.

There is one obvious way: strings are encoded to bytes, bytes are decoded to strings.
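A minimal Python 3 sketch of that direction (the sample string is arbitrary): `str.encode` goes from text to bytes, `bytes.decode` goes back, and the same encoding is named on both sides.

```python
# A str holds text (Unicode code points); bytes hold raw octets.
text = "naïve café"

# str -> bytes: encode with an explicitly chosen encoding.
data = text.encode("utf-8")
print(type(data).__name__, data)  # bytes b'na\xc3\xafve caf\xc3\xa9'

# bytes -> str: decode with the same encoding to get the text back.
restored = data.decode("utf-8")
print(restored == text)  # True
```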

I don't think it is obvious and intuitive and unambiguous what the difference is between ENcoding and DEcoding.
A string is an abstract bit of text. You need to encode this into a particular memory representation of the text.
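To make "a particular memory representation" concrete, here is a small sketch (the choice of character and encodings is just for illustration): the same one-character string turns into different bytes depending on which encoding you pick.

```python
# One abstract string, several possible byte representations.
text = "é"  # U+00E9

representations = {enc: text.encode(enc) for enc in ("utf-8", "utf-16-le", "latin-1")}
for enc, raw in representations.items():
    print(f"{enc:>9}: {raw!r}")
# utf-8 gives b'\xc3\xa9', utf-16-le gives b'\xe9\x00', latin-1 gives b'\xe9'
```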

Bytes hold a bunch of data in some encoding. It could be an image, UTF-8 text, or LZMA-compressed ASCII. Once you know the encoding, you decode the bytes to reconstruct the data in a semantically meaningful form.
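A rough sketch of the "LZMA compressed ASCII" case using the stdlib `lzma` module (the sample text is made up): encoding happens in layers, and you have to know every layer to undo them in reverse order.

```python
import lzma

text = "hello hello hello"

# Two layers on the way in: text -> ASCII bytes -> LZMA-compressed bytes.
packed = lzma.compress(text.encode("ascii"))

# The packed bytes are not ASCII text, so decoding them directly fails:
# the default .xz container starts with the non-ASCII byte 0xFD.
try:
    packed.decode("ascii")
except UnicodeDecodeError:
    print("need to undo the compression first")

# Undo the layers in reverse order to reconstruct the text.
unpacked = lzma.decompress(packed).decode("ascii")
print(unpacked == text)  # True
```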

To put it another way, imagine the terms were "serialize" and "deserialize". Of course one serializes to and deserializes from binary data. Just replace "{,de}serialize" with "{en,de}code" and you're done.
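Side by side in Python, using `pickle` as a stand-in serializer (my choice, not something from the comment): both pairs point the same way, with the "en"/"ser" side producing bytes and the "de" side consuming them.

```python
import pickle

# serialize: Python object -> bytes; deserialize: bytes -> Python object.
obj = {"word": "café", "count": 3}
blob = pickle.dumps(obj)
assert isinstance(blob, bytes)
assert pickle.loads(blob) == obj

# encode/decode point the same way: text -> bytes, bytes -> text.
text = "café"
raw = text.encode("utf-8")
assert isinstance(raw, bytes)
assert raw.decode("utf-8") == text
```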

You mean the prefixes 'en-' and 'de-'?
Encode and Decode are slightly subjective... why not something like to_bytes and from_bytes? Maybe not the best names, but definitely clearer on the meaning.
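A sketch of what those names could look like as thin wrappers. `to_bytes` and `from_bytes` here are hypothetical helpers, not part of Python's str/bytes API; note that an encoding still has to be chosen (UTF-8 is assumed as the default), so the names only spell out the direction.

```python
# Hypothetical helpers: these names are NOT part of Python's str/bytes API.
def to_bytes(text: str, encoding: str = "utf-8") -> bytes:
    """Turn text into bytes (a thin wrapper around str.encode)."""
    return text.encode(encoding)


def from_bytes(data: bytes, encoding: str = "utf-8") -> str:
    """Turn bytes back into text (a thin wrapper around bytes.decode)."""
    return data.decode(encoding)


raw = to_bytes("café")
print(raw)              # b'caf\xc3\xa9'
print(from_bytes(raw))  # café
```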
Not really.

Veedrac had a good analogy: think of text as something abstract. For example, imagine text is an image or a sound. If you want to store it in bytes you need to encode it, and to read it back you decode it.
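Carrying the sound analogy a bit further, a small sketch: reading the bytes back only works with the matching codec. With the wrong one you often get no error, just garbled text (mojibake).

```python
# Store the "recording": encode text to bytes with UTF-8.
data = "café".encode("utf-8")

# Play it back with the matching codec and the text comes back intact.
print(data.decode("utf-8"))    # café

# Play it back with the wrong codec and you get mojibake instead of an error.
print(data.decode("latin-1"))  # cafÃ©
```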

As for to_bytes/from_bytes, Python actually provides those too, as long as you pass an encoding:

to_bytes -> bytes(<text>, <encoding>)

from_bytes -> str(<bytes>, <encoding>)
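A quick Python 3 check of the constructor forms (sample string is arbitrary). Both need an explicit encoding argument; leaving it off doesn't convert at all.

```python
text = "café"

# The constructors need an encoding argument to convert between text and bytes.
data = bytes(text, "utf-8")        # same as text.encode("utf-8")
back = str(data, "utf-8")          # same as data.decode("utf-8")
print(data, back)                  # b'caf\xc3\xa9' café

# Without an encoding, str() returns the repr of the bytes, not decoded text...
print(str(data))                   # b'caf\xc3\xa9'

# ...and bytes() refuses a str outright.
try:
    bytes(text)
except TypeError as exc:
    print("TypeError:", exc)
```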

It makes sense, for sure; it just isn't very intuitive. If it were, people wouldn't be so confused.
I think that's backwards
It's not backwards.

I think that reveals that the names really do have a problem. The problem is that "encode" sounds like "make this Unicode" to people who aren't familiar with Unicode.
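For what it's worth, assuming Python 3, the types themselves only allow one direction each, and "encode" hands you bytes rather than "more Unicode":

```python
# In Python 3, each type only has the method that leaves it.
print(hasattr(str, "encode"), hasattr(str, "decode"))      # True False
print(hasattr(bytes, "decode"), hasattr(bytes, "encode"))  # True False

# And "encode" produces bytes, not "more Unicode" text.
print(type("café".encode()).__name__)  # bytes
```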