| HN Mirror



	by davidism 3837 days ago
	> Although that way may not be obvious at first unless you're Dutch. There is one obvious way: strings are encoded to bytes, bytes are decoded to strings.
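A minimal Python 3 sketch of that "one obvious way", assuming UTF-8 as the encoding (any codec name the standard library knows would work the same way):

```python
text = "naïve café"               # str: abstract Unicode text
data = text.encode("utf-8")       # encode: str -> bytes
print(type(data), data)           # a bytes object holding the UTF-8 representation

restored = data.decode("utf-8")   # decode: bytes -> str
print(restored == text)           # the round trip recovers the original text
```

The direction never changes: `str.encode` always produces `bytes`, and `bytes.decode` always produces `str`.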


andrewstuart 3837 days ago

I don't think the difference between ENcoding and DEcoding is obvious, intuitive, or unambiguous.


Veedrac 3837 days ago

A string is an abstract bit of text. You need to encode this into a particular memory representation of the text.

Bytes hold a bunch of data in some encoding. It could be an image, UTF-8 text, or LZMA-compressed ASCII. Once you know the encoding, you reconstruct the data by decoding it into a semantically meaningful form.

To put it another way, imagine the terms were "serialize" and "deserialize". Of course one serializes to and deserializes from binary data. Just replace "{,de}serialize" with "{en,de}code" and you're done.
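To illustrate the "particular memory representation" point, here is a small Python 3 sketch: one abstract string has a different byte layout under each codec, and decoding only reconstructs the text if you use the matching encoding. The choice of UTF-8, Latin-1, and UTF-16-LE is just for illustration.

```python
text = "héllo"

# One abstract string, several concrete byte representations.
utf8 = text.encode("utf-8")        # 6 bytes: "é" takes two bytes
latin1 = text.encode("latin-1")    # 5 bytes: "é" takes one byte
utf16 = text.encode("utf-16-le")   # 10 bytes: two bytes per character here

for name, data in [("utf-8", utf8), ("latin-1", latin1), ("utf-16-le", utf16)]:
    print(name, len(data), data)

# Decoding with the right encoding reconstructs the text.
print(utf8.decode("utf-8") == text)

# Decoding with the wrong one silently yields different text (mojibake),
# because Latin-1 maps every byte value to some character.
print(utf8.decode("latin-1"))
```

This mirrors the serialize/deserialize analogy: the bytes are meaningless until you know which format produced them.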


lawpoop 3837 days ago

You mean the prefixes 'en-' and 'de-'?


odonnellryan 3837 days ago

Encode and Decode are slightly subjective... why not something like to_bytes and from_bytes? Maybe not the best names, but definitely clearer on the meaning.


takeda 3836 days ago

Not really.

Veedrac had a good analogy: think of text as something abstract, like an image or a sound. If you want to store it as bytes you need to encode it, and to read it back you decode it.

As for to_bytes/from_bytes, Python actually provides those too:

to_bytes -> bytes(<text>, <encoding>)

from_bytes -> str(<bytes>, <encoding>)
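A quick Python 3 sketch of those constructor forms, assuming UTF-8. In Python 3 both constructors need an explicit encoding argument to do the conversion; without it, `bytes(text)` raises `TypeError` and `str(data)` just returns the repr of the bytes object.

```python
text = "Zürich"

data = bytes(text, "utf-8")   # equivalent to text.encode("utf-8")
back = str(data, "utf-8")     # equivalent to data.decode("utf-8")
print(data == text.encode("utf-8"), back == text)

# Without an encoding argument the constructors behave differently:
try:
    bytes(text)               # str argument without an encoding
except TypeError as exc:
    print("TypeError:", exc)

print(str(data))              # the repr of the bytes, e.g. b'Z\xc3\xbcrich', not the text
```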


odonnellryan 3836 days ago

It makes sense for sure, it just isn't super intuitive. If it were, people wouldn't be so confused.


jes5199 3837 days ago

I think that's backwards.


rspeer 3837 days ago

It's not backwards.

I think that reveals that the names really do have a problem. The problem is that "encode" sounds like "make this Unicode" to people who aren't familiar with Unicode.
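A small Python 3 check of that point, using only the built-in types: `str` already holds Unicode text, so "encode" can only mean turning it into bytes, and each type offers only the method for its own direction.

```python
text = "résumé"                 # already Unicode: str holds abstract text
data = text.encode("utf-8")     # encoding goes *away* from Unicode, to bytes

# In Python 3 each type only has the method for its own direction.
print(hasattr(str, "encode"), hasattr(str, "decode"))      # True False
print(hasattr(bytes, "decode"), hasattr(bytes, "encode"))  # True False
```

So calling `.decode()` on a `str` fails with `AttributeError` instead of guessing what you meant.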
