by rspeer 3799 days ago
> AFAIK newer versions of MessagePack added an extra type so that string and binary are now separated.

The problem is that the 'str' type contains arbitrary binary data in an unspecified encoding, and always will, because of backward compatibility. This isn't changed by adding a 'bin' type.

Msgpack decoders in Python, for example, have to give you bytestrings unless you pass an option that promises that 'str's are all encoded in UTF-8.
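
To make that concrete, here is a minimal sketch of a decoder for the str family only, written against the format bytes in the spec as I understand them. It is not the real Python library's code: the function name `decode_str` and the `assume_utf8` flag are hypothetical stand-ins for whatever option the library exposes.

```python
def decode_str(buf: bytes, assume_utf8: bool = False):
    """Decode one MessagePack str-family value from the start of buf.

    Hypothetical helper for illustration; assume_utf8 is an invented flag.
    """
    first = buf[0]
    if 0xA0 <= first <= 0xBF:      # fixstr: length is the low 5 bits
        length, offset = first & 0x1F, 1
    elif first == 0xD9:            # str 8: one length byte
        length, offset = buf[1], 2
    elif first == 0xDA:            # str 16: big-endian 2-byte length
        length, offset = int.from_bytes(buf[1:3], "big"), 3
    elif first == 0xDB:            # str 32: big-endian 4-byte length
        length, offset = int.from_bytes(buf[1:5], "big"), 5
    else:
        raise ValueError("not a str-family value")

    payload = buf[offset:offset + length]
    # Without a promise from the caller, the only safe result is raw bytes.
    return payload.decode("utf-8") if assume_utf8 else payload


# A 4-byte 'str' whose payload is Latin-1, which is legal on the wire.
latin1_msg = bytes([0xA4]) + "café".encode("latin-1")
# A 5-byte 'str' whose payload really is UTF-8.
utf8_msg = bytes([0xA5]) + "café".encode("utf-8")

safe = decode_str(latin1_msg)                   # bytes, no decoding attempted
text = decode_str(utf8_msg, assume_utf8=True)   # Python str
```

Calling `decode_str(latin1_msg, assume_utf8=True)` raises `UnicodeDecodeError`, which is why a decoder can't turn that flag on by default without breaking on older data.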

2 comments

From https://github.com/msgpack/msgpack/blob/master/spec.md

  Raw
    String extending Raw type represents a UTF-8 string
    Binary extending Raw type represents a byte array
Ah okay, I didn't know there was now a specific String type (and that the one I was calling 'str' is called 'raw'). Does the Python library use it?
I don't even know what to believe anymore. That documentation is referring to two types, with "raw" renamed to "str" plus a new "bin", which is what I thought it was.
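
A rough sketch of that two-type reading, assuming I have the format bytes right: the old "raw" prefixes are reused unchanged for "str", and "bin" (plus a new str 8) takes bytes that used to be reserved. The function `family` and its `spec` argument are made up for illustration.

```python
def family(first_byte: int, spec: str = "new") -> str:
    """Classify a MessagePack format byte under the old or new spec.

    Hypothetical helper; only the raw/str/bin families are distinguished.
    """
    # fixraw/fixstr, raw16/str16, raw32/str32 share the same bytes in both specs.
    if 0xA0 <= first_byte <= 0xBF or first_byte in (0xDA, 0xDB):
        return "raw" if spec == "old" else "str"
    if spec == "new":
        if first_byte == 0xD9:                   # str 8, added later
            return "str"
        if first_byte in (0xC4, 0xC5, 0xC6):     # bin 8 / bin 16 / bin 32
            return "bin"
    return "other"


old_view = family(0xA3, spec="old")   # what an old encoder meant
new_view = family(0xA3)               # what a new decoder sees
```

The same byte `0xA3` is "raw" to an old encoder and "str" to a new decoder, so arbitrary binary written by old code shows up as a "str" today.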

But the link you posted referred to three types, where "str" and "bin" subclass "raw", which sounded like it provided a non-backward-compatible "str" that's guaranteed to be text.

They should just add a UTF-8 type... I don't know why that wasn't the default for strings all along.