| HN Mirror



	by jjeaff 985 days ago
	and yet, Internet protocols (http, at least) don't play well with equal signs which are part of base64, sometimes. That little issue has caused lots of intermittent bugs for me over the years, either from forgetting to urlencode it or not urldecoding it at the right time.
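A minimal sketch of how this kind of bug can show up, using only Python's standard library. The input bytes are an assumption, picked so that the encoded token contains `+`, `/` and `=` padding; which character actually gets mangled depends on the parser on the receiving end.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Bytes picked so the encoding contains "+", "/" and "=" padding.
token = base64.b64encode(b"\xfb\xff\xfe\x00").decode("ascii")  # "+//+AA=="

# Forgetting to urlencode: the form decoder reads "+" back as a space.
naive_query = "t=" + token
broken = parse_qs(naive_query)["t"][0]

# Encoding the value first lets it survive the round trip.
safe_query = urlencode({"t": token})
restored = parse_qs(safe_query)["t"][0]

print(broken == token)    # False
print(restored == token)  # True
```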

6 comments

eastbound 985 days ago

So there are 7 base64 encodings: one with “+ / =”, one with “- _ =”, one with “+ ,” and no “=”… https://en.wikipedia.org/wiki/Base64#Variants_summary_table

rendaw 984 days ago

And decoders typically aren't interoperable, requiring you to use the specific decoder for that combination.

layer8 984 days ago

Which is silly, because there’s no good reason to, except for strict validation.

BiteCode_dev 985 days ago

TIL.

And Python uses RFC 4648

masklinn 984 days ago

Python might say that but, as is often the case, it’s not really true: it mostly works off of RFC 2045

- the “default” encoder (“b64encode”) will pad the output

- although it will not linebreak (“encodebytes” does that)

- the default decoder will error if the input is not padded

- the default decoder will ignore all non-encoding characters by default

Also, both b64encode and encodebytes actually use binascii.b2a_base64, which claims conformance to RFC 3548, which attempts to unify RFC 1421 and RFC 2045. Except RFC 3548 requires rejecting non-encoding data, whereas (again) Python accepts and ignores it by default, in RFC 2045 fashion.
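The four behaviours listed above can be checked with a short script. This is only a sketch against the standard `base64` and `binascii` modules; the sample inputs are made up for illustration.

```python
import base64
import binascii

print(base64.b64encode(b"hi"))             # b'aGk=' (padded)

long_input = bytes(60)
one_line = base64.b64encode(long_input)    # no line breaks
wrapped = base64.encodebytes(long_input)   # newline after every 76 output characters

try:
    base64.b64decode(b"aGk")               # padding stripped
    unpadded_ok = True
except binascii.Error:
    unpadded_ok = False

# Characters outside the alphabet are skipped by default.
lenient = base64.b64decode(b"a G!k=\n")

try:
    base64.b64decode(b"a G!k=\n", validate=True)
    strict_ok = True
except binascii.Error:
    strict_ok = False

print(unpadded_ok, lenient, strict_ok)
```

Passing `validate=True` is the opt-in for the stricter behaviour; the default stays lenient.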

OskarS 985 days ago

And slashes as well, which is a magic character in both urls and file systems. Means you can't reliably use normal base64 for filenames, for instance. That might seem like a niche use-case, but it's really not, because you can use it for content-based addressing. Git does this, names all the blobs in the .git folder after their hash, but you can't encode the hash with regular base64.
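A small sketch of the filename problem. The 20-byte digest here is a made-up worst case (all `0xff` bytes), not a real hash, chosen so the standard alphabet produces slashes; hex naming, as Git uses, avoids the issue at the cost of longer names.

```python
import base64

# Hypothetical 20-byte digest, chosen to trigger "/" in standard base64.
digest = b"\xff" * 20

hex_name = digest.hex()                                   # 40 hex characters
b64_name = base64.b64encode(digest).decode("ascii")       # contains "/"
safe_name = base64.urlsafe_b64encode(digest).decode("ascii")  # uses "-" and "_"

print(hex_name)
print(b64_name)
print(safe_name)
```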

layer8 984 days ago

There’s the URL- and filename-safe variant of Base64 [0]. Decoders can support it simultaneously and transparently.

[0] https://www.rfc-editor.org/rfc/rfc4648.html#section-5
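One way a decoder could accept both alphabets transparently, as a sketch. `b64decode_any` is a hypothetical helper name, not a standard library function; it also restores missing padding, which is an extra assumption on top of what the comment describes, and it does not handle embedded whitespace.

```python
import base64

# Map the URL-safe characters back onto the standard alphabet.
_TO_STANDARD = bytes.maketrans(b"-_", b"+/")

def b64decode_any(text: str) -> bytes:
    """Decode standard or URL-safe base64, padded or not (hypothetical helper)."""
    raw = text.encode("ascii").translate(_TO_STANDARD)
    raw += b"=" * (-len(raw) % 4)  # restore any stripped padding
    return base64.b64decode(raw)

print(b64decode_any("+//+AA=="))  # standard alphabet, padded
print(b64decode_any("-__-AA"))    # URL-safe alphabet, unpadded
```

Both calls yield the same four bytes, since the two alphabets never assign different values to the same character.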

cyanydeez 984 days ago

you can also manually replace them with urlsafe codes

JoshTriplett 985 days ago

Ditto the obnoxious "quoted-printable" mail encoding, which turns every = into =3D.

Still more robust than uuencode though.

bobbylarrybobby 985 days ago

It's basically the same as URL encoding, they just picked = instead of %

myfonj 984 days ago

It is, plus extra segmenting with `=`-escaped line breaks [1]:

> Lines of Quoted-Printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an =

IIUC, in Base64 you can put white space anywhere and it should be ignored. And in URL ("percent") encoding there is no insignificant white space possible (?) and the encoding of white space depends on the implementation (dreaded space `%20` vs ` ` vs `+` in application/x-www-form-urlencoded [2]).

[1] https://en.wikipedia.org/wiki/Quoted-printable [2] https://en.wikipedia.org/wiki/Percent-encoding
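For comparison, a sketch of the same escapes using Python's standard `quopri` and `urllib.parse` modules. The sample strings are made up; the 100-character line is only there to force a soft line break.

```python
import quopri
from urllib.parse import quote, quote_plus

qp = quopri.encodestring(b"a=b")   # b'a=3Db'
pct = quote("a=b", safe="")        # 'a%3Db'

# A line longer than 76 characters gets a soft break ("=" then newline).
long_line = b"x" * 100
wrapped = quopri.encodestring(long_line)
round_trip = quopri.decodestring(wrapped)

# Two spellings of a space in URLs.
space_pct = quote("a b")           # 'a%20b'
space_form = quote_plus("a b")     # 'a+b'

print(qp, pct, space_pct, space_form)
```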

onetimeuse92304 984 days ago

I am using base62 for data that can be included in URIs.
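A rough idea of what a base62 scheme can look like. There is no single standard alphabet or layout, so the digit order and the integer-based interface below are assumptions, and `base62_encode` / `base62_decode` are hypothetical names rather than anything the commenter described.

```python
import string

# Assumed alphabet order: digits, then uppercase, then lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def base62_encode(n: int) -> str:
    """Encode a non-negative integer using only URL-safe alphanumerics."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def base62_decode(text: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(base62_encode(62))  # "10"
```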

afiori 984 days ago

all three symbols are some of the worst possible choices for compatibility with urls and many other things

.-_ would have been a better choice than +/=

caf 984 days ago

base64 is older than URLs, though.

Vt71fcAqt7 985 days ago

And now we can have whitespace in url queries but we are still using %20 everywhere because "that's standard"...

CydeWeys 985 days ago

Try copy-pasting a link that has actual whitespace in its URL queries and see if it gets linkified correctly. Just because you can doesn't mean you should! A space is like the one delimiter that is applicable for separating out URLs from the context of a larger blob of text.

thaumasiotes 985 days ago

Browsers will often display %20 as a space, but that's not the same thing as spaces being legal within URLs.

Vt71fcAqt7 985 days ago

You are right. Seems firefox displays %20 as whitespace and converts whitespace to %20 when you use it. Chrome displays it as %20 but still converts whitespace to %20 if you try to use it.

nayuki 984 days ago

Space is not legal at the HTTP request level, because the opening line uses space as a delimiter like:

    GET /your/path-to/the.file HTTP/1.1
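To illustrate why a raw space breaks this, here is a deliberately naive sketch of splitting the request line on spaces. `parse_request_line` is a hypothetical helper; real servers are more careful, but the delimiter problem is the same.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split an HTTP/1.1 request line into method, target and version."""
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    return method, target, version

print(parse_request_line("GET /your/path-to/the.file HTTP/1.1"))

try:
    # An unescaped space in the path yields four fields instead of three.
    parse_request_line("GET /your/path to/the.file HTTP/1.1")
except ValueError as exc:
    print(exc)
```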

layer8 984 days ago

Have fun with newline and spaces/tabs conversions when allowing whitespace in URLs.
