
by dloreto 1089 days ago

Thanks for the feedback!

We have tests for the base32 encoding which is the most complicated part of the implementation (https://github.com/jetpack-io/typeid-go/blob/main/base32/bas...), but your point stands. We'll add a more rigorous test suite, particularly as the number of implementations across different languages grows and we want to make sure they are all compatible with each other.

Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

3 comments

dolmen 1088 days ago

There are no tests.

There is just a single test, which only checks the decoding of a single known value. There is no encoding test.

Go has infrastructure for benchmarking and fuzzing. Use it!

link

dloreto 1088 days ago

We've now implemented pretty thorough testing: https://github.com/jetpack-io/typeid-go/blob/main/typeid_tes...

Thanks for the feedback!

link

klabb3 1089 days ago

> We have tests for the base32 encoding which is the most complicated part of the implementation

I didn’t look into it much but it seems like a great encoding even outside of this project. Predictable length, reasonable density, “double clickable” etc. I’ve been annoyed with both hex and base64 for a while so it’s pretty cool just by itself.

> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

Yeah, the worry is almost entirely “subtle deviations across stacks”, which are usually due to ambiguous specs. It’s so annoying when there are minor differences, compatibility options, etc. (like base64, which has another “URL-friendly” encoding - ugh).

link

aftbit 1089 days ago

My personal favorite encoding is base58, aka Bitcoin address encoding. It uses the characters [A-Za-z0-9] except for [0OIl]. It is almost as dense as base64 and "double clickable", but its output length is not as predictable as base32's.

It was chosen to avoid a number of the most annoying ambiguous letter shapes for hand-entry of long address strings.

https://en.bitcoin.it/wiki/Base58Check_encoding
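To make the character set above concrete, here is a minimal sketch that derives the 58-character alphabet by dropping the four ambiguous characters from [A-Za-z0-9]. The names `AMBIGUOUS` and `BASE58_ALPHABET` are illustrative and not taken from any library; the digits-then-uppercase-then-lowercase ordering is the one Bitcoin uses.

```python
import string

# Characters dropped because they are easy to confuse when typed by hand.
AMBIGUOUS = "0OIl"

# Digits, then uppercase, then lowercase, minus the ambiguous characters.
BASE58_ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in AMBIGUOUS
)

# Every remaining character is alphanumeric, which is what keeps the
# encoded string selectable with a double click.
ALL_ALPHANUMERIC = BASE58_ALPHABET.isalnum()
```

Since 62 - 4 = 58, the result has exactly 58 characters and contains no punctuation.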

link

bredren 1089 days ago

Reminds me that Windows activation keys used to exclude a broader set of characters to avoid transcription errors. Looking it up again: 0OI1 and 5AELNSUZ.

link

unshavedyak 1088 days ago

Is there a good reason why Base63 (nopad) doesn't exist? I.e., Base64 minus the `-`, so that you get almost the density of base64 (nopad) plus the double-click-friendly feature.

I was reviewing encodings recently and didn't want to drop all the way down to base32, but for some reason the library I was using didn't allow anything beyond base32 and base64 variants, despite having a feature where you can define your own base.

I thought maybe it was performance-oriented. An odd alphabet size like 63 would mean, I think, a slightly more computationally demanding set of encoding instructions?

Either way, I basically want base58, but I don't care about legibility; I just want double-clickable and URL-friendly characters.

link

fauigerzigerk 1088 days ago

> Is there a good reason why Base63 (nopad) doesn't exist? I.e., Base64 minus the `-`, so that you get almost the density of base64 (nopad) plus the double-click-friendly feature.

Yes, the reason is that you need 64 characters if you want each character to encode 6 bits as log2(64) == 6. If you only have 63 characters in your alphabet then one of your 6-bit combinations has no character to represent it.

Base32 can represent 5 bits per character because log2(32) == 5. Anything in between 32 and 64 doesn't buy you anything because there is no integer between 5 and 6.
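The arithmetic above can be checked with a short sketch. It assumes two strategies: a fixed-width one, where every character carries a whole number of bits (the case described above), and a big-integer one, where the whole input is treated as a single number, which is how base58 works. The function names are hypothetical, chosen only for this illustration.

```python
import math

def fixed_width_length(num_bits: int, alphabet_size: int) -> int:
    """Characters needed when each character carries a whole number of bits."""
    # floor(log2(alphabet_size)): 63 symbols still carry only 5 whole bits.
    whole_bits = alphabet_size.bit_length() - 1
    return math.ceil(num_bits / whole_bits)

def bignum_length(num_bits: int, alphabet_size: int) -> int:
    """Maximum characters needed when the input is one big integer."""
    # Smallest k such that alphabet_size ** k can represent 2 ** num_bits values.
    k, capacity = 0, 1
    while capacity < 2 ** num_bits:
        capacity *= alphabet_size
        k += 1
    return k
```

Under these assumptions, a 128-bit value needs 26 characters in base32 and 22 in base64 with the fixed-width approach, and a 63-symbol alphabet falls back to 26. With the big-integer approach, alphabets of 58, 62, and 63 symbols all need at most 22 characters, at the cost of division on large numbers instead of bit shifts.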

link

unshavedyak 1088 days ago

Is that "just" a performance concern though? I.e., why is there a base58 and a base62 but no base63?

Now you've got me curious about the performance of base58 versus base64, hah. Down the rabbit hole I go. Appreciate your reply, thanks :)

link

TedDoesntTalk 1088 days ago

It’s not too difficult to write your own encoding. Probably 10 lines of code or less if you hard-code your encoding alphabet.
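As a rough illustration of that claim, here is a sketch of an encoder and decoder for non-negative integers with a hard-coded 62-character alphabet (all of [0-9A-Za-z], so it stays double-clickable and URL-friendly). The alphabet order and the names `ALPHABET`, `encode`, and `decode` are assumptions made for this example, not an established standard.

```python
# Hard-coded alphabet: digits, uppercase, lowercase (62 symbols).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode(n: int) -> str:
    """Encode a non-negative integer using ALPHABET as the digit set."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(digits))

def decode(text: str) -> int:
    """Inverse of encode; raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in text:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n
```

To encode bytes rather than integers, one option is to convert with `int.from_bytes(data, "big")` first; leading zero bytes then need separate handling, which this sketch leaves out.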

link

kbumsik 1088 days ago

What does “double clickable” mean?

link

d0mine 1087 days ago

Whether "double click" selects the whole id.

link

kbumsik 1089 days ago

> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

It would be great if you added suggestions for compound types (like “article-comment”) to the README, as OP suggested.

link

spoiler 1088 days ago

It seems that's not allowed currently, if I'm reading it right. I'm not sure I like `-` very much, because of how double-click-to-select and line breaking work for the dash. Maybe allowing `_` in the type name and having the rightmost `_` serve as the separator would be more consistent.

But also, I'm bike-shedding and it's only an ID.
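The rightmost-`_` idea could be sketched like this. It is only an illustration of the suggestion, not the behavior of the spec or of any existing implementation; `split_id` is a hypothetical helper and the sample ID is made up. It relies on the assumption that the encoded suffix never contains `_`.

```python
def split_id(value: str):
    """Split an ID into (type name, suffix) at the rightmost underscore."""
    # rpartition returns ("", "", value) when there is no underscore,
    # so an ID without a prefix yields an empty type name.
    type_name, _, suffix = value.rpartition("_")
    return type_name, suffix
```

With this rule, `split_id("article_comment_01h455vb4pex5vsknk084sn02q")` yields the type name `article_comment` and the suffix `01h455vb4pex5vsknk084sn02q`, so compound type names parse unambiguously.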

link

kbumsik 1088 days ago

I like using "." for this case, because type definitions typically belong to a package or module, which commonly uses "." as a separator.

link