by dloreto 1089 days ago
Thanks for the feedback!

We have tests for the base32 encoding which is the most complicated part of the implementation (https://github.com/jetpack-io/typeid-go/blob/main/base32/bas...) but your point stands. We'll add a more rigorous test suite (particularly as the number of implementations across different languages grows, and we want to make sure all the implementations are compatible with each other)

Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?


There are no tests.

There is just a single test, which only tests the decoding of a single known value. There is no encoding test.

Go has infrastructure for benchmarking and fuzzing. Use it!

Also, you took code from https://github.com/oklog/ulid/blob/main/ulid.go which has "Copyright 2016 The Oklog Authors", but this is not mentioned in your base32.go.

We've now implemented pretty thorough testing: https://github.com/jetpack-io/typeid-go/blob/main/typeid_tes...

Thanks for the feedback!

> We have tests for the base32 encoding which is the most complicated part of the implementation

I didn’t look into it much but it seems like a great encoding even outside of this project. Predictable length, reasonable density, “double clickable” etc. I’ve been annoyed with both hex and base64 for a while so it’s pretty cool just by itself.

> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

Yeah, the worry is almost entirely “subtle deviations across stacks”, which are usually due to ambiguous specs. It’s so annoying when there are minor differences, compatibility options, etc. (like base64, which has another “URL-friendly” encoding - ugh).

My personal favorite encoding is base58, aka Bitcoin address encoding. It uses the characters [A-Za-z0-9] except for [0OIl]. It is almost as dense as base64 and "double clickable", but its length is not as predictable as base32's.

It was chosen to avoid a number of the most annoying ambiguous letter shapes for hand-entry of long address strings.

https://en.bitcoin.it/wiki/Base58Check_encoding
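
To make the character set concrete, here is a minimal sketch that derives the 58-character alphabet from the rule above ([A-Za-z0-9] minus [0OIl]). The names `CANDIDATES`, `EXCLUDED` and `BASE58_ALPHABET` are just illustrative, and the digits/uppercase/lowercase ordering is the one Bitcoin's alphabet uses.

```python
import string

# Start from digits, then uppercase, then lowercase letters.
CANDIDATES = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Drop the four look-alike characters: zero, capital O, capital I, lowercase l.
EXCLUDED = set("0OIl")

BASE58_ALPHABET = "".join(c for c in CANDIDATES if c not in EXCLUDED)

print(len(BASE58_ALPHABET))
print(BASE58_ALPHABET)
```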

Reminds me that Windows activation keys used to exclude a broader set of characters to avoid transcription errors. Looking it up again: 0OI1 and 5AELNSUZ.

Is there a good reason why Base63 (nopad) doesn't exist? I.e., Base64 minus the `-`, so that you get almost the density of base64 (nopad) but keep the double-click-friendly feature.

I was reviewing encodings recently and didn't want to drop all the way down to base32, but for some reason the library I was using didn't allow anything beyond base32 and base64 variants, despite having a feature where you can define your own base.

I thought maybe it was performance oriented. An odd alphabet size like base63 would mean, I think, a slightly more computationally demanding set of encoding instructions?

Either way, I basically want base58 but I don't care about legibility; I just want double-click- and URL-friendly characters.

> Is there a good reason why Base63 (nopad) doesn't exist? I.e., Base64 minus the `-`, so that you get almost the density of base64 (nopad) but keep the double-click-friendly feature.

Yes, the reason is that you need 64 characters if you want each character to encode 6 bits as log2(64) == 6. If you only have 63 characters in your alphabet then one of your 6-bit combinations has no character to represent it.

Base32 can represent 5 bits per character because log2(32) == 5. Anything in between 32 and 64 doesn't buy you anything because there is no integer between 5 and 6.
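
As a rough illustration of the arithmetic, the sketch below prints log2 of a few alphabet sizes and the number of characters needed for a 128-bit value when the whole value is treated as one big integer. The helper name `chars_needed` and the choice of 128 bits (UUID-sized) are assumptions made for the example.

```python
import math

def chars_needed(bits: int, base: int) -> int:
    """Smallest n such that base**n >= 2**bits, using exact integer math."""
    n = 0
    capacity = 1
    while capacity < (1 << bits):
        capacity *= base
        n += 1
    return n

# Bits per character and total characters for a 128-bit ID.
for base in (16, 32, 58, 62, 63, 64):
    print(base, round(math.log2(base), 3), chars_needed(128, base))
```

Only the power-of-two sizes give a whole number of bits per character, which is what lets an encoder work on fixed-size bit groups instead of doing big-integer division.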

Is that "just" a performance concern though? I.e., why is there a base58 and a base62 but no base63?

Now you've got me curious about the performance of base58 versus base64, hah. Down the rabbit hole I go. Appreciate your reply, thanks :)

It’s not too difficult to write your own encoding. Probably 10 lines of code or less if you hard-code your encoding alphabet.
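
For what it's worth, a minimal sketch of such a hand-rolled encoding might look like the following. The 63-character `ALPHABET` (base64url without `-`) and the `encode`/`decode` names are made up for illustration, and the approach (treat the bytes as one big integer and repeatedly divide) is just one way to do it. Leading zero bytes are not preserved, so `decode` takes the expected byte length.

```python
# Hypothetical alphabet: the base64url characters minus "-" (63 symbols).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"

def encode(data: bytes) -> str:
    """Encode bytes as a big-endian integer written in base len(ALPHABET)."""
    n = int.from_bytes(data, "big")
    out = []
    while n > 0:
        n, rem = divmod(n, len(ALPHABET))
        out.append(ALPHABET[rem])
    # An all-zero input produces no digits, so emit the zero symbol.
    return "".join(reversed(out)) or ALPHABET[0]

def decode(text: str, length: int) -> bytes:
    """Inverse of encode; `length` is the expected number of bytes."""
    n = 0
    for ch in text:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n.to_bytes(length, "big")
```

A fixed-width ID round-trips with `decode(encode(raw), len(raw))`.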

What does “double clickable” mean?

Whether double-clicking selects the whole ID.

> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

It would be great if you added suggestions for compound types (like “article-comment”) to the README, as the OP suggested.

It seems that's not allowed currently, if I'm reading it right. I'm not sure I like `-` very much. The reason I don't like it is how double-click selection and line breaking work with the dash. Maybe allowing `_` in the type name, and then having the rightmost `_` serve as the separator, would be more consistent.

But also, I'm bike-shedding and it's only an ID.
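
To sketch the "rightmost `_` is the separator" idea: assuming the suffix alphabet never contains `_`, splitting on the last underscore is unambiguous even when the type name has underscores of its own. The function name `split_typeid` and the sample ID are hypothetical, and this is not taken from the spec.

```python
def split_typeid(value: str) -> tuple[str, str]:
    """Split "prefix_suffix" on the rightmost underscore.

    Returns ("", value) when there is no underscore, i.e. no prefix.
    """
    prefix, sep, suffix = value.rpartition("_")
    if not sep:
        return "", value
    return prefix, suffix

# A compound type name keeps its inner underscore; only the last one separates.
print(split_typeid("article_comment_01h455vb4pex5vsknk084sn02q"))
```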

I like using "." for this case, because type definitions typically belong to a package or module, which commonly uses "." as the separator.