HN Mirror

by evmar 958 days ago

In n2[1] I needed a fast tokenizer and ran into the same "garbage factory" problem: there's a fixed set of constant tokens (like json.Delim in this post) that are cheap to produce, and then there are strings, each of which causes an allocation.

I came up with what I think is a fairly neat solution: the tokenizer is generic over some type T, takes a function from byte slice to T, and uses T in place of strings. That way, when the caller has a more efficient representation available (such as one that allocates less), it can provide it, while I can still unit test the tokenizer with the identity function for convenience.
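A minimal sketch of that shape, written in Python for brevity (n2 itself is Rust, and this is not its code). `Tokenizer`, `Delim`, `Interner`, and the toy grammar (whitespace-separated words, with `:` and `=` as constant tokens) are all hypothetical. The tokenizer emits shared constant objects for punctuation and calls a caller-supplied function on the raw bytes of every word. One caveat: in CPython the byte slice handed to the callback is itself a copy, so this only illustrates the interface; in a language like Rust the callback could borrow the bytes and the saving would be real.

```python
from typing import Callable, Dict, Generic, Iterator, List, TypeVar, Union

T = TypeVar("T")


class Delim:
    """Hypothetical constant token, in the spirit of json.Delim.

    Instances are module-level singletons, so emitting one never builds
    a new object."""

    def __init__(self, ch: str) -> None:
        self.ch = ch

    def __repr__(self) -> str:
        return f"Delim({self.ch!r})"


COLON = Delim(":")
EQUALS = Delim("=")
_DELIMS = {ord(":"): COLON, ord("="): EQUALS}  # byte value -> singleton
_SPACE = b" \t\r\n"


class Tokenizer(Generic[T]):
    """Yields Delim singletons for punctuation and make_word(raw) for words.

    The tokenizer never decides how a word is represented; the caller
    does, by choosing make_word."""

    def __init__(self, data: bytes, make_word: Callable[[bytes], T]) -> None:
        self.data = data
        self.make_word = make_word

    def __iter__(self) -> Iterator[Union[Delim, T]]:
        data, i, n = self.data, 0, len(self.data)
        while i < n:
            b = data[i]  # indexing bytes gives an int
            if b in _SPACE:
                i += 1
            elif b in _DELIMS:
                yield _DELIMS[b]
                i += 1
            else:
                start = i
                while i < n and data[i] not in _SPACE and data[i] not in _DELIMS:
                    i += 1
                # CPython copies here; the interface is what matters.
                yield self.make_word(data[start:i])


class Interner:
    """Hypothetical caller-side representation: each distinct word is
    stored once, and every occurrence becomes a small integer id."""

    def __init__(self) -> None:
        self.ids: Dict[bytes, int] = {}
        self.words: List[bytes] = []

    def __call__(self, raw: bytes) -> int:
        idx = self.ids.get(raw)
        if idx is None:
            idx = len(self.words)
            self.ids[raw] = idx
            self.words.append(raw)
        return idx


# Unit testing with the identity function: words come back as plain bytes.
print(list(Tokenizer(b"out: in = x", lambda raw: raw)))
# [b'out', Delim(':'), b'in', Delim('='), b'x']

# A caller with a cheaper representation plugs in its own function.
interner = Interner()
print(list(Tokenizer(b"cc a.c\ncc b.c\n", interner)))
# [0, 1, 0, 2]
```

The scanning logic lives only in `Tokenizer`, so the identity-function tests exercise exactly the code the real caller uses; the interner just changes what comes out for words, e.g. the repeated `cc` is stored once and reused as id `0`.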

In a sense this is like fusing the tokenizer with the parser at build time, but the generic keeps them layered, so the tokenizer doesn't need to know about the parser's representation.

[1] https://github.com/evmar/n2