| HN Mirror



	by throwaway81523 293 days ago
	I wonder if you thought about perfect hashing instead of that comparison tree. Also, flex (as in flex and bison) can generate what amounts to trees like that, I believe. I haven't benchmarked it compared to a really careful explicit tree though.

netr0ute 293 days ago

I thought about hashing, but found that hashing would be enormously slow to compute compared to a perfectly crafted tree.

dafelst 293 days ago

But did you think about using a perfect hash function and table? Based on my prior research, it seems like they are almost universally faster on small strings than trees and tries due to lower cache miss rates.

dist1ll 293 days ago

Ditto. Perfect hashing strings smaller than 8 bytes has been the fastest lookup method in my experience.
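A minimal sketch of the approach described above, assuming a small fixed set of mnemonics that each fit in 8 bytes. The key set, the 16-slot table, and the names `pack8`, `slot`, `find_multiplier`, and `lookup` are all hypothetical and chosen for illustration; a real tool such as gperf would pick the hash function differently.

```python
import random

# Hypothetical key set: a few common RV32I mnemonics, all shorter than 8 bytes.
MNEMONICS = ["add", "addi", "sub", "and", "or", "xor", "lw", "sw", "beq", "jal"]
TABLE_BITS = 4                      # 16 slots for 10 keys (assumed size)
MASK64 = (1 << 64) - 1


def pack8(name):
    """Pack an ASCII name of at most 8 bytes into one 64-bit integer."""
    raw = name.encode("ascii")
    if len(raw) > 8:
        return None                 # too long for the single-word fast path
    return int.from_bytes(raw.ljust(8, b"\0"), "little")


def slot(key, multiplier):
    """Multiplicative hash: keep the top TABLE_BITS bits of the 64-bit product."""
    return ((key * multiplier) & MASK64) >> (64 - TABLE_BITS)


def find_multiplier(keys, attempts=100_000, seed=0):
    """Try random odd multipliers until every key lands in its own slot."""
    rng = random.Random(seed)
    for _ in range(attempts):
        m = rng.getrandbits(64) | 1
        if len({slot(k, m) for k in keys}) == len(keys):
            return m
    raise RuntimeError("no collision-free multiplier found")


KEYS = [pack8(name) for name in MNEMONICS]
MULTIPLIER = find_multiplier(KEYS)

# Store the packed key next to the name so a lookup can reject non-members.
TABLE = [None] * (1 << TABLE_BITS)
for name, key in zip(MNEMONICS, KEYS):
    TABLE[slot(key, MULTIPLIER)] = (key, name)


def lookup(name):
    """Return the stored mnemonic, or None if the name is not in the set."""
    key = pack8(name)
    if key is None:
        return None
    entry = TABLE[slot(key, MULTIPLIER)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None
```

Each lookup costs one multiply, one shift, one table read, and one 64-bit comparison, regardless of which mnemonic is being looked up. The multiplier search runs once, ahead of time.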

netr0ute 293 days ago

Problem is, there are a lot of RISC-V instructions way longer than that (like th.vslide1down.vx), so hashing is going to be slow.

ashdnazg 293 days ago

You could copy the instruction to a 16-byte buffer and hash the one or two int64s. Looking at the code sample in the article, there wasn't a single instruction longer than 5 characters, and I suspect that in general instructions with short names are more common than those with long names.

This last fact might actually support the current model, as its cost grows linearly-ish with the length of the instruction name, instead of being constant like a hash.
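The buffer idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's code: `pack16` and `hash16` are hypothetical names, the two multipliers are arbitrary odd constants, and the 8-bit output width is assumed. Note that th.vslide1down.vx, mentioned earlier in the thread, is 17 characters, so it would not fit in 16 bytes and would need a fallback path.

```python
import struct
from typing import Optional, Tuple

MASK64 = (1 << 64) - 1
HASH_BITS = 8                       # assumed table size of 256 slots
M_LO = 0x9E3779B97F4A7C15           # arbitrary odd 64-bit constants (assumption)
M_HI = 0xC2B2AE3D27D4EB4F


def pack16(name: str) -> Optional[Tuple[int, int]]:
    """Zero-pad an ASCII name to 16 bytes and read it as two 64-bit words."""
    raw = name.encode("ascii")
    if len(raw) > 16:
        return None                 # caller must fall back to another method
    lo, hi = struct.unpack("<QQ", raw.ljust(16, b"\0"))
    return lo, hi


def hash16(name: str) -> Optional[int]:
    """Mix the two words with two multiplies and keep the top HASH_BITS bits."""
    words = pack16(name)
    if words is None:
        return None
    lo, hi = words
    mixed = ((lo * M_LO) ^ (hi * M_HI)) & MASK64
    return mixed >> (64 - HASH_BITS)
```

For names of 8 bytes or fewer, the high word is zero, so the second multiply contributes nothing and the work matches the single-word case. The cost is fixed for anything that fits in the buffer, which is the contrast with the tree that the comment above draws.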

snvzz 292 days ago

Note th.vslide1down.vx is a T-Head instruction, a vendor custom extension.

It is not part of RISC-V, nor supported by any CPUs outside of that vendor's own.

Lerc 292 days ago

Is there a handy list of all RISC-V instructions?

Sesse__ 293 days ago

You're probably thinking of gperf, not flex and bison.

sylware 293 days ago

Oh, I remember I did a plain and simple C port of an old gperf, cgperf https://www.rocketgit.com/user/sylware/cgperf

Ofc, I did add my own bugs.

throwaway81523 292 days ago

I meant flex, for generating a switch table for that type of lexer. gperf is for hashing, which is different. But there may be better methods by now, since the field has changed a lot.
