| HN Mirror


	by benwills 394 days ago
	I think they may be asking about the CPU cache.

Dylan16807 394 days ago

You have to go out of your way to make a hash that doesn't fit into L1, so again they're all basically the same.

You'll probably end up fitting entirely inside the reorder buffer plus a sequential stream from memory, with the actual caches almost irrelevant.

benwills 394 days ago

Sure. Any worthwhile hash function will fit in the instruction cache. But there are ways to make more or less efficient use of the data cache.

Dylan16807 394 days ago

> Any worthwhile hash function will fit in the instruction cache.

Yes, I'm talking about the data cache.

> But there are ways to make more or less efficient use of the data cache.

How?

You need to touch every byte of input, and nothing should be faster than going right through from start to end.
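To illustrate the "right through from start to end" access pattern, here is a minimal sketch using 64-bit FNV-1a, a classic single-accumulator hash. It is only an example of the pattern, not a hash anyone in this thread proposed; the function name `fnv1a_64` is my own. The whole state is one 64-bit value, and the input is read exactly once, in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: one accumulator, one sequential pass over the input. */
uint64_t fnv1a_64(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    const unsigned char *p = data;
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a 64-bit offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];                           /* fold in the next input byte */
        h *= 0x100000001b3ULL;               /* multiply by the 64-bit FNV prime */
    }
    return h;
}
```

The only memory traffic here is the input stream itself; the accumulator `h` would normally live in a register for the whole loop.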

benwills 394 days ago

I don't know your experience with hash functions, so you may already know what I'm about to say.

This is a minor example, but since you asked...

https://github.com/Cyan4973/xxHash/blob/dev/xxhash.h#L6432

That's an example of a fair number of accumulators that xxHash keeps as it goes through its input buffer.

Many modern hash functions store more state/accumulators than they used to. Previous generations of hash functions would often just have one or two accumulators and run through the data. Many modern hash functions might even store multiple wider SIMD variables for better mixing.

And if you're storing enough state that it doesn't fit in your registers, the CPU will put it into the data cache.
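A rough sketch of what "more accumulators" looks like in practice, assuming four independent 64-bit lanes. This is not xxHash's actual algorithm: the name `lanes4_hash`, the constants `MIX_A` and `MIX_B`, and the rotation amounts are all made up for illustration, and the result has no tested hash quality. It only shows the shape of the state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical odd multipliers, chosen for illustration only. */
#define MIX_A 0x9E3779B97F4A7C15ULL
#define MIX_B 0xC2B2AE3D27D4EB4FULL

static uint64_t rotl64(uint64_t x, int r) { return (x << r) | (x >> (64 - r)); }

/* Unaligned 8-byte load in native byte order (results differ by endianness). */
static uint64_t read64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

uint64_t lanes4_hash(const void *data, size_t len, uint64_t seed)
{
    assert(data != NULL || len == 0);

    const unsigned char *p = data;
    /* Four independent accumulators: 32 bytes of state in total. */
    uint64_t acc[4] = { seed + MIX_A, seed + MIX_B, seed, seed - MIX_A };
    size_t i = 0;

    /* Main loop: each 32-byte block feeds one 8-byte word into each lane. */
    for (; len - i >= 32; i += 32) {
        for (int lane = 0; lane < 4; lane++) {
            acc[lane] += read64(p + i + 8 * lane) * MIX_B;
            acc[lane] = rotl64(acc[lane], 31) * MIX_A;
        }
    }

    /* Fold the four lanes and the length into a single value. */
    uint64_t h = (uint64_t)len;
    for (int lane = 0; lane < 4; lane++)
        h = rotl64(h ^ acc[lane], 27) * MIX_A + MIX_B;

    /* Tail: the remaining 0..31 bytes, one at a time. */
    for (; i < len; i++)
        h = rotl64(h ^ p[i], 11) * MIX_A;

    /* Final scramble. */
    h ^= h >> 33;
    h *= MIX_B;
    h ^= h >> 29;
    return h;
}
```

Four 64-bit lanes will usually stay in registers. Whether a much larger state (say, many wide SIMD accumulators) spills to the stack, and therefore to L1, depends on the compiler and the target, so treat that part as something to check in the generated assembly rather than assume.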

Dylan16807 394 days ago

> And if you're storing enough state that it doesn't fit in your registers, the CPU will put it into the data cache.

And there are 150+ registers in the actual chip.

But my argument is more that there isn't really an efficient or inefficient way to use L1. So unless you have an enormous amount of state, the question is moot. And if you have so much state you're spilling to L2, that's not when you worry about good or bad cache use, that's a weird bloat problem.

Sesse__ 393 days ago

_Fitting_ in the instruction cache isn't hard, but you'd also ideally let there be room for some other code as well :-) For a hash map lookup, where the hashing is frequently inlined a couple of times, code size matters.
