HN Mirror

by rcgorton 1333 days ago

Re: register windows. I disagree: code size wasn't the killer here; it was how DEEP the call stack got. If your architectural register windows spilled at 4 deep, then calls 3 deep were fine, but if you had code iterating over a tight loop that went 8 calls deep, you were in [performance] trouble.
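To put rough numbers on the depth argument, here is a toy model, not a description of any real CPU. It assumes a hypothetical register file holding `num_windows` frames, with the loop's own frame occupying one of them, and a trap that moves exactly one window per overflow or underflow. The function name and parameters are made up for illustration.

```python
def count_window_traps(num_windows, call_depth, iterations):
    """Count spill/fill traps for a loop that calls `call_depth` levels deep.

    Hypothetical model: `num_windows` frames fit in registers, the loop's own
    frame uses one of them, and each trap moves exactly one window.
    """
    resident = 1   # frames currently held in register windows
    saved = 0      # frames that have been spilled to memory
    spills = fills = 0
    for _ in range(iterations):
        for _ in range(call_depth):          # descend one call at a time
            if resident == num_windows:      # no free window: spill the oldest
                spills += 1
                saved += 1
            else:
                resident += 1
        for _ in range(call_depth):          # return back up
            resident -= 1
            if resident == 0 and saved > 0:  # caller's window lives in memory
                fills += 1
                saved -= 1
                resident = 1
    return spills, fills


print(count_window_traps(4, 3, 1000))  # (0, 0)
print(count_window_traps(4, 8, 1000))  # (5000, 5000)
```

Under these assumptions, 3-deep calls never trap, while the 8-deep loop pays 5 spills and 5 fills on every single iteration, and each of those is a trip to memory inside what was supposed to be a tight loop.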

Another divot: asymmetric functional units. Some versions of Alpha supported a PopCount instruction, but it only worked in a single functional unit, which made scheduling a pain, esp. if you had to write in assembly language.

I'm not convinced that AVX 256 and AVX 512 are useful for non-matrix operations. Most strings (more importantly, tokens bounded by whitespace when parsing) are much shorter than 512 bits (64 bytes). In English, I cannot come up with many words longer than 16 bytes (some place names, antidisestablishmentarianism, chemical compound names, and some other stuff).

1 comments

loup-vaillant 1333 days ago

> I'm not convinced that AVX 256 and AVX 512 are useful for non-matrix operations.

I've observed that, compared to regular x86-64 code without SIMD, using AVX 256 speeds up the ChaCha20 cipher by a factor of 5 (for long messages, so they can be processed in 512-byte chunks of 8 blocks). Network packets easily exceed 1KB, and files are usually much bigger.
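For anyone wondering where the 512 bytes come from: a ChaCha20 block is 64 bytes (16 words of 32 bits), and a 256-bit register holds 8 such words, so one register can carry the same state word from 8 independent blocks. Below is a rough Python model of that layout, not my actual AVX code. The `vadd`, `vxor` and `vrotl` helpers are stand-ins I made up for the vector instructions; each one does the same operation on all 8 lanes.

```python
MASK = 0xFFFFFFFF
LANES = 8         # 256-bit register / 32-bit words
BLOCK_BYTES = 64  # one ChaCha20 block is 16 words of 32 bits


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter_round(a, b, c, d):
    """Scalar ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Lane-wise helpers: each stands in for one vector instruction.
def vadd(x, y):
    return [(p + q) & MASK for p, q in zip(x, y)]


def vxor(x, y):
    return [p ^ q for p, q in zip(x, y)]


def vrotl(x, n):
    return [rotl32(p, n) for p in x]


def quarter_round_lanes(a, b, c, d):
    """Same quarter round, but a..d each hold one word from LANES blocks."""
    a = vadd(a, b)
    d = vrotl(vxor(d, a), 16)
    c = vadd(c, d)
    b = vrotl(vxor(b, c), 12)
    a = vadd(a, b)
    d = vrotl(vxor(d, a), 8)
    c = vadd(c, d)
    b = vrotl(vxor(b, c), 7)
    return a, b, c, d


print(LANES * BLOCK_BYTES)  # 512
```

The vectorized version issues the same number of operations as the scalar one, but each operation advances 8 blocks at once. There is no data dependency between lanes, which is why the cipher maps onto SIMD so cleanly once the message is long enough to fill them.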

Matrix operations aren't the only viable niche.


sitkack 1333 days ago

SIMD has many non-matrix uses.

https://simdjson.org/
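A rough illustration of the idea, written as plain Python rather than real SIMD, and not simdjson's code or API. The point is that wide registers get applied to the whole input buffer, not to one word at a time: each 64-byte chunk is reduced to a bitmask of whitespace positions, and token boundaries then fall out of a few bitwise operations on that mask. The names `CHUNK`, `whitespace_masks` and `token_starts` are made up for this sketch, and the inner byte loop stands in for what would be a vector compare.

```python
CHUNK = 64  # bytes per step, e.g. one 512-bit register or two 256-bit ones
WHITESPACE = b" \t\n\r"


def whitespace_masks(data: bytes):
    """Yield one integer per CHUNK bytes; bit i is set if byte i is whitespace."""
    for start in range(0, len(data), CHUNK):
        mask = 0
        for i, byte in enumerate(data[start:start + CHUNK]):
            if byte in WHITESPACE:
                mask |= 1 << i
        yield mask


def token_starts(data: bytes):
    """Offsets where a run of non-whitespace bytes begins."""
    starts = []
    prev_ws = 1  # treat the position before the buffer as whitespace
    for chunk_index, ws in enumerate(whitespace_masks(data)):
        base = chunk_index * CHUNK
        n = min(CHUNK, len(data) - base)
        full = (1 << n) - 1
        # A token starts where a non-whitespace byte follows a whitespace byte.
        start_mask = ~ws & ((ws << 1) | prev_ws) & full
        prev_ws = (ws >> (n - 1)) & 1       # carry into the next chunk
        while start_mask:
            low = start_mask & -start_mask  # isolate the lowest set bit
            starts.append(base + low.bit_length() - 1)
            start_mask ^= low
    return starts


print(token_starts(b"the quick  brown fox"))  # [0, 4, 11, 17]
```

Short tokens don't hurt here: a chunk full of 3-letter words still gets classified in one pass, so the win comes from the length of the input, not the length of any individual word.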
