| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by moth-fuzz 550 days ago
	The idea that arrays of structs are inherently more cache friendly and thus data-oriented-er is a bit reductive of the whole practice of data-oriented code. The point is to optimize data layout for access patterns. Putting fields of a struct into their own arrays is only actually an optimization if you're only accessing that field in-bulk. And if so, why is it even in a struct in the first place? If you use all fields of a struct in your algorithm, then an array of structs is the optimal way. All the same is true for enums.

5 comments

10000truths 550 days ago

Access patterns matter, but just as important is to have less stuff to access. That's why arrays-of-structs are considered cache friendly - columnar data layouts open the door to optimizations that significantly reduce memory footprint. You no longer waste memory with struct padding. Boolean fields can become bitsets. Enums can be bit-packed. Often-null optional fields can become sparse maps. 8-byte pointers can become narrower-sized indices into object pools.

link

pavlov 550 days ago

> “That's why arrays-of-structs are considered cache friendly”

Sounds like you mean structs-of-arrays?

link

10000truths 550 days ago

Oops, brainfart on my part. Unfortunately, the edit window has passed.

link

pierrec 549 days ago

"Putting fields of a struct into their own arrays is only actually an optimization if you're only accessing that field in-bulk" ... "If you use all fields of a struct in your algorithm, then an array of structs is the optimal way."

This is wrong! Cache optimization isn't the only factor here. Even given an algorithm that seemingly handles each object one-by-one and uses all fields, SIMD turns individual operations into a hidden bulk access, and giving each field its own array will speed things up. This is counter-intuitive at first but becomes obvious if you write SIMD by hand (the article mentions this but doesn't make it super clear IMO)

link

Joel_Mckay 550 days ago

Indeed, a struct can also be cooked to pack down with no padding, and or be dynamically redefined with a union.

Performance issues start to crop up with naive pre-fetching, and thus 100% guaranteed cache misses if the arrays are larger than L2.

This is why LLM AI generated slop degrades blogs into slop delivery services. =3

link

HumanOstrich 548 days ago

> This is why LLM AI generated slop degrades blogs into slop delivery services. =3

Not sure what LLMs and AI have to do with any of this.

link

Joel_Mckay 548 days ago

That is the primary problem domain, as there are a lot of folks that see well-structured nonsense as meaningful. =3

link

HumanOstrich 548 days ago

Why do you keep putting a penis "=3" at the end of your messages?

link

aragilar 550 days ago

Same with row-major vs. column major, accessing contiguous data is faster than non-contiguous data, so you should align your algorithms and data structures.

link

motorest 549 days ago

> The point is to optimize data layout for access patterns.

Yes. That's the point.

> Putting fields of a struct into their own arrays is only actually an optimization if you're only accessing that field in-bulk.

Yes, that's the scenario.

> And if so, why is it even in a struct in the first place?

Because that's how everyone is taught to model domains.

> If you use all fields of a struct in your algorithm, then an array of structs is the optimal way.

No. Your personal belief goes against both theoretical and empirical evidence. Others already talked about cache, padding, vectorized instructions, etc. I recommend you do a quick googling on the topic.

link