Hacker News new | ask | show | jobs
by miscmask 1876 days ago
> and separately the y,z,w triples are contiguous ([y1,z1,w1,y2,z2,w2,y3,z3,w3...]).

Wouldn't having ys, zs and ws occupying different cache lines be good enough? After all, the CPU only wants fast access to these data. Or maybe it's a hard thing to do for CPUs to fetch from 4 different lines at a time (+1 for the instructions)?