|
|
|
|
|
by appeldorian
1172 days ago
|
|
I think a big mistake in the article, in a context where performance is the main objective, is that the author uses an array of structs (AoS), rather than a struct of arrays (SoA). An SoA makes it so that the data is ordered contiguously, which is easy to read for the CPU, while an AoS structure interleaves different data (namely the x and y in this case), which is very annoying for the CPU. A CPU likes to read chunks of data (for example 128 bits of data/read) and to process these with SIMD instructions, executing a multiple of calculations with one CPU cycle. This is completely broken when using an array of structs. He uses the same data structure in both the Python and Rust code, so I imagine that he can get an extra 4x speedup at least if he rewrites his code with memory layout in mind. |
|