by tauoverpi
430 days ago
Nice! I've found that using an iterator for this often generates quite a bit of extra code and generally prevents vectorization, which is why I switched to an API using inversion of control for the `forEach` case (I don't have iterators). Working on one item at a time, which the iterator forces, added quite a bit of overhead. That partially defeated the gains from the more compact memory layout (SoA), since the code path was more complex and SIMD couldn't be applied over multiple components at a time. Have you observed this issue in this implementation, and what general design space does it target? If I remember correctly, how it works now is partially how mach had it a year or more ago, back when they still had an ECS.
|
I was concerned about this, so I provide the following two APIs:
* https://docs.gamesbymason.com/zcs/#zcs.Entities.chunkIterato...
* https://docs.gamesbymason.com/zcs/#zcs.Entities.forEachChunk
Both of these iterate over chunks instead of entities, so the iterator only needs to advance once per chunk of contiguous entities. The caller is then responsible for calling chunk.view (https://docs.gamesbymason.com/zcs/#zcs.chunk.Chunk.view) to get slices of components from the current chunk. From there they can either rely on autovectorization or implement the vectorization themselves, since they're now working with actual slices of data instead of an iterator.
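To make the per-chunk shape concrete, here is a minimal sketch in Rust. It is not ZCS code and does not mirror its signatures: `Chunk`, `World`, `chunk_views`, and `integrate` are hypothetical names, and two `f32` columns stand in for real components. The point it illustrates is only that the iterator advances once per chunk and the inner loop runs over plain slices.

```rust
// Hypothetical stand-in for a chunk of contiguous entities stored as SoA.
// None of these names come from ZCS.
pub struct Chunk {
    pub pos: Vec<f32>,
    pub vel: Vec<f32>,
}

pub struct World {
    pub chunks: Vec<Chunk>,
}

impl World {
    /// Advances once per chunk and hands out component slices for that chunk.
    pub fn chunk_views<'a>(
        &'a mut self,
    ) -> impl Iterator<Item = (&'a mut [f32], &'a [f32])> + 'a {
        self.chunks
            .iter_mut()
            .map(|c| (c.pos.as_mut_slice(), c.vel.as_slice()))
    }
}

/// Applies pos += vel * dt to every entity, one chunk at a time.
pub fn integrate(world: &mut World, dt: f32) {
    for (pos, vel) in world.chunk_views() {
        // A plain loop over two slices: a shape the optimizer can often
        // autovectorize, or that could be replaced with explicit SIMD.
        for (p, v) in pos.iter_mut().zip(vel) {
            *p += *v * dt;
        }
    }
}
```

Whether the inner loop is actually vectorized depends on the compiler and target, so it's worth checking the generated code for hot paths.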
I haven't actually checked whether the optimizer is smart enough to work back from the higher-level API and realize that it's just an abstracted for loop. I inline a few things to encourage this, but my assumption is basically that this is unrealistic to expect, hence the lower-level API. Providing the per-chunk API also isn't really a maintenance burden, since the high-level API is implemented in terms of it anyway.
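As a rough illustration of that layering (again a hypothetical Rust sketch, not the ZCS implementation; `for_each_chunk` and `for_each` here are made-up names with made-up signatures), a per-entity callback API can be a thin wrapper over the per-chunk one:

```rust
// Hypothetical SoA storage; names and signatures are assumptions.
pub struct Chunk {
    pub pos: Vec<f32>,
    pub vel: Vec<f32>,
}

pub struct World {
    pub chunks: Vec<Chunk>,
}

impl World {
    /// Lower-level API: the callback runs once per chunk with whole slices.
    pub fn for_each_chunk(&mut self, mut f: impl FnMut(&mut [f32], &[f32])) {
        for c in &mut self.chunks {
            f(c.pos.as_mut_slice(), c.vel.as_slice());
        }
    }

    /// Higher-level API: a per-entity callback layered on the per-chunk one.
    /// The inline hint only encourages the optimizer; it guarantees nothing.
    #[inline]
    pub fn for_each(&mut self, mut f: impl FnMut(&mut f32, f32)) {
        self.for_each_chunk(|pos, vel| {
            for (p, v) in pos.iter_mut().zip(vel) {
                f(p, *v);
            }
        });
    }
}
```

Because the wrapper is only a few lines, keeping both entry points costs little, and callers can drop down to the per-chunk form where profiling shows it matters.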
There's an argument to be made that I should only provide the lower-level API since it's likely more optimal, but IMO it's less friendly, and switching to it in the rare cases where the high-level API turns out to be a bottleneck should be easy enough.
Is my chunk iterator API similar to where you ended up, or did you take it in a different direction? Feel free to link your project if you want, I'm always interested to see how other people handle this stuff!