by WithinReason
214 days ago
How far is Tinygrad from being able to represent/search the kind of optimisations listed in the article? i.e.:
1. data layouts to avoid local memory bank conflicts
2. read patterns from global memory to optimize L2 cache reuse
3. warp specialisation
How complex is it to add these into tinygrad?
tinygrad doesn't support 3 (warp specialisation) yet; it isn't needed on any AMD GPUs or on NVIDIA consumer GPUs. It wouldn't be hard to add, but it's important to figure out how it best fits with the existing abstractions. I think everything will eventually move to a more producer-consumer model.
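
For anyone unfamiliar with item 1 in the question, here is a rough sketch of what a "data layout to avoid bank conflicts" means. It is plain Python, not tinygrad code, and it assumes the common configuration of 32 banks that are each one 4-byte word wide; `NUM_BANKS` and `max_bank_conflict` are hypothetical names made up for this illustration. It counts how many of 32 threads land on the same bank when thread `t` reads element `[t][column]` of a row-major tile.

```python
from collections import Counter

# Assumption: 32 banks, one 4-byte word per bank, consecutive words map to consecutive banks.
NUM_BANKS = 32

def max_bank_conflict(row_stride, num_threads=32, column=0):
    """Largest number of threads that hit the same bank when
    thread t reads word index t * row_stride + column."""
    banks = Counter((t * row_stride + column) % NUM_BANKS
                    for t in range(num_threads))
    return max(banks.values())

# Column read of a 32-wide tile: every thread maps to the same bank.
unpadded = max_bank_conflict(row_stride=32)
# Same read with one word of padding per row: every thread maps to a different bank.
padded = max_bank_conflict(row_stride=33)
print(unpadded, padded)
```

Padding the row stride is only one such layout; swizzled layouts aim for the same effect without wasting memory, and a search over layouts would need some cost estimate along these lines.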