| HN Mirror

by WithinReason 214 days ago

How far is Tinygrad from being able to represent/search the kind of optimisations listed in the article? i.e.:

  1. data layouts to avoid local memory bank conflicts
  2. read patterns from global memory to optimize L2 cache reuse
  3. warp specialisation

How complex is it to add these into tinygrad?

1 comment

georgehotz 214 days ago

1 and 2 are supported: 1 you need to specify, and 2 will be found with BEAM. We are working on reimplementing HipKittens in tinygrad; all the stuff is there to do it. See the amd_uop_matmul example.

tinygrad doesn't support 3 yet; it's not needed on any AMD GPUs or on NVIDIA consumer GPUs. It wouldn't be hard to add, but it's important to figure out how it best fits with the existing abstractions. I think everything will eventually move to a more producer-consumer model.
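
For readers unfamiliar with item 1 in the question (data layouts that avoid local memory bank conflicts), here is a rough sketch of the idea in plain Python. It is not tinygrad code and not taken from the article. It assumes 32 banks that are each one 4-byte word wide, which is a common configuration but varies by GPU, and the helper name `worst_conflict` is made up for illustration.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide


def worst_conflict(row_stride: int, lanes: int = 32) -> int:
    """Return the largest number of lanes that hit the same bank when
    lane i reads element [i][0] of a tile whose rows are row_stride words apart."""
    banks = Counter((lane * row_stride) % NUM_BANKS for lane in range(lanes))
    return max(banks.values())


# A 32-word row stride sends every lane in a column read to the same bank.
unpadded = worst_conflict(32)
# Padding each row by one word (stride 33) spreads the lanes across all banks.
padded = worst_conflict(33)
print(unpadded, padded)
```

Under those assumptions the unpadded column read serializes across all lanes, while the one-word padding makes it conflict-free. This is the kind of layout choice that has to be specified by hand rather than searched for.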

0-_-0 213 days ago

Good luck with the AMD contract! I imagine HipKittens came at just the right time.
