

by eyegor 749 days ago

What do you mean by "relatively universal"? This is CUDA-only [0], with a promise of a ROCm backend eventually. There's only one project I'm aware of that seriously tries to address the CUDA issue in ML [1].

[0] https://github.com/HazyResearch/ThunderKittens?tab=readme-ov...

[1] https://github.com/vosen/ZLUDA

1 comment

f_devd 748 days ago

If you read the article I linked, they show that it's entirely based on 16x16 matrices (or "tiles"), which are fairly standard across GPUs.
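
To make the tile idea concrete, here is a rough sketch of a matrix multiply organized around 16x16 blocks. It's plain Python running on the CPU, not ThunderKittens code or any GPU API; the names `TILE` and `tiled_matmul` are made up for illustration, and the only thing borrowed from the article is the 16x16 tile size.

```python
TILE = 16  # tile edge length, matching the 16x16 tiles discussed above


def tiled_matmul(a, b, tile=TILE):
    """Multiply a (n x k) by b (k x m), visiting one tile x tile block at a time.

    Hypothetical illustration only: a and b are lists of lists of numbers.
    """
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0.0] * m for _ in range(n)]

    # The three outer loops walk over tiles of the output (i0, j0)
    # and over tiles along the shared dimension (k0).
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # The inner loops accumulate one tile-by-tile product.
                # min() handles edge tiles when a size is not a multiple of tile.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for kk in range(k0, min(k0 + tile, k)):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

The result is the same as an untiled multiply; only the order of the work changes. On a GPU, the inner tile product is the part that would map to a hardware matrix instruction, which is why the tile shape matters for portability.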
