by anthonix1 745 days ago
Seems to be an issue on their side. E.g., for one step of GPT2 training on a 7900 XTX [1]: tinygrad is ~440ms, PyTorch 2.4.0.dev20240513 is ~97ms, Karpathy's llm.c with ROCm is ~79ms, and llm.c with custom kernels is ~58ms.

[1] https://github.com/anthonix/llm.c
[2] https://github.com/tinygrad/tinygrad/issues/4301
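
To put those step times in relative terms, here is a quick sketch that divides each one by the fastest. The numbers are just the approximate figures quoted above, and the variable names are made up for illustration.

```python
# Approximate per-step times in milliseconds, as quoted in the comment above.
step_ms = {
    "tinygrad": 440,
    "PyTorch 2.4.0.dev20240513": 97,
    "llm.c with ROCm": 79,
    "llm.c with custom kernels": 58,
}

# Use the fastest setup as the baseline.
fastest = min(step_ms.values())

# Slowdown factor of each setup relative to the baseline.
slowdown = {name: ms / fastest for name, ms in step_ms.items()}

for name, ratio in slowdown.items():
    print(f"{name}: ~{step_ms[name]}ms, {ratio:.2f}x the fastest")
```

On these figures tinygrad comes out at roughly 7.6x the custom-kernel time, and PyTorch at roughly 1.7x.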


That issue seems a month old, while the 58ms number looks 1 day old.

I've seen a lot of work go into improving performance over the last month (it's in the release announcement as well). Of course, I still don't think it can compete with that number, but a new comparison would be cool.

Ran tinygrad again about a week ago, no change.

And still no comment on the issue. I'll re-run if there is one.

Thanks for the answer