Hacker News
by CapsAdmin 438 days ago
Slightly related: I had a go at doing Llama 3 inference in LuaJIT, using CUDA as one compute backend just for the matrix multiplication.

https://github.com/CapsAdmin/luajit-llama3/blob/main/compute...

While it's obviously not complete, it took less than I thought would be needed.

It was a bit annoying trying to figure out which version of a function (the _v2 suffix) I had to use for the driver I was running.
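
To illustrate the suffix problem (a minimal sketch, not taken from the linked repo): the C header maps plain names such as cuCtxCreate to suffixed symbols via macros, which an FFI binding never sees, so the binding has to look up the exported name itself. The helper names `resolve_symbol` and `load_cuda_driver`, the library path `libcuda.so.1`, and the "prefer _v2, fall back to the plain name" policy are all assumptions made for this example. Which variant is actually correct for a given driver still has to be checked against the headers.

```python
import ctypes


def resolve_symbol(lib, base_name, suffixes=("_v2", "")):
    """Return (name, function) for the first variant of base_name that lib exports.

    The suffix order is an assumption for illustration: try the _v2 symbol
    first, then the unsuffixed one.
    """
    for suffix in suffixes:
        name = base_name + suffix
        try:
            # ctypes raises AttributeError when the symbol is not exported.
            return name, getattr(lib, name)
        except AttributeError:
            continue
    raise AttributeError(f"no exported variant of {base_name} found")


def load_cuda_driver(path="libcuda.so.1"):
    """Open the CUDA driver library, or return None if it is not available.

    The default path is a typical Linux name and may differ on other systems.
    """
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


if __name__ == "__main__":
    lib = load_cuda_driver()
    if lib is None:
        print("CUDA driver library not found")
    else:
        # cuInit has no suffixed variant; the other two do.
        for base in ("cuInit", "cuCtxCreate", "cuMemAlloc"):
            name, _ = resolve_symbol(lib, base)
            print(base, "->", name)
```

The same idea carries over to LuaJIT's ffi: declare the suffixed name in the cdef and alias it to the plain name on the Lua side.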

Also sometimes a bit annoying is the stateful nature of the API, which is very similar to OpenGL. It's hard to debug at times why something refuses to compile.

1 comment

Neat, thanks for sharing!