by CapsAdmin
438 days ago
Slightly related: I had a go at doing Llama 3 inference in LuaJIT, using CUDA as one compute backend just for matrix multiplication: https://github.com/CapsAdmin/luajit-llama3/blob/main/compute... While obviously not complete, it took less than I thought would be needed. It was a bit annoying trying to figure out which version of a function (the _v2 suffix) I had to use for the driver I was running. The stateful nature of the API is also a bit annoying at times, very similar to OpenGL. It can be hard to debug why something refuses to compile.
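To illustrate the _v2 suffix issue: the C header normally maps names like cuMemAlloc to cuMemAlloc_v2 through macros, so an FFI binding that skips the header has to pick the exported symbol itself. Below is a minimal sketch of one way to do that lookup. It is not taken from the linked repo; `resolve_driver_symbol` and the stand-in `fake_driver` object are hypothetical names, and the same function should work on a real `ctypes.CDLL` handle, which is an assumption not tested here.

```python
from types import SimpleNamespace


def resolve_driver_symbol(lib, base_name, suffixes=("_v2", "")):
    """Return (name, function) for the first variant of base_name that lib exports.

    Hypothetical helper: tries the suffixed name first, then the plain one.
    `lib` can be anything that exposes functions as attributes.
    """
    for suffix in suffixes:
        name = base_name + suffix
        fn = getattr(lib, name, None)  # a missing symbol yields None
        if fn is not None:
            return name, fn
    raise AttributeError(f"no variant of {base_name} found in library")


# Stand-in for a loaded driver library, so the sketch runs without a GPU.
fake_driver = SimpleNamespace(
    cuInit=lambda flags: 0,
    cuMemAlloc_v2=lambda ptr, size: 0,
)

name, _ = resolve_driver_symbol(fake_driver, "cuMemAlloc")
print(name)
name, _ = resolve_driver_symbol(fake_driver, "cuInit")
print(name)
```

Trying the suffixed name first is a guess at a sensible default; which variant is correct for a given driver still has to be checked against that driver's headers.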