by belval 1040 days ago
Nice project! I could not find this in the README.md: can I run this with a GPU? If so, what do I need to change? It seems to be hardcoded to 0 in the run script: https://github.com/getumbrel/llama-gpt/blob/master/api/run.s...
3 comments

I put up a draft PR to demo how to run it on a GPU: https://github.com/getumbrel/llama-gpt/pull/11

It breaks other things like model downloading, but once I got it to a working state for myself, I figured why not put it up there in case it's useful. If I have time, I'll try to rework it a little bit with more parameters and less Dockerfile repetition to fit the main project better.

Ah yes, running on GPU isn't supported at the moment. But CUDA (for Nvidia GPUs) and Metal support is on the roadmap!
Ah fascinating, just curious, what's the technical blocker? I thought most of the Llama models were optimized to run on GPUs?
It's fairly straightforward to add GPU support when running on the host, but LlamaGPT runs inside a Docker container, and that's where it gets a bit challenging.
It shouldn't be. NVIDIA provides a CUDA Docker plugin that lets you expose your GPU to the container, and it works quite well.
See above if you're interested in that. It does work quite well, even with nested virtualization (WSL2).
I am, thanks!
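
For anyone wondering what exposing the GPU to a container looks like in practice, here is a minimal sketch. It only builds the `docker run` command line using Docker's `--gpus` flag, which needs the NVIDIA container tooling installed on the host. The image name is a hypothetical placeholder and the helper `docker_gpu_command` is made up for illustration; none of this is taken from the LlamaGPT repo or the PR.

```python
import shlex


def docker_gpu_command(image, command, gpus="all"):
    """Build a `docker run` argv that asks Docker to expose NVIDIA GPUs.

    Assumption: the host has the NVIDIA container tooling installed,
    otherwise Docker rejects the --gpus flag at run time.
    """
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


if __name__ == "__main__":
    # "example/cuda-image:latest" is a hypothetical image name.
    argv = docker_gpu_command("example/cuda-image:latest", ["nvidia-smi"])
    # Print the command instead of running it, so this works without Docker.
    print(shlex.join(argv))
```

If `nvidia-smi` lists your card when you run the printed command against a CUDA-capable image, the container can see the GPU.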
Had the same thought, since it is kinda slow (I only have 4 physical/8 logical cores, though). But I think VRAM might be a problem: 8 GB can work if one has a fairly recent GPU (M1/M2 might be interesting here).
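
A rough back-of-the-envelope check for the VRAM question, as a sketch only. It counts the memory for the weights alone (parameter count times bits per weight) and adds a flat allowance for everything else. The `overhead_gb` default is a guessed placeholder for context cache and runtime buffers, not a measured figure, and both function names are made up for illustration.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_vram(n_params, bits_per_weight, vram_gb, overhead_gb=1.5):
    """Crude check: weights plus an assumed flat overhead must fit in VRAM.

    overhead_gb is an assumption covering context cache and buffers;
    the real figure depends on context length and the runtime used.
    """
    return weight_memory_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for label, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
        size = weight_memory_gb(n_params, 4)
        verdict = "fits" if fits_in_vram(n_params, 4, 8) else "does not fit"
        print(f"{label} at 4 bits per weight: ~{size:.1f} GB weights, {verdict} in 8 GB")
```

By this estimate a 4-bit 7B model leaves headroom on an 8 GB card, while larger models would need some layers kept on the CPU.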