by junrushao1994 1144 days ago
This is our latest project on making LLMs accessible to everyone. With this project, users no longer need to spend a fortune on large amounts of VRAM, top-of-the-line GPUs, or powerful workstations to run LLMs at an acceptable speed. A consumer-grade GPU from years ago should suffice, or even a phone with enough memory.

Our approach leverages TVM Unity, a machine learning compiler that supports compiling GPT/Llama models to a diverse set of targets, including Metal, Vulkan, CUDA, ROCm, and more. In particular, we've found Vulkan to be a great fit because it's readily supported by a wide range of GPUs, including AMD's and Intel's.

BTW, an interesting data point from Reddit: it also works on the Steam Deck: https://www.reddit.com/r/LocalLLaMA/comments/132igcy/comment....

1 comment

Not sure if you're interested in support questions, but I ran the simple start thing you guys put up (Linux, RX 570) -- and it runs quickly but spits out absolute gibberish?
Thanks for sharing! Sometimes LLMs do generate some weird stuff, but if the issue persists, please do report it on our GitHub issues!