by junrushao1994
1144 days ago
This is our latest project on making LLMs accessible to everyone. With this project, users no longer need to spend a fortune on huge VRAM, top-of-the-line GPUs, or powerful workstations to run LLMs at an acceptable speed. A consumer-grade GPU from years ago should suffice, or even a phone with enough memory. Our approach leverages TVM Unity, a machine learning compiler that supports compiling GPT/Llama models to a diverse set of targets, including Metal, Vulkan, CUDA, ROCm, and more. In particular, we've found Vulkan great because it's readily supported by a wide range of GPUs, including those from AMD and Intel. BTW, an interesting data point from Reddit is that it also works on the Steam Deck: https://www.reddit.com/r/LocalLLaMA/comments/132igcy/comment....