by brucethemoose2 1045 days ago

> What if we can execute fine-tuned Llama7B on our phones?

7B and 13B are already quite performant with mlc-llm (which uses an Apache TVM Vulkan/Metal backend). Llama.cpp has the potential to perform well too.

I don't think these "single file" implementations are meant to be optimized or feature-rich.