by atairov
1045 days ago
Regarding the original llama2.c: I believe the value proposition is a simple implementation that can run inference locally on a wide variety of platforms. What if we could run a fine-tuned Llama 7B on our phones?

7B and 13B are already quite performant with mlc-llm (which uses an Apache TVM Vulkan/Metal backend). Llama.cpp has the potential to perform well too.
These "single file" implementations are not meant to be optimized or feature-rich, I don't think.