by freeqaz
806 days ago
What's the easiest way to run this assuming that you have the weights and the hardware? Even if it's offloading half of the model to RAM, what tool do you use to load this? Ollama? Llama.cpp? Or just import it with some Python library? Also, what's the best way to benchmark a model to compare it with others? Are there any tools to use off-the-shelf to do that?

You would have to confirm with someone deeper in the ecosystem, but I think you should be able to run this new model as-is with llamafile [0].
[0] https://github.com/Mozilla-Ocho/llamafile