by simonw 1194 days ago
In this particular case that doesn't matter, because the only time you run Python is for a one-off conversion of the model files.

The conversion takes a minute at most, and once the model is converted you never need to run it again. Actual llama.cpp model inference uses compiled C++ code, with no Python involved at all.