| HN Mirror



	by simonw 1194 days ago
	In this particular case that doesn't matter, because the only time you run Python is for a one-off conversion of the model files. The conversion takes at most a minute, and once the files are converted you never need to run it again. Actual llama.cpp model inference uses compiled C++ code, with no Python involved at all.
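
To make the "convert once, then never touch Python again" idea concrete, here is a minimal sketch of that pattern. It is not llama.cpp's actual conversion script: `ensure_converted` and `fake_convert` are hypothetical names invented for illustration, and the "conversion" just copies bytes. The point is only that the Python step runs once and is skipped on every later run.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_converted(src: Path, dst: Path, convert: Callable[[Path, Path], None]) -> bool:
    """Run a one-off conversion only if the converted file is missing.

    Returns True if the conversion ran, False if it was skipped.
    """
    if dst.exists():
        # Already converted on an earlier run: nothing to do.
        return False
    convert(src, dst)
    return True


def fake_convert(src: Path, dst: Path) -> None:
    # Hypothetical stand-in for a real conversion script (assumption):
    # it copies the bytes unchanged instead of rewriting model weights.
    dst.write_bytes(src.read_bytes())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "model-original.bin"
        converted = Path(tmp) / "model-converted.bin"
        original.write_bytes(b"pretend these are weights")

        print(ensure_converted(original, converted, fake_convert))  # first run converts
        print(ensure_converted(original, converted, fake_convert))  # later runs skip
```

After the first call, everything downstream only needs the converted file, which is the situation described above: the compiled C++ binary reads it directly and Python is out of the picture.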