| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by shirman 380 days ago
	Hi, it does not work with llama.cpp right?

1 comments

codelion 380 days ago

Optillm works with llama.cpp but this approach is implemented as a decoding strategy in PyTorch so at the moment you will need to use the local inference server in optillm to use it.

link