Hacker News new | ask | show | jobs
by shirman 380 days ago
Hi, it does not work with llama.cpp right?
1 comments

Optillm works with llama.cpp but this approach is implemented as a decoding strategy in PyTorch so at the moment you will need to use the local inference server in optillm to use it.