Hacker News
by
shirman
380 days ago
Hi, this does not work with llama.cpp, right?
codelion
380 days ago
Optillm works with llama.cpp, but this approach is implemented as a decoding strategy in PyTorch, so at the moment you need to run it through optillm's local inference server.