Hacker News
by junrushao1994, 1038 days ago
Yeah, we tried out popular solutions like exllama and llama.cpp, among others, that support inference of 4-bit quantized models.