Hacker News
by junrushao1994 1038 days ago
Yeah, we tried out popular solutions that support inference of 4-bit quantized models, like exllama and llama.cpp, among others.