Hacker News
by brucethemoose2 1118 days ago
Very interesting.

Is 8-bit/4-bit support in the works? Will it work with bitsandbytes out of the box? Speedy inference is great, but in practice many users these days are running the biggest ~4-bit LLM that will fit into their RAM/VRAM pool. This is why llama.cpp is so good: it's (AFAIK) the only implementation that will split a 4-bit quantized model so easily.
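
For a rough sense of why bit width decides what fits, here is a back-of-the-envelope sketch of weight memory at different precisions. The 7e9 parameter count is just an illustrative assumption, and the estimate covers weights only: it ignores the KV cache, activations, and the per-block scale/zero-point overhead that real quantization formats add.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB.

    bytes = parameters * bits per weight / 8
    """
    return n_params * bits_per_weight / 8 / 2**30


# Hypothetical 7-billion-parameter model at a few common bit widths.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gib(7e9, bits):.2f} GiB")
```

Halving the bit width halves the weight footprint, so a 4-bit model needs roughly a quarter of the memory of the same model in 16-bit.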

1 comment

Yes. We support models at any precision from 1-bit to 16-bit out of the box, for a variety of models.