| HN Mirror

by junrushao1994 1187 days ago

Thanks for the feedback! This is definitely something we need to do. To share some data: the default model is currently Vicuna-7b, aggressively quantized to 2.9 GB.

We are expanding coverage to more models. In particular, Dolly and StableLM are just around the corner and only need some cleanup work.

As this is a brand-new project, we are just starting to collect data points on which GPU models are well supported, and we are fixing issues as they are reported. Please don't hesitate to report problems in our GitHub issues!

2 comments

tyfon 1187 days ago

I see, the 2.9 GB requirement seems to imply 3-bit weights?
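For reference, the back-of-the-envelope arithmetic behind that guess can be sketched as below. This is only a rough estimate: it assumes roughly 7 billion parameters (taken from the "7b" in the model name), treats 2.9 GB as 2.9e9 bytes, and assumes the whole file is weights. The function name `bits_per_weight` is made up for illustration.

```python
def bits_per_weight(size_bytes: float, n_params: float) -> float:
    """Average number of bits stored per parameter for a model file."""
    return size_bytes * 8 / n_params


# Assumptions (not stated in the thread): ~7e9 parameters, 1 GB = 1e9 bytes.
model_size_bytes = 2.9e9
n_params = 7e9

avg_bits = bits_per_weight(model_size_bytes, n_params)
print(f"average bits per weight: {avg_bits:.2f}")
```

Under these assumptions the average lands a little above 3 bits per weight. Any quantization metadata (scales, zero points) or layers kept at higher precision would also count toward that average, so it is consistent with nominally 3-bit weights but does not prove it.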

In any case I am happy to see these projects taking shape. Perhaps one could eventually make the level of quantization dynamic, based on the available VRAM etc :)
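A minimal sketch of what such a dynamic choice might look like, purely as an illustration and not how the project actually works: pick the widest bit width whose weights fit into a fraction of the available VRAM. The function `pick_bit_width`, the candidate widths, and the 20% headroom reserved for activations and caches are all hypothetical.

```python
from typing import Optional, Sequence


def pick_bit_width(
    vram_bytes: float,
    n_params: float,
    candidates: Sequence[int] = (8, 4, 3),
    overhead_fraction: float = 0.2,
) -> Optional[int]:
    """Return the largest candidate bit width whose weights fit in VRAM.

    overhead_fraction is the share of VRAM held back for everything that
    is not weights (an assumed value, not a measured one).
    Returns None if even the smallest candidate does not fit.
    """
    budget = vram_bytes * (1 - overhead_fraction)
    for bits in sorted(candidates, reverse=True):
        weight_bytes = n_params * bits / 8
        if weight_bytes <= budget:
            return bits
    return None


# Example: a ~7e9-parameter model on GPUs with different amounts of memory.
for vram_gb in (12, 8, 4, 2):
    print(vram_gb, "GB ->", pick_bit_width(vram_gb * 1e9, 7e9))
```

A real implementation would also have to account for quantization metadata, context length, and the quality loss at each bit width, none of which this sketch models.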

I will definitely play around with it (on Linux though, not a phone!)

int_19h 1187 days ago

When people tried 3-bit quantization on 7B models before, it did not exactly go well; the side effects were quite detrimental. Are you using some new quantization technique that mitigates that?