by junrushao1994
1140 days ago
Thanks for the feedback! This is definitely something we need to do. For reference, the current default model is Vicuna-7b, aggressively quantized to 2.9G. We are expanding coverage to more models. In particular, Dolly and StableLM are just around the corner and only need some cleanup work. As a brand-new project, we are just starting to collect data on which GPU models are well supported, and we are fixing issues as they are reported. Please don't hesitate to report problems in our GitHub issues!
In any case, I am happy to see these projects taking shape. Perhaps the level of quantization could eventually be chosen dynamically based on the available VRAM, etc. :)
I will definitely play around with it (on Linux, though, not on a phone!)