by thewataccount
1038 days ago
I was impressed enough by Replit's 2.7B model that I'm convinced it's doable. I have a 4090 and consider that the "max expected card for a consumer to own". Also, exllama doesn't support non-llama models, and the creator doesn't seem interested in adding support for wizardcoder/etc. Because of this, the alternatives are prohibitively slow for running a quantized 16B model on a 4090 (if the exllama author reads this, _please_ add support for other model types!). 3B models are pretty snappy with Refact, about as fast as GitHub Copilot. The other benefit of a smaller model is more room for context, which will be a limiting factor for 16B models. tl;dr - I think we need ~3B models if we want any chance of consumer hardware reasonably running coding models akin to GitHub Copilot with decent context length. And I think it's doable.
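To put rough numbers on the context-space point, here's a back-of-the-envelope sketch in Python. The layer count, head count, and head size are hypothetical placeholders, not the real shapes of any model mentioned above, and the estimate ignores activations and framework overhead, so treat it as a lower bound on memory use. The 24 GiB budget is the 4090's nominal VRAM.

```python
GIB = 2**30  # bytes per GiB


def weight_gib(params_billion, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size in GiB (keys + values, one pair per layer)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens / GIB


def max_context_tokens(vram_gib, weights_gib, n_layers, n_kv_heads, head_dim,
                       bytes_per_value=2):
    """Tokens of context that fit in whatever VRAM the weights leave free."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    free_bytes = (vram_gib - weights_gib) * GIB
    return max(0, int(free_bytes // per_token))


if __name__ == "__main__":
    vram = 24  # GiB, nominal for a 4090
    # (label, params in billions, layers, kv heads, head dim): all hypothetical
    shapes = [
        ("~3B", 3, 32, 32, 80),
        ("~16B", 16, 40, 48, 128),
    ]
    for label, params, layers, heads, dim in shapes:
        w = weight_gib(params, bits_per_weight=4)  # 4-bit quantized weights
        ctx = max_context_tokens(vram, w, layers, heads, dim)
        print(f"{label}: weights ~{w:.1f} GiB, room for ~{ctx} tokens of fp16 KV cache")
```

Setting `n_kv_heads=1` models multi-query attention, which shrinks the cache a lot, so the real gap between model sizes depends heavily on the architecture.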