by Farmadupe
47 days ago
I wonder if, for a model that small with a permissive license, it might not be worth their time to host a commercial-grade inference stack? Might be easier to chuck it over the fence and let other providers handle it, as it'll run on almost any commercial-grade card. Also speculating, but I wonder if it might create a bit of a pricing problem relative to Gemini Flash-Lite, depending on serving cost and quality of outputs? As a comparison, despite being SotA for their size, the smallest Qwen models on OpenRouter (27B and 35B) are not at all worth using, as there are way bigger and better models for a lower price on a per-token basis.