by abtinf
20 days ago
As far as I can tell, inference is on a certain path toward becoming "free". The models are now extremely powerful on high-end consumer hardware, and the efficiency trend seems likely to continue.

Here is a recent non-rigorous benchmark I ran against a bunch of models. Qwen3.6 35B A3B fine-tuned with Opus data runs plenty fast on my local machine and produces outstanding results: easily in the top 5, comparable to GPT 5.5 Pro (which is $180/mtok).

https://gistpreview.github.io/?31d66ef69e4aed3efae1aec69d86c...

I've predicted for years now that the industry will head down the path of the virus-scanning vendors: selling subscriptions for downloading the latest versions of models. I simply don't see how any other business model is remotely viable, except at the very highest end of inference or video gen.