by nylonstrung 98 days ago
Unless I misunderstood, it seems like this is trailing the Pareto frontier on cost and speed.

Compare it to providers like Fireworks: even with the OpenRouter 5% charge, it's not competitive.

2 comments

Our SLA is actually higher and we are lower priced. We are also using this as a step toward serving finetuned models for much cheaper than Fireworks/Together, without the horrible cold starts of Modal. We're essentially trying to prove that our engine can hang with the best providers while multiplexing models.

According to the providers that I keep track of, Cumulus is typically pretty price competitive, except for MiniMax, where DeepInfra and Together are much cheaper, and GLM-5, where DeepInfra and z.AI's own hosting are much cheaper.

(Also, technically Qwen3 8B w/ Novita is in first place, but barely.)