by walterbell 404 days ago
> While Qwen3 and DeepSeek are impressive, the infrastructure costs for running these at scale remain prohibitive for most use cases. The economics still don't work

  dedicated LLM hosting providers like Cerebras and Groq who can actually make money on each user inference query
Cerebras (wafer-scale) and Groq (LPU, designed by former Google TPU engineers) both have inference-optimized custom hardware.