by indrora 98 days ago
Ostensibly, a mix of VC funding and the fact that they host an endpoint that lets people run the big (200+GB) models on the provider's infrastructure rather than having to build machines with hundreds of gigs of LLM-dedicated memory.

But on inference they have to compete with other inference providers that just have a homepage, a bunch of GPUs running vLLM, and none of the training cost. Their only real advantage is whatever performance optimizations they might have implemented in their inference clusters and not made public.
Qwen, at least, has some proprietary models IIRC, specifically the Max series, which I think have larger context windows.