by avianion 597 days ago
Happy to announce this breakthrough, made largely possible by Nvidia's H200 SXMs and a proprietary speculative decoding algorithm.

We've launched a production-grade API endpoint at $3 per million tokens. We also have some capacity for fine-tuning 405B while keeping the speed increases, so if you're interested, please get in touch.