Happy to announce this breakthrough, made largely possible by Nvidia's H200 SXMs and a proprietary speculative decoding algorithm.
We've launched a production-grade API endpoint at $3 per million tokens. We also have some capacity for fine-tuning 405B while keeping the speed increases, so if you're interested, please get in touch.