You can get something pretty fast right now with a Cerebras Coder subscription. Sadly, I think the best model they had last I checked was the somewhat dated GLM 4.7: https://inference-docs.cerebras.ai/models/overview
I feel like if they got DeepSeek V4 Flash and Pro running on their hardware, even if at less than 1000 tok/s, they’d still be crushing it with any subscription they’d provide, given how generous their token limits were.
As for the demo, it's fast and extremely dumb, as expected for 2B. I asked how to kick a drinking habit, and within just one follow-up message it recommended trying 8% ABV. Hilarious.