|
|
|
|
|
by ibeckermayer
106 days ago
|
|
Cool that it's possible but basically unusable performance characteristics. For an 8192 token prompt they report a ~1.5 minute time-to-first-token and then 8.30tk/s from there. For context ChatGPT is typically <<1s ttft and ~50tk/s. |
|
If I'm right about that then if you're willing to go in for somewhere in the vicinity of $30k (24x the Max 385 model) you should be able to achieve ChatGPT performance.