by laborcontract
914 days ago
This is really impressive. For reference, Llama 70B inference on Together’s API generates text at roughly 60 tokens/second. I can’t find any information about an API for this, though I’m guessing the costs are eye-watering. If they offered a Mixtral endpoint that did 300-400 tokens/second at a reasonable cost, I can’t imagine ever using another provider.