by geuis
836 days ago
I want to mention Groq.com. They are developing their own inference hardware, called an LPU (https://wow.groq.com/lpu-inference-engine/). They also released their API a week or two ago. It's significantly faster than anything from OpenAI right now: Mixtral 8x7B runs at around 500 tokens per second. https://groq.com/