|
|
|
|
|
by matanyall
856 days ago
|
|
Groq Engineer here, I'm not seeing why being able to scale compute outside of a single card/node is somehow a problem. My preferred analogy is to a car factory: Yes, you could build a car with say only one or two drills, but a modern automated factory has hundreds of drills! With a single drill, you could probably build all sorts of cars, but a factory assembly line is only able to make specific cars in that configuration. Does that mean that factories are inefficient? You also say that H200's work reasonably well, and that's reasonable (but debatable) for synchronous, human interaction use cases. Show me a 30b+ parameter model doing RAG as part of a conversation with voice responses in less than a second, running on Nvidia. |
|
Or I got this completely wrong, and your solution enables use-cases that are simply unattainable on mainstream (Nvidia/AMD) hardware, making TCO argument less relevant?