|
|
|
|
|
by vimarsh6739
855 days ago
|
|
Might be a bit out of context, but isn't the TPU also optimized for low latency inference? (Judging by reading the original TPU architecture paper here - https://arxiv.org/abs/1704.04760). If so, does Groq actually provide hardware support for LLM inference? |
|
Could you clarify your question about hardware support? Currently we build out our hardware to support our cloud offering, and we sell systems to enterprise customers.