|
|
|
|
|
by azeirah
829 days ago
|
|
From how I understood it, it means they optimised the entire stack from CUDA to the networking interconnects specifically for data centers, meaning you get 30x more inference per dollar for a datacenter. This is probably not fluff, but it's only relevant for a very very specific use-case, ie enterprises with the money to buy a stack to serve thousands of users with LLMs. It doesn't matter for anyone who's not microsoft, aws or openai or similar. |
|