by embedding-shape
230 days ago
> how much of ML development these days touches these “lower level” parts of the stack? I’d expect that by now most of the work would be high level

Every time the high-level architectures of models change, there are new lower-level optimizations to be done. Even recent releases like GPT-OSS add new areas for improvement, like MXFP4, that require the lower-level parts to be created and optimized.
Is TOPS/Whr a good efficiency metric for TPUs and for LLM model hosting operations?
From https://news.ycombinator.com/item?id=45775181 re: current TPUs in 2025; "AI accelerators":
> How does Cerebras WSE-3 with 44GB of 'L2' on-chip SRAM compare to Google's TPUs, Tesla's TPUs, NorthPole, Groq LPU, Tenstorrent's, and AMD's NPU designs?