by Animats 266 days ago
Actual result: "This new process promises to increase the number of optical fibers that can be connected at the edge of a chip, a measure known as beachfront density, by six times."

Faster interconnects are always nice, but this is more like routine improvement.


"In recent inference tests run on a 3-billion-parameter LLM developed from IBM’s Granite-8B-Code-Base model, NorthPole was 47 times faster than the next most energy-efficient GPU and was 73 times more energy efficient than the next lowest latency GPU."

It's also fascinating that they are experimenting with analog memory, because it pairs so well with model weights.

Yeah, analog memory fits so incredibly well. Who cares if it's not "exact" and fuzzes around a bit, as long as it's only used for weights and has massive efficiency advantages? Weights are never "exact" themselves, and it doesn't matter if they don't always read exactly the same. You basically just get some extra "temperature" for free!
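To make the "fuzzy weights" point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the 3x4 weight matrix, the input vector, the Gaussian read-noise model, and the helper names (`noisy_read`, `agreement`) are hypothetical and not taken from IBM's hardware. It only shows that small read noise leaves the winning output unchanged, while large noise starts flipping it, a bit like turning up the temperature.

```python
import random

def noisy_read(weights, sigma, rng):
    """Simulate one read of analog-stored weights: add Gaussian noise to each value."""
    return [[w + rng.gauss(0.0, sigma) for w in row] for row in weights]

def matvec(weights, x):
    """Plain matrix-vector product (one tiny linear layer)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical weights and input, chosen only for illustration.
WEIGHTS = [
    [0.9, -0.2, 0.1, 0.4],
    [0.1, 0.8, -0.5, 0.0],
    [-0.3, 0.2, 0.7, -0.6],
]
X = [1.0, 0.5, -1.0, 2.0]

def agreement(sigma, trials=1000, seed=0):
    """Fraction of noisy reads whose top output matches the exact one."""
    rng = random.Random(seed)
    exact_top = argmax(matvec(WEIGHTS, X))
    hits = 0
    for _ in range(trials):
        noisy = noisy_read(WEIGHTS, sigma, rng)
        if argmax(matvec(noisy, X)) == exact_top:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for sigma in (0.01, 0.1, 0.5):
        print(f"sigma={sigma}: top output unchanged in {agreement(sigma):.1%} of reads")
```

Real analog cells will not have simple independent Gaussian noise, so treat this as a toy model of the argument rather than a simulation of any device.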

A bit beautiful that we might end up partially going back to analog computers, which were quickly replaced by digital ones.

> A bit beautiful that we might end up partially going back to analog computers, which were quickly replaced by digital ones.

How long till we get a Ben Eater-style video about someone making a basic analog neural network using some DACs, analog multipliers[1] and bucket-brigade chips[2] for intermediate values?

[1]: https://www.analog.com/media/en/training-seminars/tutorials/...

[2]: https://en.wikipedia.org/wiki/Bucket-brigade_device

Their NorthPole chip doesn't look much different from the Groq LPU or Tenstorrent's hardware or even just AMD's NPU design. The Tenstorrent cards have a fairly large amount of SRAM considering their price.
I am not an expert on this, but reading Groq's description of their hardware, it still has a compute/memory split. They make the memory super fast (80 terabytes per second!) so it can fully feed the CPU without latency. In the end, is it much different from moving the ALU into memory like IBM is doing? The goal for both is to eliminate the memory bottleneck, so there can be a variety of valid approaches.
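For a sense of scale, a rough back-of-envelope in Python. The assumptions are mine: 1 byte per weight (8-bit weights), every weight read exactly once per generated token, and the whole model sitting in the fast memory. The 3-billion-parameter size comes from the quote upthread and the 80 TB/s figure from the comment above. It ignores activations, KV cache, and compute time, so it is only a lower bound for a purely bandwidth-limited pass.

```python
PARAMS = 3e9                 # parameters, from the 3-billion-parameter model quoted upthread
BYTES_PER_WEIGHT = 1         # assumption: 8-bit weights
BANDWIDTH_BYTES_PER_S = 80e12  # 80 TB/s, the figure quoted in the comment above

def min_seconds_per_pass(params, bytes_per_weight, bandwidth_bytes_per_s):
    """Time to stream every weight once at the given memory bandwidth."""
    return params * bytes_per_weight / bandwidth_bytes_per_s

t = min_seconds_per_pass(PARAMS, BYTES_PER_WEIGHT, BANDWIDTH_BYTES_PER_S)

if __name__ == "__main__":
    print(f"{t * 1e6:.1f} microseconds per pass over the weights")
```

Under those assumptions a pass takes about 37.5 microseconds, which is why both "very fast memory next to compute" and "compute inside memory" can plausibly remove the bottleneck.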
How does Cerebras WSE-3 with 44GB of 'L2' on-chip SRAM compare to Google's TPUs, Tesla's TPUs, NorthPole, Groq LPU, Tenstorrent's, and AMD's NPU designs?
In-memory compute has nothing to do with connecting optical fibers to a chip.