

	by anonzzzies 168 days ago
	We need custom inference chips at scale for this imho. Every computer (whatever form factor or board) should have an inference unit on it, so that at least inference is efficient and fast and can be offloaded while the CPU is doing something else.

5 comments

Aurornis 167 days ago

The bottleneck in common PC hardware is mostly memory bandwidth. Offloading the computation part to a different chip wouldn’t help if memory access is the bottleneck.

There have been a lot of boards and chips with dedicated compute hardware for years, but they’re only so useful for LLMs, which require huge memory bandwidth.
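
A rough back-of-envelope sketch of the bandwidth argument. It assumes that generating one token streams every weight from memory once and costs about 2 operations per parameter. The function name and all of the hardware numbers are made up for illustration and are not measurements of any real device.

```python
def decode_rate_bounds(params, bytes_per_param, mem_bandwidth, compute_ops):
    """Return (memory_bound, compute_bound) upper limits in tokens/second.

    params          -- number of model weights
    bytes_per_param -- storage per weight (0.5 for 4-bit quantization)
    mem_bandwidth   -- memory bandwidth in bytes/second
    compute_ops     -- accelerator throughput in operations/second
    """
    model_bytes = params * bytes_per_param
    # Assumption: each generated token reads all weights from memory once.
    memory_bound = mem_bandwidth / model_bytes
    # Assumption: roughly one multiply and one add per weight per token.
    compute_bound = compute_ops / (2 * params)
    return memory_bound, compute_bound


# Hypothetical setup: 7B-parameter model at 4 bits per weight,
# 50 GB/s of memory bandwidth, 30 TOPS of compute.
mem_limit, compute_limit = decode_rate_bounds(7e9, 0.5, 50e9, 30e12)
print(f"memory-bound limit:  {mem_limit:.1f} tokens/s")
print(f"compute-bound limit: {compute_limit:.1f} tokens/s")
print("bottleneck:", "memory" if mem_limit < compute_limit else "compute")
```

With these made-up numbers the memory limit comes out around 14 tokens/s while the compute limit is above 2000 tokens/s, so in this simple model adding more compute changes nothing until the bandwidth goes up.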


touristtam 167 days ago

It's also worth noting that bus bandwidth has seen very little improvement over the years, and even the onboard RAM on GPU cards has seen only mediocre upgrades. If everyone and their grandma weren't using Nvidia GPUs, we would probably have seen a more competitive market and greater changes outside the chip itself.


bigyabai 167 days ago

I don't think that's true. AMD, Apple and Intel are all dGPU competitors with roughly the same struggle bringing upgrades to market. They have every incentive to release a disruptive product, but refuse to invest in their ecosystem the way Nvidia did.


chvid 168 days ago

Look at the specs of this Orange Pi 6+ board: it has a dedicated 30 TOPS NPU.

https://boilingsteam.com/orange-pi-6-plus-review/


sofixa 167 days ago

Almost all of them have it already. Microsoft's "Copilot+" branding requires an NPU with a minimum number of TOPS.

It's just that practically nothing uses those NPUs.


baq 167 days ago

At this point in the timeline, compute is cheap; it’s RAM that is basically unavailable.


fouc 168 days ago

I can't believe this was downvoted. It makes a lot of sense that it would be highly useful to have mass custom inference chips.


bigyabai 167 days ago

It's quite easy to understand. The tech industry has gone through 4-5 generations of obsolete NPU hardware that was dead-on-arrival. Meanwhile, there are still GPUs from 2014-2016 that run CUDA and are more power efficient than the NPUs.

The industry has to copy CUDA, or give up and focus on raster. ASIC solutions are a snipe hunt, not to mention small and slow.
