Hacker News
by moffkalast 836 days ago
> The 4.5-mm-square chip, developed using Korean tech giant Samsung Electronics Co.'s 28 nanometer process, has 625 times less power consumption compared with global AI chip giant Nvidia's A-100 GPU, which requires 250 watts of power to process LLMs, the ministry explained.

> processes GPT-2 with an ultra-low power consumption of 400 milliwatts and a high speed of 0.4 seconds

Not sure what the point of comparing the two is; an A100 will get you a lot more speed than 2.5 tokens/sec. GPT-2 is just a 1.5B-param model, so a Pi 4 would get you more tokens per second with CPU inference alone.
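Back-of-the-envelope check of the quoted numbers. This is a sketch that assumes the article's "0.4 seconds" is latency per generated token (which is how the 2.5 tokens/sec figure above reads it). It shows that 625x is just the ratio of the two power draws, and what the chip's energy per token works out to:

```python
# Figures quoted in the article: A100 at 250 W, the new chip at 400 mW.
A100_POWER_W = 250.0
CHIP_POWER_W = 0.400
# Assumption: the quoted 0.4 s is the time per generated token.
CHIP_SECONDS_PER_TOKEN = 0.4


def power_ratio(a_watts: float, b_watts: float) -> float:
    """How many times more power device a draws than device b."""
    return a_watts / b_watts


def tokens_per_second(seconds_per_token: float) -> float:
    """Throughput implied by a per-token latency."""
    return 1.0 / seconds_per_token


def joules_per_token(watts: float, seconds_per_token: float) -> float:
    """Energy spent per token: power multiplied by time."""
    return watts * seconds_per_token


ratio = power_ratio(A100_POWER_W, CHIP_POWER_W)
tps = tokens_per_second(CHIP_SECONDS_PER_TOKEN)
energy = joules_per_token(CHIP_POWER_W, CHIP_SECONDS_PER_TOKEN)

print(f"power ratio: {ratio:.0f}x")       # power ratio: 625x
print(f"throughput: {tps:.1f} tokens/s")  # throughput: 2.5 tokens/s
print(f"energy: {energy:.2f} J/token")    # energy: 0.16 J/token
```

The 625x figure compares watts only, not work done per watt. A fair comparison would use joules per token for both devices, but the article gives no A100 throughput for GPT-2, so that side of the comparison can't be filled in from the quote.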

Still, I'm sure there are improvements to be made and the direction is fantastic to see, especially after Coral TPUs have proven completely useless for LLM and Whisper acceleration. Hopefully it ends up as something vaguely affordable.


Which of the model requirements of Coral TPUs [1] are the most problematic for LLMs?

[1] https://coral.ai/docs/edgetpu/models-intro/#model-requiremen...

Guessing as to what the GP meant: Coral TPUs max out at around 8M parameters, IIRC. That's a few orders of magnitude less than even the smallest LLMs.

The part where they have like 3 bytes of memory, so you switch from the extremely high latency of RAM to the laughably sluggish latency of USB serial. I think there's also no support below 8-bit quants, which you'd really need.
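A rough sketch of why the parameter limit bites. It assumes the Edge TPU has roughly 8 MB of on-chip memory for weights (the "~8M parameters" above, at one byte per 8-bit weight) and that anything larger has to be streamed over the host link. The 4-bit rows are hypothetical, since the Edge TPU is said above not to support sub-8-bit quants:

```python
# Assumption: ~8 MB of on-chip memory available for model weights.
ON_CHIP_BYTES = 8 * 1024 * 1024


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Size of the weights in bytes, rounded up to a whole byte."""
    return (num_params * bits_per_param + 7) // 8


def fits_on_chip(num_params: int, bits_per_param: int) -> bool:
    """True if all weights fit in the assumed on-chip memory."""
    return weight_bytes(num_params, bits_per_param) <= ON_CHIP_BYTES


models = {
    "~8M-param model": 8_000_000,
    "GPT-2 (1.5B)": 1_500_000_000,
}

for name, params in models.items():
    # 8-bit is what the Edge TPU runs; 4-bit is a hypothetical comparison.
    for bits in (8, 4):
        megabytes = weight_bytes(params, bits) / 1e6
        print(f"{name} @ {bits}-bit: {megabytes:,.0f} MB, "
              f"fits on chip: {fits_on_chip(params, bits)}")
```

Even at 4 bits, GPT-2's weights are on the order of 750 MB, about two orders of magnitude more than the assumed on-chip budget. Nearly every token would then have to pull weights across the host link, which is the latency problem described above. Lower-bit quants shrink the amount of data moved but don't bring the model on-chip.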