by wrigglingworm 967 days ago
>But even NorthPole’s 224 megabytes of RAM are not enough for large language models, such as those used by the chatbot ChatGPT, which take up several thousand megabytes of data even in their most stripped-down versions.

Is that right?

2 comments

At a glance, the supplementary materials from IBM's paper claim even less:

> NorthPole's core array includes 192 MB of flexible memory (768KB of unified memory per core). Assigning 2/3rd of this memory to parameters, such as weights and biases, provides 128MB for network storage.

https://www.science.org/doi/suppl/10.1126/science.adh1174/su...
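
The figures in that quote can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: it assumes binary units (1 MB = 1024 KB), which the quote does not state, and the function names are made up for illustration.

```python
# Figures taken from the quoted supplementary text.
CORE_ARRAY_MB = 192   # "192 MB of flexible memory"
PER_CORE_KB = 768     # "768KB of unified memory per core"

# Assumption (not stated in the quote): binary units, 1 MB = 1024 KB.
KB_PER_MB = 1024


def implied_core_count(total_mb=CORE_ARRAY_MB, per_core_kb=PER_CORE_KB):
    """Number of cores implied by the total and per-core memory figures."""
    return total_mb * KB_PER_MB // per_core_kb


def parameter_budget_mb(total_mb=CORE_ARRAY_MB, numerator=2, denominator=3):
    """Memory left for weights and biases when 2/3 is assigned to parameters."""
    return total_mb * numerator // denominator


if __name__ == "__main__":
    print("implied cores:", implied_core_count())
    print("parameter budget (MB):", parameter_budget_mb())
```

Under that assumption the two numbers in the quote are consistent with each other: 192 MB at 768 KB per core implies 256 cores, and two thirds of 192 MB is 128 MB.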

My understanding is that this is a small energy-efficient chip for edge computing, like to stuff in an IoT device. Way too little memory to expect to run any recent language models, but could maybe do some basic object detection in a doorbell camera, for example.

The article's author seems to believe 224 megabytes is a huge amount of memory, and is a few orders of magnitude too low on the ChatGPT estimate too.

Yeah. 224 MB is nothing.

Small LLMs that have been quantized to save space still sit at 4-9 GB.

I would expect LLM hardware to routinely support between 32 and 512 GB of memory in the Very Near Future, and 1-4 TB by the end of the decade. Custom hardware for GPT and LLM technology only started being developed in earnest in September 2022.
I thought the article was mixing up MB and GB at first. Is that the size of the LLM with no training data? Is there any value in that?
Once the model is trained, it doesn't need to keep the training data around.

GPT-3 is about 175 billion parameters (though I have no idea how many bits per parameter OpenAI uses at inference-time), and is apparently trained on 45 TB of data[0]

[0] Caution: citation was first hit on google, YMMV — https://www.springboard.com/blog/data-science/machine-learni...
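
To put the parameter count in perspective, here is a back-of-the-envelope sketch of the weight storage alone for 175 billion parameters. The bit widths are hypothetical, chosen only for illustration, since the precision OpenAI actually uses at inference time is not known here. Activations and other runtime overhead are ignored, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
GPT3_PARAMS = 175_000_000_000  # "about 175 billion parameters"
CHIP_MB = 224                  # the 224 MB figure quoted from the article


def weights_gb(n_params, bits_per_param):
    """Size of the weights alone, in decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def ratio_to_chip_memory(n_params, bits_per_param, chip_mb=CHIP_MB):
    """How many times larger the weights are than one chip's memory (1 MB = 10**6 bytes)."""
    return n_params * bits_per_param / 8 / (chip_mb * 1e6)


if __name__ == "__main__":
    for bits in (32, 16, 8, 4):  # hypothetical precisions, not OpenAI's
        size = weights_gb(GPT3_PARAMS, bits)
        ratio = ratio_to_chip_memory(GPT3_PARAMS, bits)
        print(f"{bits:>2}-bit: {size:6.1f} GB, about {ratio:,.0f}x 224 MB")
```

Even at a hypothetical 4 bits per parameter, the weights come to tens of gigabytes, which is hundreds of times more than 224 MB.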

Presumably if you are training a robot to use a new/different tool you'll want the ability to train on site. If you buy an IHOP restaurant, the pancake robot in the kitchen ought to be able to be repurposed as a hamburger robot for your cheeseburger business. Omelette scrambling robots could be trained to mix small batches of cookie dough. Etc etc. Toyota is working on developing a framework for this already.
Some robot firms are integrating LLMs to make the robots more general: https://www.youtube.com/watch?v=Vq_DcZ_xc_E&t=2s

On-site training is… not really solved yet. Not efficiently, at any rate: any task can be trained with sufficient compute and/or examples, but probably more than most companies would care to bother with, and certainly more than we'd get onto one of the chips in the article.

That's not to diss the chips: as I understand it, one of the biggest issues is the power envelope of mobile units, which means making the computations more energy efficient is going to help massively. It's just that "training" and "inference" are currently very distinct tasks with very different hardware requirements.

(Also, I'm not sure if you mean those examples as illustrations or are serious about them: if you're serious, I suspect an old-fashioned robot arm bolted to the ground and following a pre-programmed path will probably cover your needs — GOFAI is great in restricted domains, the more modern AI models are more appropriate when the environment is more chaotic and less predictable, such as collaborating in a kitchen that also has humans or being asked on the fly to do a new recipe it's never encountered before).

These are the trained weights and biases; the training data is unknown in size but could be terabytes… I’ve no idea how to even guess at the size of the training data, but it doesn’t all need to be in RAM at the same time.
That's not true, there are some very pruned (and relatively dumb) LLMs which go under a gig.
Small LLMs or large small language models???
What’s the point of a state of the art AI chip that can’t run large models? It seems problematic to say the least!
It is a new design, not a direct competitor for nVidia’s flagship. And there are lots more applications for AI than LLMs.
Realtime, low-power AI isn't really possible today. Think anything motion- or reaction-related.
Some of us still work on computer vision. 224MiB is fairly massive for a convolutional neural network.
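
For a rough sense of scale, the sketch below compares a 224 MiB budget against the weights of two well-known image classifiers. The parameter counts are approximate, commonly cited figures (about 25.6 million for ResNet-50, about 3.5 million for MobileNetV2) and are assumptions brought in for illustration, not numbers from this thread. Activations and buffers are ignored.

```python
BUDGET_MIB = 224

# Approximate, commonly cited parameter counts (assumptions, not from the thread).
APPROX_PARAMS = {
    "ResNet-50": 25_600_000,
    "MobileNetV2": 3_500_000,
}


def weights_mib(n_params, bits_per_param):
    """Size of the weights alone, in MiB (1 MiB = 2**20 bytes)."""
    return n_params * bits_per_param / 8 / 2**20


def fits(n_params, bits_per_param, budget_mib=BUDGET_MIB):
    """True if the weights alone fit within the memory budget."""
    return weights_mib(n_params, bits_per_param) <= budget_mib


if __name__ == "__main__":
    for name, n_params in APPROX_PARAMS.items():
        for bits in (32, 8):
            size = weights_mib(n_params, bits)
            print(f"{name} at {bits}-bit: {size:.1f} MiB, fits={fits(n_params, bits)}")
```

Under these assumptions both networks fit with room to spare even at 32 bits per weight, which is the sense in which 224 MiB is large for this class of model.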
It can run small models very fast.