| HN Mirror


by sillysaurusx 1215 days ago

> On a single multi-GPUs server, even with the highest-end A100 80GB GPU, PyTorch can only launch ChatGPT based on small models like GPT-L (774M), due to the complexity and memory fragmentation of ChatGPT. Hence, multi-GPUs parallel scaling to 4 or 8 GPUs with PyTorch's DistributedDataParallel (DDP) results in limited performance gains.

Where are these numbers coming from? An 80GB A100 GPU is certainly more than capable of hosting a 1.5B GPT. We were running 774M on rinky-dink cards back in 2019 for our inference purposes.

I don’t understand how they went from talking about 175B params across 32 cards to 774M on one card. 175B divided by 32 is 5.4B.
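A back-of-the-envelope sketch of the arithmetic in this comment. The byte counts are assumptions, not figures from the article: 2 bytes per parameter for fp16 weights, and 16 bytes per parameter as a common rule of thumb for mixed-precision Adam training state. Activations are ignored entirely.

```python
# Rough memory arithmetic; every bytes-per-parameter figure here is an assumption.
GB = 1e9

FP16_WEIGHTS = 2   # assumed: fp16 weights only (inference)
ADAM_MIXED = 16    # assumed: fp16 weights + grads, fp32 master copy + two Adam moments


def params_per_gpu(total_params: float, num_gpus: int) -> float:
    """Parameters each GPU holds if the model is split evenly."""
    return total_params / num_gpus


def memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory in GB for a given parameter count and per-parameter cost."""
    return params * bytes_per_param / GB


share = params_per_gpu(175e9, 32)
print(f"175B over 32 GPUs: {share / 1e9:.2f}B params per GPU")
print(f"774M training state: {memory_gb(774e6, ADAM_MIXED):.1f} GB")
print(f"1.5B training state: {memory_gb(1.5e9, ADAM_MIXED):.1f} GB")
print(f"1.5B fp16 inference: {memory_gb(1.5e9, FP16_WEIGHTS):.1f} GB")
```

The even split works out to about 5.47B parameters per card, which the comment truncates to 5.4B. Under these assumptions a 1.5B model needs roughly 24 GB of weight and optimizer state, well under 80 GB.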

In fact, I’m not sure what they’re saying in general. They seem to be confusing data parallelism with model parallelism with memory fragmentation, while namedropping a bunch of training techniques.

The hard part of ChatGPT isn’t the size. It’s the training process. It took a small army of contractors rating outputs as good or bad. Once that dataset gets replicated, we can start talking about size. Hopefully LAION will deliver.

4 comments

rnosov 1215 days ago

I think they are correctly referring to ChatGPT as GPT-3 + RLHF. So an 80GB A100 GPU would be required for GPT-L and RLHF (PyTorch version) together. And it looks to me from TFA that the main thing taking up a lot of space is actually RLHF.

>I don’t understand how they went from talking about 175B params across 32 cards to 774M on one card. 175B divided by 32 is 5.4B.

They claim 774M is the size of GPT-L, which, if run in conjunction with their RLHF, would require an 80GB A100 GPU to train (using their PyTorch RLHF implementation). They additionally claim that training GPT-3 (175B params) plus RLHF would need 64 * 80GB = 5120GB of memory with the PyTorch implementation of RLHF, or 32 * 80GB = 2560GB going the Colossal AI route.
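The two totals above are easy to check, and dividing them by the parameter count shows how many bytes per GPT-3 parameter each setup implies. This is only a sketch of that arithmetic; the function names are made up for illustration, and the split of that memory between models, optimizer state, and activations is not something the thread specifies.

```python
A100_GB = 80          # memory per A100 80GB card
GPT3_PARAMS = 175e9   # parameter count quoted in the thread


def cluster_memory_gb(num_gpus: int, gb_per_gpu: int = A100_GB) -> int:
    """Aggregate GPU memory across the cluster."""
    return num_gpus * gb_per_gpu


def bytes_per_param(total_gb: float, params: float = GPT3_PARAMS) -> float:
    """Implied memory budget per model parameter."""
    return total_gb * 1e9 / params


for label, gpus in [("PyTorch RLHF", 64), ("Colossal AI", 32)]:
    total = cluster_memory_gb(gpus)
    print(f"{label}: {gpus} x {A100_GB} GB = {total} GB, "
          f"{bytes_per_param(total):.1f} bytes per GPT-3 parameter")
```

That comes to roughly 29 and 15 bytes per parameter respectively, so either figure implies far more than the weights alone.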

To be honest, these numbers do look to me like a bit of a cheesy ad for their product, but hey, they need to put food on their table too. I'm not sure the dataset would be such a huge problem; otherwise Britannica would still be ahead of Wikipedia. Given an army of volunteers willing to produce it, OpenAI's brigade of contractors has no chance.

hybridity 1215 days ago

If someone created a folding@home to crowd train an actually open ChatGPT, I'd gladly donate my spare resources to the cause.

flangola7 1215 days ago

That's unlikely to work. The memory has to be fast and low latency; even switching from on-board VRAM to system RAM slows performance by at least 10-100x. The bottleneck isn't computing power, it's I/O. Total bus bandwidth of a common small AI cluster is around 1 terabyte per second.
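To put rough numbers on the VRAM versus system RAM gap, the sketch below compares how long one pass over a set of weights takes at two bandwidths. The figures are assumptions taken from public spec sheets (about 2,000 GB/s for A100 80GB on-board memory, about 32 GB/s for a PCIe 4.0 x16 link), not measurements, and real workloads rarely reach peak bandwidth.

```python
HBM_GBPS = 2000.0    # assumed: approximate A100 80GB on-board memory bandwidth
PCIE4_GBPS = 32.0    # assumed: approximate PCIe 4.0 x16 bandwidth to system RAM


def seconds_per_pass(weights_gb: float, bandwidth_gbps: float) -> float:
    """Time to read every weight once at the given bandwidth."""
    return weights_gb / bandwidth_gbps


weights_gb = 40.0    # hypothetical: a 20B-parameter model stored in fp16
in_vram = seconds_per_pass(weights_gb, HBM_GBPS)
over_pcie = seconds_per_pass(weights_gb, PCIE4_GBPS)
slowdown = over_pcie / in_vram

print(f"VRAM: {in_vram * 1000:.0f} ms per pass")
print(f"PCIe: {over_pcie * 1000:.0f} ms per pass")
print(f"Slowdown: {slowdown:.1f}x")
```

With these assumed figures the slowdown is about 62x, inside the 10-100x range quoted in the comment. A volunteer network over home internet links would be slower again by orders of magnitude.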

We really shouldn't be building an "open source" AI in the first place though, and it's going to be illegal to do so soon. The weaponization power will be made clear soon and that will justifiably spook everyone.

dodslaser 1215 days ago

There's a significant number of people working hard on making certain tech illegal or at least heavily restricted. E2EE and onion routing come to mind. That doesn't mean we should abandon them. In fact, in many cases it's an indicator that we should keep going.

Why do you think we should avoid an open source AI?

flangola7 1214 days ago

How do you plan to have differential technological development and careful alignment research if anyone is allowed to build Skynet in their garage?

I use and generally support E2EE and onion routing. E2EE and onion routing aren't inherently existential risks to the continued existence of life on Earth.

danaris 1214 days ago

Please stop with the flagrant "AI" fearmongering over LLMs and other current-generation ML software. Not only are they not Skynet now, I do not believe it will be possible for simple iteration on this type of ML software to create anything remotely like Skynet.

dodslaser 1214 days ago

LLMs are not going to pose an existential risk to anyone. Also, making AI development less accessible to the general public will not make it any safer.

I am willing to bet all this fear-mongering singularity bullshit is just being peddled by large corporations with a vested interest in keeping AI development out of reach of the general public.

hansvm 1214 days ago

Biohacking and minor isotope enrichment projects are par for the course in garages nowadays. Three-letter agencies don't care about me, so why should they care about ML 101 skynet adventures?

animuchan 1215 days ago

For this reason alone (corpos making AI illegal for mere mortals to maintain), we should strive to make as much progress in truly open AI as possible.

The current dystopia is fairly dystopian as it is.

throw009 1214 days ago

>We really shouldn't be building an "open source" AI in the first place though, and it's going to be illegal to do so soon. The weaponization power will be made clear soon and that will justifiably spook everyone.

Encryption was illegal not that long ago for the same reasons. Now it's the basis of the entire digital economy. If we made it illegal again, then of the top 10 tech companies by market cap, only Nvidia and TSMC would not be outright illegal to operate.

The timid cowardice that's taken over tech will not serve it well in the coming 20 years.

flangola7 1214 days ago

How do you plan to have differential technology development and thoughtful and cautious alignment research if we go building these things without a speed limit?

Giving a baby a hand grenade would be more responsible.

dragonwriter 1214 days ago

> How do you plan to have differential technology development and thoughtful and cautious alignment research if we go building these things without a speed limit?

We aren’t going to have those things anyway; the closest we’ll get is if development is relatively public and open and thus subject to outsider critique. The only interest the closed corporate restricted approach has in alignment is in controlling the research, suppressing unwelcome avenues of inquiry, and generating PR to assuage public fears.

throw009 1214 days ago

Caution is for losers.

ahtihn 1215 days ago

> We really shouldn't be building an "open source" AI in the first place though, and it's going to be illegal to do so soon.

How do you make that illegal while still allowing private corporations to build AI? How do you legally define AI without applying it to all kinds of existing applications and without stopping all research on AI? And while staying broad enough that simply using a slightly different technique would still qualify under that definition?

flangola7 1214 days ago

Replace "AI" with "uranium enrichment and nuclear research" and the answers fill themselves in.

dodslaser 1214 days ago

Yes, and if you replace "uranium enrichment" with "teddy bears" it's a bedtime story for kids. That argument makes no sense.

sdenton4 1215 days ago

Yeah... Having spent a lot of cycles replicating ML work, I can say it's much more difficult than just taking a stab at replicating a paper. It's typically doable (results really do replicate), but it can take a few good brains a year to pull it off. There are typically a lot of small decisions that add up, and a lot of hyperparameter sweeps to land in a good region of the optimization space.

popinman322 1215 days ago

> Once that dataset gets replicated, we can start talking about size. Hopefully LAION will deliver.

Is LAION starting a community project to rate model outputs? I didn't see anything on their site.

sitic 1215 days ago

Here it is: https://open-assistant.io (https://projects.laion.ai/Open-Assistant/)

Taek 1214 days ago

For reference, GPT-NeoX is a 20B parameter model, and it runs on 45 GB of VRAM. On an 80 GB A100 you could probably run a 35B parameter model. Maybe 8 A100 cards to do inference on ChatGPT?

Or 32 3090 cards, which would run you under $40k total.
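The estimate above can be reproduced by scaling linearly from the GPT-NeoX data point. This is a sketch under two loud assumptions: memory scales linearly with parameter count, and ChatGPT is GPT-3 sized at 175B parameters, as elsewhere in the thread. The 24 GB figure is the RTX 3090's VRAM.

```python
import math

NEOX_PARAMS_B = 20    # GPT-NeoX size in billions of parameters, from the comment
NEOX_VRAM_GB = 45     # VRAM it runs in, from the comment
GB_PER_BILLION = NEOX_VRAM_GB / NEOX_PARAMS_B   # 2.25 GB per billion parameters

CHATGPT_PARAMS_B = 175   # assumption: ChatGPT is GPT-3 sized


def max_params_billion(vram_gb: float) -> float:
    """Largest model (in billions of parameters) that fits in the given VRAM."""
    return vram_gb / GB_PER_BILLION


def cards_needed(params_billion: float, vram_per_card_gb: float) -> int:
    """Minimum card count by total memory alone, ignoring any overhead."""
    return math.ceil(params_billion * GB_PER_BILLION / vram_per_card_gb)


print(f"80 GB A100 holds about {max_params_billion(80):.1f}B params")
print(f"A100 80GB cards needed: {cards_needed(CHATGPT_PARAMS_B, 80)}")
print(f"RTX 3090 24GB cards needed: {cards_needed(CHATGPT_PARAMS_B, 24)}")
```

By memory alone the floor is 5 A100s or 17 3090s. That ignores activations, attention caches, and uneven layer splits, so the 8 A100s (640 GB) or 32 3090s (768 GB) suggested in the comment leave considerable headroom.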

fswd 1213 days ago

20B GPT-NeoX runs on a 3090 in 8-bit mode.
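The reason 8-bit mode makes this fit is simple weight-size arithmetic, sketched below. It counts weights only; activations and any quantization overhead are ignored, and the 24 GB figure is the RTX 3090's VRAM.

```python
RTX_3090_VRAM_GB = 24


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Size of the weights alone at a given precision."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8):
    size = weight_gb(20e9, bits)
    fits = size <= RTX_3090_VRAM_GB
    print(f"{bits}-bit weights: {size:.0f} GB, fits on a 3090: {fits}")
```

At 16 bits the weights alone take 40 GB, which does not fit; at 8 bits they take 20 GB, leaving about 4 GB for everything else.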