by vineyardmike 1253 days ago
All these answers are good, but I can share more concrete numbers…

Meta released their OPT model, which they claim is comparable to GPT-3. Guidance for running that model [1] suggests a LOT of memory: at least 350GB of GPU memory, which is roughly 4 A100s, which are pricey.
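For anyone wondering where a figure like 350GB comes from, here is a rough sketch. It assumes the 175B-parameter OPT variant stored as 16-bit weights (2 bytes per parameter) and 80GB cards; those numbers are my assumptions, not taken from the linked guide, and activations and caches are ignored.

```python
import math

def weights_memory_gb(params_billion: int, bytes_per_param: int) -> int:
    """Memory for the weights alone, in GB (1e9 params * bytes / 1e9 bytes per GB)."""
    return params_billion * bytes_per_param

def gpus_needed(total_gb: float, gpu_mem_gb: float) -> int:
    """Whole number of cards needed to hold total_gb of weights."""
    return math.ceil(total_gb / gpu_mem_gb)

# Assumed values: 175B parameters, fp16 weights, 80GB per GPU.
weights_gb = weights_memory_gb(175, 2)
n_gpus = gpus_needed(weights_gb, 80)

print(f"{weights_gb} GB of weights, {weights_gb / 80:.1f} cards' worth, {n_gpus} whole cards")
```

Under those assumptions the weights alone fill a bit more than four 80GB cards, so in practice you would round up.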

Running this on AWS with the above suggestion would cost $25/hr, just for one model running. That’s over $0.40 a minute. If you imagine it takes a few seconds to run the model for one request… you’ll easily hit $0.05 per request once you factor in the rest of the infra (storage, CDN, etc.), the engineering cost, the research cost, and the fact that they probably have to scale to hundreds of instances for heavy traffic, which may mean less efficiently purchased servers.
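The per-request arithmetic above can be sketched like this. The $25/hr rate is from the comment; the 3 seconds per request and the assumption that one instance serves one request at a time (no batching) are mine, purely for illustration.

```python
def cost_per_minute(hourly_usd: float) -> float:
    """Instance cost per minute."""
    return hourly_usd / 60

def compute_cost_per_request(hourly_usd: float, seconds_per_request: float) -> float:
    """Raw compute cost of one request, assuming the instance handles
    a single request at a time (hypothetical, no batching)."""
    return hourly_usd / 3600 * seconds_per_request

HOURLY_USD = 25.0          # figure from the comment
SECONDS_PER_REQUEST = 3.0  # assumed stand-in for "a few seconds"

per_minute = cost_per_minute(HOURLY_USD)
per_request = compute_cost_per_request(HOURLY_USD, SECONDS_PER_REQUEST)

print(f"${per_minute:.2f} per minute")
print(f"${per_request:.3f} per request, compute only")
```

With these inputs the compute alone is around two cents per request, so the $0.05 estimate leaves the remainder for infra, engineering, research, and idle capacity.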

OpenAI has a sweetheart deal with Azure, but this is roughly the cost structure for serving requests. And this doesn’t include the upfront cost of training.

[1] https://alpa.ai/tutorials/opt_serving.html

4 comments

Really makes you appreciate the brain, which presumably operates with some sort of similar demand.
Hard to tell. Similar to how it takes a lot of resources for a human to hang from monkey bars but for a sloth it takes basically no resources at all, because the sloth comes out of the box designed for it.
Human babies come out of the box designed for hanging from monkey bars as well.

https://youtu.be/jXJLaGguQiU

Another mind-boggling thing about the brain is how little power it uses to do all the complex things it does.
calories are a unit of energy, so it’s a straightforward comparison

if we assume that a computer can be powered by 100 watts, over a day it will use 2.4 kWh, which is about 2000 Calories (kcal)

a GPU will consume a lot more, but we aren’t that far off in efficiency
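A quick check of that conversion, as a sketch. The 100 W figure is from the comment above; the only outside constant is 4184 joules per food Calorie (kcal).

```python
JOULES_PER_KCAL = 4184      # 1 food Calorie (kcal) in joules
SECONDS_PER_DAY = 24 * 3600

def watts_to_kwh_per_day(watts: float) -> float:
    """Energy used in one day at constant power, in kWh."""
    return watts * 24 / 1000

def watts_to_kcal_per_day(watts: float) -> float:
    """Energy used in one day at constant power, in food Calories (kcal)."""
    return watts * SECONDS_PER_DAY / JOULES_PER_KCAL

kwh = watts_to_kwh_per_day(100)
kcal = watts_to_kcal_per_day(100)

print(f"{kwh:.1f} kWh per day")
print(f"{kcal:.0f} kcal per day")
```

So a constant 100 W draw does land close to a typical 2000 kcal daily intake.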

Doesn't that assume 100% of a human's daily calories burn is due to brain activity?
The brain uses about 20% of a human's calories. It's not 100%, but it's a substantial fraction.
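Turning that 20% into watts, as a rough sketch. It assumes the roughly 2000 kcal/day intake mentioned upthread and treats the brain's share as a constant draw, which is a simplification.

```python
JOULES_PER_KCAL = 4184
SECONDS_PER_DAY = 24 * 3600

def kcal_per_day_to_watts(kcal_per_day: float) -> float:
    """Average power in watts for a given daily energy budget in kcal."""
    return kcal_per_day * JOULES_PER_KCAL / SECONDS_PER_DAY

DAILY_KCAL = 2000     # assumed daily intake, from the estimate upthread
BRAIN_SHARE = 0.20    # fraction quoted in the comment above

brain_watts = kcal_per_day_to_watts(DAILY_KCAL * BRAIN_SHARE)
print(f"{brain_watts:.1f} W")
```

That works out to somewhere around 19 W on average under these assumptions.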
The other components of the human body are also required for brain function.
The brain doesn't use a synchronous digital architecture. It is asynchronous. Spiking neural networks implemented in neuromorphic hardware are equally efficient. They consume milliwatts for a million neurons.
Do you have links on novel hardware architectures for neuromorphic hardware? In my country, the leading research group for neuromorphic computing does not cite any novel hardware approaches, only which existing hardware architectures are most suitable.
How do you know that the universe isn't just rendering everything.
To have ML produce meaningful content you need to give it some input or a sense of what the outcome should be, and this is after billions of rounds of trial and error.

Yet people these days believe something like the brain was brute-forced by nature into an accidental existence.

Some input: The organism's environment.

Outcome should be: The organism successfully produces offspring

Natural selection is doing exactly what you describe.

Except natural selection can't start over. It only works if there is always a high rate of survivors, and even if that were not an issue, consider 4 billion years and a generous generation length of one year (one natural selection cycle): 4 billion generations isn't a whole lot, even for small features, when you don't have an enormous population and birth rate. Let's say there were 100,000 humans at some point and only 1,000 fatal features (being generous). It's not just that the replacement rate of defective humans needs to exceed the elimination rate; a certain percentage of replacements must be free of all fatal defects and survive. Also, consider how there should be many failed species that attempted to evolve into a human-like species or a primate. You can't always luck out. At some point the entire branch has to fail, requiring subsequent attempts, while the fatal conditions that required the evolution will not go away.
> It only works if there is always a high rate of survivors

There doesn't have to be a high rate of survival if the reproductive rate compensates for losses.

E.g., if 80% of wild rabbits are eaten, but the remaining 20% can give birth to 5 bunnies per parent per lifetime, the population will be stable.
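The rabbit example can be sketched as a per-generation growth factor. The 20% survival and 5 offspring per surviving parent are the numbers from the example; the starting population of 1000 and the function names are made up for illustration, and everything else about real populations is ignored.

```python
def growth_factor(survival_rate: float, offspring_per_survivor: float) -> float:
    """Population multiplier per generation: survivors times offspring each."""
    return survival_rate * offspring_per_survivor

def simulate(population: float, survival_rate: float,
             offspring_per_survivor: float, generations: int) -> float:
    """Apply the same growth factor for a number of generations."""
    factor = growth_factor(survival_rate, offspring_per_survivor)
    for _ in range(generations):
        population *= factor
    return population

# 80% eaten means 20% survive; each survivor leaves 5 offspring.
factor = growth_factor(0.20, 5)
after_100 = simulate(1000, 0.20, 5, generations=100)

print(f"growth factor per generation: {factor:.2f}")
print(f"population after 100 generations: {after_100:.0f}")
```

A factor of 1 means a stable population; anything above 1 grows and anything below 1 shrinks, regardless of how low the survival rate itself is.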

I have no idea where you're getting your beliefs, but most of it is wrong in both the math and biology.

What I am saying is that the rate needs to continue to be positive, and out of the 20% survivors many will not carry the survival gene. And on top of that, it isn't just one thing that kills a rabbit in your example: the climate, not finding mates, predators, disease and more all must be overcome at once. Survivors must overcome a wide array of adversity and successfully pass on that combination of abilities, and this needs to happen every generation.

Look at it in bits and bytes. For each adversity-overcoming feature that a species has inherited, let that be a bit set to 1. With 2 adversities you have only two bits, so you only need one out of 4 individuals to have both bits on. For a realistic 32 adversities, only about one in 4 billion (2^32) individuals has all 32 bits set to 1. And this is without considering how a survival trait against one adversity can be a fatal trait against another. Now these bits need to be passed on; if one of them is missing, then the only chance that individual has to survive is by pure chance avoiding that adversity.
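The arithmetic behind the bit analogy, as a sketch. It only holds under the analogy's own simplifying assumptions, which I am making explicit here: each trait bit is independently on with probability 1/2, with no inheritance or selection acting on it. It is not a model of how natural selection works.

```python
def fraction_with_all_traits(n_traits: int, p_each: float = 0.5) -> float:
    """Fraction of individuals with every trait bit set, assuming each bit
    is independently on with probability p_each (the analogy's assumption)."""
    return p_each ** n_traits

def one_in(n_traits: int, p_each: float = 0.5) -> float:
    """Express the same fraction as 'one in N individuals'."""
    return 1 / fraction_with_all_traits(n_traits, p_each)

print(f"2 traits: one in {one_in(2):.0f}")
print(f"32 traits: one in {one_in(32):,.0f}")

# If traits are inherited so that most individuals already carry each one,
# the picture changes sharply, e.g. p_each = 0.99 (a made-up value):
print(f"32 traits at p=0.99: {fraction_with_all_traits(32, 0.99):.2f}")
```

The one-in-4-billion figure depends entirely on the independent coin-flip assumption, so the result is very sensitive to the per-trait probability.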

Think of the endless adversities we face and overcome. You are saying that for millions of generations, there has been an unbroken chain of survivors that kept overcoming geometrically expanding adversity. Just a one-degree increase in global temperature causes entire ecosystems to collapse.

Survival is the exception, not the default.

it is natural selection. this is the most famous mechanism of evolution.
It's interesting that the requirements for a text model are so much greater than for images.

Stable Diffusion can run on a home PC, while it seems you need a supercomputer for GPT-3. I'm not sure that would have been my intuition.

I think it has to do with text being much more precise. Your stably diffused cartoon avatar having six fingers is not nearly as noticeable as a language model's chat misspelling every second word. So you need fewer resources to get to a human-acceptable result.
no, diffusion models are just more efficient
Don't forget training costs, labor costs used for RLHF and (most likely) the money required for such large volumes of training data.
Doesn't ChatGPT fine-tune one of the smaller GPT-3s, not the 175B parameter model?