| HN Mirror


by akelly 1284 days ago

The way they went from GPT-3 to ChatGPT is really quite genius. My understanding is that it's something like this:

1. Start with GPT-3, which predicts the next word in some text and is trained on all the text on the internet

2. Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst

3. The GPT model needs a massive amount of training data, so it would be cost prohibitive to get enough human feedback to fine-tune GPT manually. Instead, you train another model, called the reward model, to predict how the humans will rate each response. Then you train the GPT model against the reward model millions of times

5. Feed a small percentage of the output from that training process back to the human reviewers to continue training the reward model, based on heuristics like reward model uncertainty which predict how helpful the human feedback will be towards improving the reward model

6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model
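To make the reward-model idea in step 3 concrete, here is a minimal sketch of learning a scoring function from best-to-worst rankings. It assumes a pairwise logistic loss, `-log(sigmoid(r_better - r_worse))`, which is a common choice for learning from rankings; the steps above don't say which loss was actually used. The linear model, the two-number feature vectors, and all function names are hypothetical stand-ins for a real neural reward model.

```python
import math
from itertools import combinations


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Linear stand-in for the reward model: one scalar score per response."""
    return sum(w * f for w, f in zip(weights, features))


def ranking_to_pairs(ranked):
    """Turn a best-to-worst ranking into (better, worse) pairs."""
    return list(combinations(ranked, 2))


def train_reward_model(rankings, n_features, lr=0.1, epochs=200):
    weights = [0.0] * n_features
    for _ in range(epochs):
        for ranked in rankings:
            for better, worse in ranking_to_pairs(ranked):
                margin = reward(weights, better) - reward(weights, worse)
                # Derivative of -log(sigmoid(margin)) with respect to margin.
                grad_scale = -(1.0 - sigmoid(margin))
                for i in range(n_features):
                    weights[i] -= lr * grad_scale * (better[i] - worse[i])
    return weights


# Hypothetical toy data: each response is (helpfulness_signal, rudeness_signal),
# and each inner list is one prompt's responses as ranked by a reviewer.
rankings = [
    [(0.9, 0.0), (0.5, 0.2), (0.1, 0.9)],
    [(0.8, 0.1), (0.3, 0.7)],
]
weights = train_reward_model(rankings, n_features=2)
print(weights)
```

Once trained, `reward(weights, features)` can score responses no human has looked at, which is what makes it usable as a cheap training signal for the main model.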

https://openai.com/blog/chatgpt/

https://openai.com/blog/deep-reinforcement-learning-from-hum...

11 comments

visarga 1284 days ago

> 2. Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst

Step 2 is not that. It's manually writing responses for a few tasks.

> A labeller demonstrates the desired output behavior.

(left side on https://cdn.openai.com/chatgpt/draft-20221129c/ChatGPT_Diagr...)

So it is supervised training in this stage. Ranking is the next stage, for training the reward model. This is not the reward model, it's a model to generate sample responses to be used by the reward model.

So there are two kinds of manual work involved here - manually demonstrating how to solve tasks, and ranking responses. There is even talk about how much effort to invest in the first vs the second and what is the trade-off.

akelly 1283 days ago

Right I intentionally left off Step 1 from that chart to simplify the explanation, since it didn't seem necessary. Is Step 1 just for creating the ChatGPT content blocker?

anthropodie 1284 days ago

I want to know if it will ever be possible to run this kind of AI at home once its training is complete. I don't need all the knowledge, just the subset that I'm interested in.

Actually I'm more interested in its ability to transform things. For example, I can ask it to convert docker-compose to a docker run command, it can manipulate JSON, and it can sort numbers in a table when prompted. I'm more interested in these abilities than in just getting answers, for which I already have Google.

DeWilde 1283 days ago

It uses GPT-3 under the hood which requires about 350 gigabytes of GPU VRAM (back of the envelope calc, likely more) to perform these inferences.
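The back-of-the-envelope figure presumably comes from parameter count times bytes per parameter. A quick sketch, assuming the published 175 billion parameters for the largest GPT-3 model and 16-bit weights (both are assumptions about what was meant here; activations and any caching are ignored, which is why the real number is "likely more"):

```python
def weights_memory_gb(n_params, bytes_per_param):
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


GPT3_PARAMS = 175e9  # assumed: parameter count of the largest GPT-3 model

for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weights_memory_gb(GPT3_PARAMS, nbytes):.0f} GB")
```

At 2 bytes per parameter this gives the 350 GB quoted above; halving the precision to 8 bits would halve it again.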

bozhark 1283 days ago

7x NVIDIA RTX A6000's so ~$32,550

KaoruAoiShiho 1283 days ago

This is honestly affordable for a lot of upper-middle class people and might well be worth it. It's like the cost of a car. I can seriously see this writing a book for me if I can get it tuned to study only my writing style and remember all of my texts. But it could also cost only $14,000 with 14x RTX 3090s.
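For what it's worth, the card counts can be checked with a rough sketch that takes the 350 GB estimate from upthread at face value and uses 48 GB per RTX A6000 and 24 GB per RTX 3090. It ignores everything else you'd need (interconnect, host machine, headroom for activations):

```python
import math


def cards_needed(required_gb, vram_per_card_gb):
    """Smallest number of cards whose combined VRAM covers the requirement."""
    return math.ceil(required_gb / vram_per_card_gb)


REQUIRED_GB = 350  # estimate quoted upthread, not a measured figure

for name, vram_gb in [("RTX A6000", 48), ("RTX 3090", 24)]:
    n = cards_needed(REQUIRED_GB, vram_gb)
    print(f"{name}: {n} cards, {n * vram_gb} GB total")
```

Strictly, 7 x 48 GB and 14 x 24 GB both come to 336 GB, a little short of 350 GB, so the counts in this subthread are slightly optimistic roundings.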

heartbreak 1283 days ago

I’ll wait a year and buy $2k worth of hardware that runs it.

llampx 1275 days ago

Difference is in first mover advantage. If you can be the first to use it to bring value to yourself and your clients, you can easily make up the cost of that hardware.

jasmer 1278 days ago

Or wait a further few years and spend $20.

Or a couple years later for it to be 20 cents.

Or a couple years later for them to give it to you for buying a bottle of Coke.

In the interim, they will find ways to make money from us.

bozhark 1279 days ago

5x limit per manufacturer, maybe 3rd party for the additional?

KaoruAoiShiho 1283 days ago

If this were open sourced, it might be optimized quickly. The amount of VRAM required for image generation went down very quickly; I'm sure DALL-E 2 still uses an enormous amount of VRAM, but other solutions do not.

lmarcos 1283 days ago

So maybe in 5 years we would be able to run it in our $800 smart glasses.

acapybara 1283 days ago

Yeah buddy!

Look up fine tuning GPT-J in 8 bit mode.

People have made domain-specific models that perform well (IIRC, better than GPT-3 in their domain).

The team behind Stable Diffusion is also working on one that's supposed to be pretty good.

hackernewds 1283 days ago

you can do that today in the free release?

xvector 1283 days ago

I think he wants to self host. It sucks to have no ownership of such a powerful tool. I would pay upwards of $3000 to be able to self host something like this.

gillesjacobs 1283 days ago

Rest assured someone is working on a self-hosted (distilled) model. Stable Diffusion has shown there is a viable market for open, consumer-hardware inferencable models.

nsb1 1283 days ago

You forgot the fresh cup of really hot tea :)

https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D...

dzink 1284 days ago

ChatGPT seems to do, or result in, some amount of caching of responses: there is very little variation when asking the same question multiple times. CharacterAI produces a lot more variety in comparison, making it more helpful for brainstorming. That said, ChatGPT is likely closer to the truth, even if not perfect, for searches. The innovation happening lately is incredible.

ravi-delia 1284 days ago

There's definitely some live pruning happening, but another factor is that the temperature is turned way down. Obviously at a temp of zero it's just a totally deterministic function, and at a low temp it's nearly so. And if it's doing its job, you'd hope that similar questions would be mapped very close together in the configuration space.
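A minimal sketch of what temperature does to next-token sampling, using made-up logits for three candidate tokens. This is the standard softmax-with-temperature formulation, not anything known about how ChatGPT is actually configured:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (1.0, 0.5, 0.1):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

As the temperature drops, nearly all the probability mass lands on the top-scoring token, so repeated runs of the same prompt give nearly the same text.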

djmips 1283 days ago

Did you try to vary your question or add modifiers or elaborate?

ofrzeta 1284 days ago

> Take thousands of prompts, generate several responses for each of them, and have human reviewers rank the responses for each prompt from best to worst

Recently I saw an image where Indian women sat in front of computers and the caption said they were classifying "AI" responses. I guess that's true and this kind of work is the new outsourced cheap labour in the AI age.

dotancohen 1284 days ago

That Indian woman's idea of acceptable and unacceptable AI responses surely varies from that of a San Francisco tech worker, or a Cape Town motorcycle mechanic, or an English teacher from Liverpool.

I really doubt the mechanical turk method is applicable or even useful for the current state of AI-generated text.

xp84 1274 days ago

I actually disagree a lot with this. Sure, if you asked something with heavy cultural baggage, that would frequently be a real concern, but when you are primarily trying to bridge the machine-human chasm, the cultural differences among the examples you gave are trivial in comparison. For instance, if you offered an AI personal assistant but the catch was that it would (at least starting out) only have the perspective of an average middle-class Indian person, it would still beat the absolute crap out of "first generation" technology like Siri or Alexa!

weird-eye-issue 1283 days ago

It could be a first pass

hcks 1283 days ago

I personally worked as a « human trainer » for the fine tuning of ChatGPT.

The pay was $50 per hour, which is not bad for a side job as a student.

jacknews 1279 days ago

I'd say. Where do you apply for this kind of work?

mclightning 1284 days ago

That's super interesting. When GPT-3 came out, I wrote an article inspired by it, arguing that we could one day build an AI that acts like AGI from a crazily vast amount of multimedia training data, collected from users willing to participate in ever-improving AI interactions:

https://medium.com/swlh/bicameral-mind-humanoid-robot-with-g...

tluyben2 1284 days ago

When I was a first-year AI student at the beginning of the 90s, I asked my professor what would happen if we just made a massive neural network and trained it with all the information in the world. He said it cannot happen as it is impossible.

namaria 1284 days ago

This reminds me of a major US newspaper declaring heavier-than-air flying machines a million years away mere months before the Wright brothers' experiments.

snowwrestler 1283 days ago

It is impossible; the GPT family of AIs is trained on text, which is a tiny subset of all the information in the world.

The “unphysical” nature of its training is one reason it can so easily “hallucinate” about impossible things as though they were real.

Epa095 1283 days ago

If an elderly but distinguished scientist says that something is possible, he is almost certainly right; but if he says that it is impossible, he is very probably wrong.

Arthur C. Clarke

samus 1283 days ago

Considering the computational resources available at the time, he was not that wrong. Research into artificial neural networks has always been held back by available computational power.

tluyben2 1283 days ago

Agreed, but as a professor I believe one needs to be looking into the future. It was not that far out, but yeah, it was an AI winter. We were stuck until 2012 basically. That's a long time.

Al-Khwarizmi 1283 days ago

Easier said than done.

I'm a professor in an AI field, and I can tell you that neither I nor the colleagues I regularly have scientific discussions with could have imagined ten years ago that something like ChatGPT would be possible in 2022. I suppose there might be a minority who called it, but recent advances in deep learning absolutely whooshed past the predictions of the overwhelming majority of people in the field.

tluyben2 1283 days ago

Ah yes, that's what I was trying to say. I was the worst sceptic of AI; I never did anything with AI after getting my master's. I just went for money, programming and managing programming.

For me [1] this is the most mindboggling thing I have seen in my life and I don't think people realise what it means. And yes, it whooshed past anything I thought possible in my lifetime. I hate that it's 'not be evil', 'anti thought crime' etc., but it is really incredible what it does.

[0] https://twitter.com/luyben/status/1600663169353015297 [1] https://brainfisheatfishbrain.com/about/

soulofmischief 1284 days ago

I don't mean to downplay how incredible the tech is, but I'm not sure I'd call this approach genius as it's the industry standard.

sillysaurusx 1284 days ago

Then why were they the first ones to exploit it so effectively?

I don’t think it was standard for GPT models.

411111111111111 1284 days ago

I think the issue here was with the term genius, which makes it sound like it was a completely new and revolutionary paradigm.

OpenAI's success mainly stems from extremely well-executed previous concepts while mostly ignoring cost. And as they're pretty much the most successful public player in this domain, they've got the first-mover advantage, which they're currently leveraging very successfully. At least that's how it looks from the perspective of an armchair analyst who wouldn't have been able to achieve the same, even with the same resources and time.

The actual result is absolutely incredible, however, regardless of whether the road to this end was genius or not.

gentoo 1284 days ago

I don't think anything about high-performance GPT models is standard, since they are only a couple years old and only a handful of organizations have developed them

soulofmischief 1283 days ago

The technique in question has little to do with GPT itself; it involves using ML to generate more training data in an automated fashion, creating a generative training loop, which, as another commenter mentioned, is also the basis behind generative adversarial networks.

mdp2021 1284 days ago

Compare it to Generative Adversarial Networks. (There are parallels - pun not initially intended.)

kubrickslair 1284 days ago

Yes, some variation of this iterative approach has been around for over 2 decades. Look at Oren Etzioni's early work on Open Information Extraction and so on.

I agree that applying this semi supervised approach to extend GPT for improving dialog models is what is unique & results are stunning. But the meta method has been around for a while.

starik36 1284 days ago

Ok, but how does it take my code and fix the bug? It's my own code, no one has seen it besides me and the model wasn't trained on it.

ravi-delia 1284 days ago

In learning to predict the next token, the model has to pick up lots of little bits of world knowledge. I'm sure someone would disagree with the phrasing of "understand", but it certainly operates with more complexity than, say, a markov chain. It has seen lots of python, and in order to predict better, it has developed internal models of how python works. Think of how much better you'd do predicting the next character of python code compared to random noise- there's a lot of structure there.

zorr 1284 days ago

In my (limited) experience it seems to perform even better for typed languages (for example Kotlin/Java/Swift) compared to Python. The Python code it provided often had subtle type issues when working with dates. While the Kotlin date-related code it provided was more accurate and correct in terms of types. Which makes sense since the additional type information likely leads to a much better "internal model of how Kotlin works"

What surprised me was the level of "understanding" it seems to do when providing it with some of my own sample code. It can analyze the code, explain how it works/what it does, use libraries, suggest improvements and apply those improvements.

Have a look at this conversation: https://imgur.com/a/ZtViC3d

While the end result isn't perfect, it's still highly impressive and while I was an AI-skeptic before, I now see the possible benefits of AI assistants for programming.

Some other prompts with very impressive results:

* "Write an implementation for the following Kotlin repository interface: <insert-interface-with-full-type-signatures>."

* (followup) "Add save/load methods that store the backing map in a JSON file"

* (followup) "Replace Gson with Jackson for JSON serialization"

* "Write an Android layout xml for a login form with username/password/loginbutton"

* (followup) "Provide the Kotlin activity code for this layout"

* "Write a Kotlin function that parses a semver input string into a data class"

FiberBundle 1284 days ago

> In my (limited) experience it seems to perform even better for typed languages (for example Kotlin/Java/Swift) compared to Python. The Python code it provided often had subtle type issues when working with dates. While the Kotlin date-related code it provided was more accurate and correct in terms of types. Which makes sense since the additional type information likely leads to a much better "internal model of how Kotlin works"

I think another possibility here is that they might have used an execution environment to check whether the code the model came up with actually compiles, and used that as additional input during training. Some sort of execution environment also seems to me to be a possible explanation for how they got the model to emulate a terminal so well.
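To be clear, this is speculation, but the kind of "does it compile" signal I have in mind is cheap to produce. A minimal sketch using Python's built-in `compile()`, with a hypothetical `compiles` helper and made-up candidate snippets; it only catches syntax errors, not the type issues a Kotlin compiler would flag:

```python
def compiles(source):
    """Return True if the snippet parses and compiles as Python source."""
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False


# Hypothetical model outputs: the second one is missing a colon.
candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b)\n    return a + b\n",
]
signals = [compiles(c) for c in candidates]
print(signals)
```

A pass/fail list like `signals` is the sort of thing that could in principle be fed back as an extra label during training.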

jameshart 1283 days ago

It’s not ‘more complexity’ than a Markov chain - it essentially is a Markov chain, just looking at a really deep sequence of preceding tokens to decide the probabilities for what comes next.

And it’s not just looking that up in a state machine, it’s ‘calculating’ it based on weights.

But in terms of ‘take sequence of input tokens; use them to decide probable next token’, it’s functionally indistinguishable from a Markov chain.
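For reference, the textbook count-based version of "take a sequence of input tokens; use them to decide the probable next token" can be sketched like this. The corpus and function names are made up for illustration, and the state is just the last `order` tokens:

```python
from collections import Counter, defaultdict


def build_markov_chain(tokens, order=2):
    """Count which token follows each run of `order` consecutive tokens."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        table[state][tokens[i + order]] += 1
    return table


def next_token_distribution(table, context, order=2):
    """Look up the next-token probabilities for the last `order` tokens."""
    state = tuple(context[-order:])
    counts = table.get(state)
    if not counts:
        return {}  # unseen state: a lookup table has nothing to say
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


corpus = "the cat sat on the mat and the cat ran".split()
table = build_markov_chain(corpus, order=2)
print(next_token_distribution(table, ["the", "cat"]))
print(next_token_distribution(table, ["a", "dog"]))
```

The interface is the same as an LLM's (context in, distribution out); the difference is that the table returns nothing for a context it has never seen, while a model that calculates from weights still produces a distribution.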

ravi-delia 1283 days ago

I look at deep sequences of tokens and predict what comes next- can you milk me? Once you've broadened "basically a markov chain" to "any function from a sequence of tokens to a probability distribution of tokens" there's a lot of explanatory power lost. If you had to characterize the difference between brute force mappings based on pure frequencies and a model which selectively calculates probabilities based on underlying structure, wouldn't you say the latter had more complexity?

You don't have to believe the hype, but if you think you can get GPT performance out of anything remotely resembling a markov chain, I encourage you to try.

jameshart 1283 days ago

There's nothing about Markov chains that says the model has to be based on brute calculation from previously observed frequencies. The point is that the exact behavior of these LLMs could also be modeled as a Markov chain with a sufficiently massive state machine.

Obviously that's impractical and not how LLMs actually work - they derive the transition probabilities for a state from the input, rather than having it pre-baked - but I think from the point of view of saying 'these are more sophisticated than a Markov chain', actually strictly speaking they aren't - they are in fact a lossy compression of a Markov model.

krackers 1283 days ago

But it seems like the attention mechanism fundamentally isn't markov-like in that at a given position it can pool information from all other positions. So as in the simplest case when trained on masked-language modeling, the prediction of the mask in "Capital of [MASK] is Paris" can depend bidirectionally on all surrounding context. While I guess it's true that in the case where the mask is at the end (for next-token completion), you could consider this as a markov model with each state being the max attention window (2048 tokens I think?), but that's like saying all real-world computers are FSMs: it's technically true, but this isn't the best model to use for actually understanding its behavior.

Since for most inputs that are smaller than the max token length you never actually end up using the markov-ness, calling it a markov model seems like it's just in a way saying it's a function that provides a probability distribution for the next token given the previous tokens. Which just pushes the question back onto how that function is defined.

jarenmf 1284 days ago

It's really awesome how good it is in modeling certain world knowledge. It seems to be struggling with putting everything in one framework. For example, it still makes a lot of mathematics and logic errors.

hugh-avherald 1284 days ago

How do you fix a bug? You've never seen it before.

spion 1283 days ago

This is why this paper was so exciting when it came out https://arxiv.org/abs/2005.14165

Make a large enough model and train it with all sorts of data and it will be able to encode generalized concepts which can then be applied to specific tasks (given only a few examples of the task, or even just a query / question, rather than an example)
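In practice, "given only a few examples of the task" means the examples go straight into the prompt text, with no retraining. A minimal sketch of assembling such a prompt; the `Input:`/`Output:` layout and the function name are just one hypothetical convention, not a required format:

```python
def build_few_shot_prompt(task_description, examples, query):
    """Lay out a task description, worked examples, and an open-ended query."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

With an empty `examples` list the same function produces the "just a query" (zero-shot) case.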

tluyben2 1284 days ago

It often does[0], I am doing more experiments currently by writing the fixed code back in place and running tests automatically.

[0] https://brainfisheatfishbrain.com/post/chatgpt-code-reviews/

tnzk 1284 days ago

> 6. Release ChatGPT to the public, and use user feedback like response upvotes/downvotes to further optimize the reward model, while continuing to train ChatGPT against the reward model

Can someone provide a pointer to an article that elaborates on this part?

hanniabu 1284 days ago

How does step 1 work? It seems incredibly inefficient to check your word combo against every single segment of text they have. How does it do this efficiently?

seydor 1284 days ago

https://en.wikipedia.org/wiki/Transformer_(machine_learning_...

amelius 1283 days ago

That's not genius, that's called unsupervised learning and it is an entire subfield.

dsr3 1283 days ago

I think number 3 is a description of a Generative Adversarial Network (GAN).

amelius 1283 days ago

Ok, regardless, it is not really new. ML researchers are doing these kinds of things all the time.

By the way, according to some people, GANs are also a kind of unsupervised learning: https://stackoverflow.com/questions/44445778/are-gans-unsupe...