| HN Mirror


by 343rwerfd 742 days ago

> If an LLM was capable of logical reasoning

The prompt interfaces and smartphone apps were from the beginning, and still are, ongoing training for the next iteration. They provide massive RLHF for further improvement of models that are already heavily RLHFed.

Whatever tokens they're extracting from all the interactions, the most valuable ones come from metadata, like "correct answer in one shot" or "correct answer in three shots".

The inputs, and potentially the outputs, can be gibberish, but the metadata can be mostly accurate given some implicit or explicit human feedback (the thumbs up, maybe the "thanks" replies from users).
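To make the "correct answer in N shots" idea concrete, here is a minimal sketch of how such a label could be derived from a conversation log. Everything in it is an assumption for illustration: the `Turn` structure, the `THANKS_MARKERS` phrases, and the `shots_to_correct` function are hypothetical, and nothing here describes how any vendor actually processes feedback.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrases treated as implicit positive feedback (an assumption).
THANKS_MARKERS = ("thanks", "thank you", "that worked")


@dataclass
class Turn:
    user_text: str            # what the user typed in this turn
    thumbs_up: bool = False   # explicit feedback on the model's reply to this turn


def shots_to_correct(turns: list[Turn]) -> Optional[int]:
    """Return how many model replies it took to get a positive signal, or None."""
    for i, turn in enumerate(turns):
        # Explicit signal: the reply to turn i (the (i+1)-th reply) got a thumbs up.
        if turn.thumbs_up:
            return i + 1
        # Implicit signal: user message i thanks the model for the previous
        # reply, which was the i-th reply overall.
        if i > 0 and any(m in turn.user_text.lower() for m in THANKS_MARKERS):
            return i
    return None  # no usable feedback; the conversation yields no label


conversation = [
    Turn("how do I reverse a list?"),
    Turn("I get an error with this"),
    Turn("thanks, works now"),
]
print(shots_to_correct(conversation))
```

The substring match is deliberately crude (it would also fire on "no thanks"), which is part of the point: the label only has to be mostly accurate, not perfect, to be useful at this scale.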

The RLHF refinement extracted from having the models face the entire human population, continuously prompted 24x7x365 in all languages about every topic of interest to human society, must be incredible. If you can extract even a single percent of definitely "correct answers" from the total prompts answered, it should be massive compared to the few thousand dedicated QA/RLHF people who worked on the models in the initial iterations of training.

That was GPT-2, 3, 4: the initial iterations of training. Once the models have evolved into more powerful (mathematical) entities, you can use them to train the next models, as is almost certainly happening.

My bet is that it is one of two:

- The scaling thing is working spectacularly. They've seen linear improvement in blue/green deployments across the world plus realtime RLHF. Maybe it is going a bit slowly, but the improvements justify waiting a bit longer to train a more powerful, refined model that gets far better answers even from the previously used datasets (now probed more deeply by the new models and the new massive RLHF data). If in a year they have a 20x GPT4, Claude, Gemini, whatever, they could be "jumping" to the next 40x a lot faster, if they have the most popular, most prompted model in the market (in the world).

- The scaling stuff already sank. They have seen the numbers and it doesn't add up by now, or they've seen diminishing returns coming. This is being firmly denied by everyone, on the record or off the record.