| HN Mirror



	by bonoboTP 99 days ago
	Ok, maybe pretraining is now complete and solved. Next up: post-training, reinforcement learning, engineering RL environments for realistic problem solving, recording data online during use, then simulating offline how it could have gone better and faster, distilling that into the next model, and so on. There's still decades worth of progress to be made this way.

1 comments

k32k 99 days ago

" There's still decades worth of progress to be made this way."

That's not true. Moreover, progress can slow to a crawl where it's barely noticeable. And in that world humans continue to stay ahead. That's the magic of humans: to be aware of their surroundings and adapt sufficiently whilst taking advantage of tools and leveraging them.


linkregister 98 days ago

This is an interesting theoretical statement that does not survive a collision with reality. Long-tail expert RLHF training is effective. We have seen a significant employment impact on call center employees. This does not mean its progress will be cheap or immediate.
