| HN Mirror



	by readitalready 120 days ago
	Pretraining + RL itself is the scaling limit. If you feed it the entire dataset before 1905, LLMs aren't going to come up with general relativity. It has no concept of physics, or time even. AGI happens when you DON'T need to scale pretraining + RL.

2 comments

> If you feed it the entire dataset before 1905, LLMs aren't going to come up with general relativity.

Link?

You don't need a source for that. An LLM with such little data is barely able to form proper sentences.

> an LLM with such little data

There is a mountain of data pre-1905. Certainly enough to train a decent 30B-parameter model.

Now, digitizing & OCRing all of that data... THAT is a challenge.

AGI maybe not, but it is reaching disruption-level intelligence in the SWE domain.