| There might be some truth in what you say for very large image and language models that use supervised learning. It is really hard to see how this 'it's just a lot of good data' view applies to deep reinforcement learning where the model learns multi-step policies from raw input data (e.g. a camera on a robot) with only a rough high-level reward function to guide it. If therefore (as seems to be the case) you can abstract the information humans need to provide to the model/learning system to ever higher levels of reward function (and thereby vastly reduce the information provided by humans) then it seems very hard to argue that the model (and the training process) isn't doing to some degree what you describe as: 'incredible amounts of experimental work to carve-the-world along its joints, ie., to have the right concepts; and incredible amounts of work to measure along its joints, ie., to have the right units. And then to eliminate all the coincidences and irrelevances.' For example, imagine a robot learning from scratch to pick objects up based on raw pixel data with only a scalar reward function - where in this process is the human preparing the data so the model only has to average? |
Great -- so do you have an example of such a system?
I'd be inclined, initially, to deny that it exists. If your reward function expresses a reward for the goal of "picking up objects with (pixel-space) properties etc.", you're cheating. In this case, the reward function serves the role of the data: ie., prepared by us to work. Indeed, a function is just a dataset -- and the reward function here is being sampled by the system.
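To make the "a function is just a dataset" point concrete, here's a minimal sketch, assuming a tiny discrete state space. The `reward` function, the state fields and the heights are all hypothetical, made up for illustration; no real robot system is being described.

```python
from itertools import product

# Hypothetical hand-written reward: it already encodes what "picked up" means.
def reward(state):
    gripper_closed, object_height_cm = state
    return 1.0 if gripper_closed and object_height_cm > 0 else 0.0

# Enumerate the (assumed, tiny) state space and tabulate the function.
states = list(product((False, True), (0, 5, 10)))
reward_table = {s: reward(s) for s in states}

# An agent calling reward(s) sees exactly what it would see by
# looking up rows of reward_table: the function is the dataset.
same = all(reward_table[s] == reward(s) for s in states)
print(same)
```

On a continuous pixel space you can't enumerate the table, but the agent is still only ever drawing rows from it.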
You'd need to show me a system whose reward function / dataset didn't "contain the solution", in the manner of animals who respond to the world without already having all the information about it.
The relevant capacity a system needs to have, in both cases, is being able to take a profoundly ambiguous environment and produce a dataset/reward-fn which "carves along its joints". Ie., which effectively eliminates that ambiguity.
When such ambiguity & coincidence is eliminated, there's basically nothing left to do -- it's that basic nothing which we task machines with doing. Ie. running `mean(sample(unambiguous relevant well-carved data))`.
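A rough sketch of what I mean by `mean(sample(...))`, with made-up numbers: the "true" value, the noise level and the unit mix-up are all assumptions chosen for illustration, not measurements from anywhere.

```python
import random
import statistics

rng = random.Random(42)
true_value = 9.81  # assumed quantity, in m/s^2

# "Well-carved" data: one relevant quantity, one unit, only small noise left.
carved = [true_value + rng.gauss(0, 0.05) for _ in range(1000)]

# "Uncarved" data: same readings, but half were recorded in ft/s^2
# and nobody has labelled which ones.
uncarved = carved[:500] + [v * 3.28084 for v in carved[500:]]

def sample(data, k, seed=0):
    # Draw k items without replacement, reproducibly.
    return random.Random(seed).sample(data, k)

carved_estimate = statistics.mean(sample(carved, 200))
uncarved_estimate = statistics.mean(sample(uncarved, 200))

print(round(carved_estimate, 2), round(uncarved_estimate, 2))
```

The averaging step is identical in both cases. It only lands near the true value when the units have already been sorted out, which is the work that happened before the machine got involved.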
You'll note it's the *properties* of the data which express intelligence & learning.