
by jah242 1528 days ago

This would be one example from DeepMind using raw pixel input to stack objects. It has a relatively detailed reward function (but it is also a very complicated task) - https://arxiv.org/abs/2110.06192

There are other examples from OpenAI from a while back that use just sparse rewards (i.e. a binary 1 or 0 for success or failure over the whole task) - but these didn't use pixel input, if I remember correctly - https://openai.com/blog/ingredients-for-robotics-research/
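To make the sparse vs detailed distinction concrete, here is a rough toy sketch of my own. It is not taken from either of the linked works; the function names, the tolerance and the weights are all made up for illustration.

```python
import math

def sparse_reward(block_pos, target_pos, tol=0.02):
    """Binary reward: 1.0 if the block ends up within `tol` of the target, else 0.0."""
    return 1.0 if math.dist(block_pos, target_pos) <= tol else 0.0

def shaped_reward(block_pos, target_pos, gripper_pos, tol=0.02):
    """Hypothetical 'detailed' reward: partial credit for progress towards the goal.

    The 0.25 / 0.25 / 0.5 weights are arbitrary assumptions, not from any paper.
    """
    reach = math.exp(-math.dist(gripper_pos, block_pos))  # gripper close to the block
    place = math.exp(-math.dist(block_pos, target_pos))   # block close to the target
    success = sparse_reward(block_pos, target_pos, tol)   # task actually completed
    return 0.25 * reach + 0.25 * place + 0.5 * success

# The block is far from the target: the sparse reward gives no signal at all,
# while the shaped reward still gives some credit for partial progress.
far_block, target, gripper = (0.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.1, 0.0, 0.0)
print(sparse_reward(far_block, target))
print(shaped_reward(far_block, target, gripper))
```

Neither version tells the agent where the objects are or how to move them. Both only score the outcome, which is the point being made here.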

I'm afraid that if you think providing any reward function is cheating, then we have fundamentally different views of what AI/ML even means or involves. It appears that humans, and likely all animals, have largely pre-programmed reward functions developed over billions of years of evolution (pain is bad, food is good, etc.). These reward functions are ultimately what underpin what we are trying to do, which outcomes are good or bad, and to what degree we 'want' to explore vs exploit. The idea that human and animal 'intelligence' is born as a blank slate, with nothing to guide it and no reward function to maximise, doesn't seem to bear any resemblance to reality.

The only difference between a reward function that tells a robot 'you need to stack these objects but I'm not going to tell you where in 3D space the objects are, where they need to go to be stacked, or the shapes/forces involved' and an animal that is born with a reward function that says 'you need to find food and shelter but I'm not going to tell you how to collect the food or where to find shelter' is the level of abstraction. Fundamentally they appear the same.

You are pulling a sleight of hand when you suggest 'in the manner of animals who respond to the world without already having all the information about it' - there is a vast difference between an abstract reward function (which humans and animals also have) and 'having all the information about [the world]'.