HN Mirror



	by ansk 1799 days ago
	Is the prevailing opinion that progress in reinforcement learning depends on algorithmic advances, as opposed to simply scaling existing algorithms? If so, I could see this decision as an acknowledgement that they are not well positioned to push the frontier of reinforcement learning, at least not compared to other academic or industry labs. Where they have seen success, and where they seem to be consolidating their focus, is in scaling up existing algorithms with larger networks and larger datasets. Generative modeling and self-supervised learning seem more amenable to this engineering-first approach, so it seems prudent for them to concentrate their efforts there.


nrmn 1799 days ago

Yes, it feels like we have squeezed most of the performance out of current algorithms and architectures. OpenAI and DeepMind have thrown tremendous compute at the problem with little overall progress (AlphaGo is special). There was a big improvement in performance from bringing in function approximators in the form of deep networks, which, as you said, scale upwards nicely with more data and compute. In my opinion, as an academic in deep RL, it feels like we are missing some fundamental pieces needed for another leap forward. I am uncertain what exactly the solution is, but any improvement in areas like sample efficiency, stability, or task transfer could be quite significant. Personally, I'm quite excited about the learning-to-learn (meta-learning) line of work.

an_opabinia 1799 days ago

> alpha go is special

The VC community is in denial about how much Go resembled a problem purpose built to be solved by deep neural networks.

TchoBeer 1799 days ago

Are you suggesting that Go literally was purpose built for this?

rwallace 1798 days ago

There is a sense in which it was: out of all the games that have ever been designed, or that it would be logically possible to design, humans selected Go as one of the relatively few to receive sustained attention, in part because it is particularly well suited to the deep neural network that is the visual cortex. So it is not a coincidence that it is also well suited to artificial deep neural networks.

an_opabinia 1798 days ago

It’s one of the few interesting games out there whose rules can be neatly represented as algebra on binary matrices and still make sense.
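To make the "algebra on binary matrices" point concrete, here is a minimal sketch (not how AlphaGo actually encodes positions) that stores a Go position as two 0/1 matrices, one per colour, and counts a group's liberties using only elementwise AND/OR/NOT plus a neighbour shift. The board size, stone placement, and all function names are illustrative assumptions.

```python
# A Go position as two binary matrices: black[r][c] / white[r][c] are 1 where a stone sits.
Grid = list[list[int]]


def zeros(n: int) -> Grid:
    return [[0] * n for _ in range(n)]


def AND(a: Grid, b: Grid) -> Grid:
    return [[x & y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def OR(a: Grid, b: Grid) -> Grid:
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def NOT(a: Grid) -> Grid:
    return [[1 - x for x in row] for row in a]


def neighbours(m: Grid) -> Grid:
    """1 wherever a point is orthogonally adjacent to a 1 in m (off-board shifts are dropped)."""
    n = len(m)
    out = zeros(n)
    for r in range(n):
        for c in range(n):
            if m[r][c]:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n and 0 <= cc < n:
                        out[rr][cc] = 1
    return out


def group_at(stones: Grid, r: int, c: int) -> Grid:
    """Grow a one-stone seed through same-coloured stones until it stops changing."""
    group = zeros(len(stones))
    group[r][c] = 1
    while True:
        grown = AND(OR(group, neighbours(group)), stones)
        if grown == group:
            return group
        group = grown


def liberties(black: Grid, white: Grid, r: int, c: int) -> int:
    """Count empty points adjacent to the group containing the stone at (r, c)."""
    stones = black if black[r][c] else white
    empty = NOT(OR(black, white))
    return sum(map(sum, AND(neighbours(group_at(stones, r, c)), empty)))
```

A group whose liberty count reaches zero is captured, so the core capture rule reduces to repeated masking and shifting of binary grids.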

abeppu 1799 days ago

I think the premise of your question actually points to the real problem. In RL, because your current policy and actions determine what data you see next, you can't really just "scale existing algorithms" in the sense of shoving more of the same data through them on more powerful processors. There's a sequential process of acting, observing, and learning, which is bottlenecked on your ability to act in your environment (i.e. through your robot). Off-policy learning exists, but scaling up the amount of data you process from a bad initial policy doesn't really lead anywhere good.
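A toy illustration of that feedback loop, under stated assumptions: a hypothetical two-action problem with deterministic payoffs 0.2 and 0.8, where the agent only learns about the action it actually takes. A purely greedy agent that starts with a poor estimate never tries the better action, however many steps it runs. Adding epsilon-greedy exploration (an assumption of this sketch, not something the comment prescribes) lets it discover the better action.

```python
import random


def collect_and_learn(epsilon: float, steps: int, seed: int = 0) -> list[int]:
    """Run an act/observe/learn loop; return how often each action was taken."""
    rng = random.Random(seed)
    rewards = [0.2, 0.8]      # hypothetical deterministic payoff of each action
    estimates = [0.5, 0.0]    # a poor initial value estimate that favours action 0
    counts = [0, 0]
    for _ in range(steps):
        # Act: the current policy decides which data we get to see.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: estimates[i])
        # Observe: only the chosen action's reward is revealed.
        r = rewards[a]
        # Learn: incremental sample-mean update for the chosen action only.
        counts[a] += 1
        estimates[a] += (r - estimates[a]) / counts[a]
    return counts


print(collect_and_learn(epsilon=0.0, steps=1000))   # greedy: action 1 is never tried
print(collect_and_learn(epsilon=0.1, steps=1000))   # with exploration, action 1 dominates
```

Running the greedy agent for more steps only produces more samples of the same action, which is the sense in which scaling data from a bad initial policy does not help.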

andyxor 1799 days ago

Reinforcement learning itself is a dead end on the road to AI. They seem to be slowly starting to realize it, probably ahead of academia.

DrNuke 1799 days ago

Not if you see RL as just another tool for niche industrial domains. One of the targets put forward at the global level is, for example, a fully automated, closed-cycle, high-throughput lab for drug discovery. More generally, there are fully automated factories and networks of factories (another reason why delocalization of supply chains is no longer being pursued).

nrmn 1799 days ago

Why do you believe this to be the case?

andyxor 1799 days ago

In a nutshell, it’s too wasteful in the energy it spends and it doesn’t even try to mimic natural cognition. As physicists say about theories hopelessly detached from reality, “it’s not even wrong”.

The achievements of RL are so dramatically oversold that it can probably be called the new snake oil.

vladTheInhaler 1799 days ago

I'm going to need you to unpack that a bit. Isn't interacting with an environment and observing the result exactly what natural cognition does? What area of machine learning do you feel is closer to how natural cognition works?

tsimionescu 1799 days ago

Adding to the other comment, it's quite clear that animals, and especially humans, act and learn from many orders of magnitude fewer experiences than pure RL needs, especially when it comes to higher-order behaviors. We obviously have some systems that use inductive and deductive reasoning, heuristics, simplistic physical intuitions, agent modeling and other such mechanisms that do not resemble ML at all.

Intuitively, I would say it is likely that these systems were trained through something that looks much like RL over millions of years of evolution. But that process is obviously not repeated in each individual organism, which is born largely pre-trained.

And if there is any doubt, the poverty of the stimulus argument should put it to rest, especially when looking at organisms simpler than vertebrates, which can go from egg to functional sensing, moving, eating, and predator avoidance in a matter of minutes or hours.

andyxor 1799 days ago

> What area of machine learning do you feel is closer to how natural cognition works?

None. The prevalent ideas in ML are a) "training" a model via supervised learning b) optimizing model parameters via function minimization/backpropagation/delta rule.
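For readers who haven't seen it, the delta rule mentioned above is small enough to show in full. This is a minimal sketch, assuming a single linear unit and hypothetical noiseless data (y = 2x + 1); it nudges the weights after every example in proportion to the prediction error, which is stochastic gradient descent on squared error.

```python
def delta_rule_fit(xs: list[float], ys: list[float],
                   lr: float = 0.1, epochs: int = 500) -> tuple[float, float]:
    """Fit y ~ w*x + b with the delta (Widrow-Hoff / LMS) rule, one example at a time."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = y - (w * x + b)   # target minus output
            w += lr * err * x       # weight change = learning rate * error * input
            b += lr * err           # bias acts as a weight on a constant input of 1
    return w, b


w, b = delta_rule_fit([0.0, 0.5, 1.0, 1.5], [1.0, 2.0, 3.0, 4.0])
print(round(w, 3), round(b, 3))
```

Backpropagation generalizes the same error-times-input update to multi-layer networks via the chain rule.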

There is no evidence for trial-and-error iterative optimization in natural cognition. If you tried to map it onto cognition research, the closest thing would be the behaviorist theories of B.F. Skinner from the 1930s. These theories of 'reward and punishment' as the primary mechanism of learning have long been discredited in cognitive psychology. It's a black-box, backward-looking view that disregards the complexity of the problem (the most thorough and influential critique of this approach was Chomsky's, back in the 1950s).

The ANN model, which goes back to the McCulloch & Pitts paper, is based on the neurophysiological evidence available in 1943. The ML community largely ignores fundamental neuroscience findings made since then (for a good overview see https://www.amazon.com/Brain-Computations-Edmund-T-Rolls/dp/... )
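As a reference point for the 1943 model mentioned above, here is a minimal sketch of a McCulloch-Pitts unit as it is commonly described: binary inputs and output, a fixed integer threshold, and absolute inhibition (any active inhibitory input prevents firing). The function names and the logic-gate wiring are illustrative.

```python
def mp_neuron(excitatory: list[int], inhibitory: list[int], threshold: int) -> int:
    """McCulloch-Pitts unit: fires (1) if no inhibitory input is active and
    the number of active excitatory inputs reaches the threshold."""
    if any(inhibitory):          # absolute inhibition: a single active input vetoes firing
        return 0
    return 1 if sum(excitatory) >= threshold else 0


def mp_and(a: int, b: int) -> int:
    return mp_neuron([a, b], [], threshold=2)


def mp_or(a: int, b: int) -> int:
    return mp_neuron([a, b], [], threshold=1)


def mp_not(a: int) -> int:
    # Threshold 0 means "always fire" unless the inhibitory input vetoes it.
    return mp_neuron([], [a], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, mp_and(a, b), mp_or(a, b))
```

Modern artificial neurons keep the weighted-sum-and-threshold core but replace the binary step with a differentiable activation so they can be trained by gradient descent.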

I don't know if it has to do with arrogance or ignorance (or both), but "AI" is currently developed by inventing arbitrary model contraptions with complete disregard for the constraints and inner workings of living intelligent systems, basically throwing things at the wall until something sticks, instead of learning from nature the way, say, physics does. Saying "but we don't know much about the brain" is just being lazy.

The best description of biological constraints from a computer science perspective is in Leslie Valiant's work on the "neuroidal model" and his book "Circuits of the Mind" (he is also the author of PAC learning theory, influential in ML theory circles): https://web.stanford.edu/class/cs379c/archive/2012/suggested... , https://www.amazon.com/Circuits-Mind-Leslie-G-Valiant/dp/019...

If you're really interested in intelligence, I'd suggest starting with the representation of time and space in the hippocampal formation via place cells, grid cells and time cells, which form a sort of coordinate system for navigation in both real and abstract/conceptual spaces. This will likely have the same importance for actual AI as the Cartesian coordinate system has in other hard sciences. See https://www.biorxiv.org/content/10.1101/2021.02.25.432776v1 and https://www.sciencedirect.com/science/article/abs/pii/S00068...
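To give a feel for the "coordinate system" idea, here is a sketch of a widely used idealization of a single grid cell from the modeling literature (not taken from the papers linked above): activity as the sum of three plane waves whose directions are 60 degrees apart. The peaks then fall on a triangular lattice with a chosen spacing. The parameter names and default values are assumptions for illustration.

```python
import math


def grid_cell_rate(x: float, y: float, spacing: float = 0.5,
                   orientation: float = 0.0) -> float:
    """Idealized grid-cell activity at (x, y): three cosine gratings 60 degrees apart.

    Maxima (value 3) lie on a triangular lattice whose nearest-neighbour
    distance is `spacing`; the minimum value is -1.5.
    """
    k = 4 * math.pi / (math.sqrt(3) * spacing)   # wave number that yields the requested spacing
    total = 0.0
    for i in range(3):
        theta = orientation + i * math.pi / 3    # wave directions: 0, 60, 120 degrees (plus rotation)
        total += math.cos(k * (math.cos(theta) * x + math.sin(theta) * y))
    return total


print(grid_cell_rate(0.0, 0.0))    # a lattice peak
print(grid_cell_rate(0.0, 0.5))    # next peak, one spacing away along the y axis
print(grid_cell_rate(0.0, 0.25))   # halfway between the two peaks
```

A population of such cells with different spacings and phases can jointly encode position, which is the sense in which they resemble a coordinate system.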

Also see research on temporal synchronization via "phase precession", as a hint on how lower level computational primitives work in the brain https://www.sciencedirect.com/science/article/abs/pii/S00928...

And more generally, look into memory research in cogsci and neuroscience. Learning and memory are highly intertwined in natural cognition, and you can't really talk about learning before understanding lower-level memory organization, formation, and representational "data structures". Here are a few good memory labs to seed your firehose:

https://twitter.com/MemoryLab

https://twitter.com/WiringTheBrain

https://twitter.com/TexasMemory

https://twitter.com/ptoncompmemlab

https://twitter.com/doellerlab

https://twitter.com/behrenstimb

https://twitter.com/neurojosh

https://twitter.com/MillerLabMIT

unishark 1798 days ago

The place/grid/etc. cells fall generally under the topic of cognitive mapping. People have certainly tried to use it in AI over the decades, including recently, after the underlying neuroscience won the Nobel Prize. But in the niches where it's an obvious thing to try, if you can't even beat ancient ideas like Kalman and particle filters, people give up and move on. Jobs where you build models that don't do better at anything except show interesting behavior are computational neuroscience jobs, not machine learning ones, and are probably just as rare as any other theoretical science research position.
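For context on the baseline being referred to, here is a minimal sketch of a scalar Kalman filter tracking a position that is assumed to follow a random walk and to be observed with Gaussian noise. The model, the variances, and the function name are assumptions for illustration; real localization filters track multidimensional state with motion models.

```python
def kalman_1d(measurements: list[float], process_var: float, meas_var: float,
              x0: float = 0.0, p0: float = 1.0) -> tuple[list[float], float]:
    """Scalar Kalman filter for a random-walk state.

    Returns the posterior mean after each measurement and the final posterior variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var          # predict: state expected unchanged, uncertainty grows
        gain = p / (p + meas_var)    # Kalman gain: how much to trust the new measurement
        x = x + gain * (z - x)       # update: move the estimate toward the measurement
        p = (1 - gain) * p           # posterior uncertainty shrinks
        estimates.append(x)
    return estimates, p


est, var = kalman_1d([5.0] * 50, process_var=1e-4, meas_var=1.0)
print(round(est[-1], 3), round(var, 4))
```

Each step is a handful of arithmetic operations, which is part of why such filters are a hard baseline for more biologically inspired mapping models to beat.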

There is a niche of people trying to combine cognitive mapping with RL, or even arguing that old RL methods are actually implemented in the brain. But they don't seem to have much benefit to show for it in applications. They seem to have no shortage of labor or collaborators at their disposal to build and test models, and doing so must certainly be immensely simpler than running rat experiments.

Having said that, yes, I do believe progress can come from considering how nature accomplishes the solution and what major components we are still missing. But tacking them on in a common-sense-driven way has certainly been tried.

sillysaurusx 1799 days ago

For what it’s worth, I agree with this take. But I think RL isn’t completely orthogonal to the ideas here.

The missing component is memory. Once models have memory at runtime (once we get rid of the training/inference separation), they’ll be much more useful.

bobberkarl 1799 days ago

just to say this is the kind of answer that makes HN an oasis on the internet.

kirill5pol 1799 days ago

Maybe true if you consider policy gradient methods and Q learning the only things that exist in RL… it’s a pretty wide field that encompasses a lot more than the stuff OpenAI puts out.
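For readers unfamiliar with the Q-learning mentioned above, here is a minimal tabular sketch on a hypothetical three-state chain: actions move left or right, and reaching state 2 pays 1 and ends the episode. The behaviour policy is uniformly random, so the sketch also shows the off-policy property: the greedy policy it learns differs from the policy that collected the data. The environment, step size, discount, and episode count are all assumptions for illustration.

```python
import random


def q_learning_chain(episodes: int = 500, alpha: float = 0.5,
                     gamma: float = 0.9, seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning on a 3-state chain; action 0 = left, 1 = right; state 2 is terminal."""
    rng = random.Random(seed)
    n_states, goal = 3, 2
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                      # cap episode length
            a = rng.randrange(2)                  # uniform-random behaviour policy
            s_next = s + 1 if a == 1 else max(0, s - 1)
            r = 1.0 if s_next == goal else 0.0
            done = s_next == goal
            # Q-learning target bootstraps from the greedy value of the next state.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if done:
                break
    return q


q = q_learning_chain()
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
print(greedy)   # expected to be [1, 1]: always move right
```

Policy gradient methods take the other route, adjusting policy parameters directly instead of learning action values.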

TylerLives 1799 days ago

What's the alternative?

andyxor 1799 days ago

https://news.ycombinator.com/item?id=27869511