| HN Mirror

by ShamelessC 1067 days ago

You’re missing my original point, which is about continued, ongoing robustness: the kind that works in the low-data regime and lets pilots/astronauts make reasonable decisions in _completely novel_ situations (as just one example).

The networks we have are trained once and work efficiently for their training dataset. They are even robust to outliers in the distribution of that dataset. But they aren’t robust to surprises, i.e. unannounced changes in the assumptions/rules/patterns behind the data.

Even reinforcement learning is still struggling with this, as self-play effectively requires being able to run your dataset/simulation quickly enough to find new policies that work. Humans don’t even have time for that, much less access to a full running simulation of their environments. That said, we do generate world models, and I believe there’s some work in that direction.

Again, happy to be corrected.