Hacker News
by atty 1433 days ago
Yes, your explanation is essentially correct. There is work being done in the areas you’re talking about - essentially forcing models to more explicitly learn “concepts” - and in very large language models that seems to be emerging naturally. But current vision models would almost certainly break when trying to identify a vehicle from a shot of its underside if they had never seen a vehicle’s undercarriage during training. Current vision models are capable of identifying vehicles from arbitrary angles (when viewed from the side/head on) and in arbitrary shades/colors/models/etc., and that’s about the amount of extrapolation we’d be talking about.
1 comment

I don’t think your premise is correct. The holy grail of such systems - human intelligence - will break in the same way if asked to identify a car from its undercarriage when the human subject has never seen an undercarriage. We really forget how much data humans are exposed to in their formative years. I’d often bend down to fetch my ball that had accidentally slid under a parked car, and that’s how I learnt what undercarriages look like.