It's entirely possible that you don't need ridiculous amounts of data to build an AI; that's just the approach most research teams are taking.
Modern machine learning/deep learning takes a bunch of data and uses high-dimensional, more-or-less brute-force methods to approximate that data with a curve. It often works well (it seldom works "great" because the data can't fully capture the situation).
The appealing thing about this is that the programmer doesn't have to understand anything. But if you have little data, approximation just isn't going to capture the situation. Then either the programmer has to gain an understanding of the system (extremely costly and time-consuming) or we create systems that are themselves capable of this understanding. But no one knows how to do this; all the "artificial intelligence" victories anyone has observed have come from throwing computing power at a problem. Maybe someone will figure out how to throw computing power at the general problem of understanding, but I'm doubtful.
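To make the "little data" point concrete, here is a toy sketch, not anything a real ML team would ship. It swaps in plain piecewise-linear interpolation for the high-dimensional methods mentioned above, and `true_system` is a made-up stand-in for "the situation" being modeled. All names here are hypothetical.

```python
import math


def true_system(x):
    # Hypothetical stand-in for the real-world behavior we want to capture.
    return math.sin(3 * x)


def fit_piecewise_linear(xs, ys):
    """Return a predictor that linearly interpolates between the sample points."""
    points = sorted(zip(xs, ys))

    def predict(x):
        # Outside the sampled range, fall back to the nearest endpoint value.
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        # Find the pair of neighboring samples that brackets x.
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

    return predict


def max_error(n_samples, n_test=1000):
    """Fit on n_samples evenly spaced points, then measure the worst-case error."""
    lo, hi = 0.0, 2 * math.pi
    xs = [lo + (hi - lo) * i / (n_samples - 1) for i in range(n_samples)]
    ys = [true_system(x) for x in xs]
    model = fit_piecewise_linear(xs, ys)
    test_xs = [lo + (hi - lo) * i / (n_test - 1) for i in range(n_test)]
    return max(abs(model(x) - true_system(x)) for x in test_xs)


few = max_error(4)     # a handful of observations
many = max_error(400)  # plenty of observations

print("worst-case error with 4 samples:  ", few)
print("worst-case error with 400 samples:", many)
```

With only four samples, every observation happens to land where the underlying function is (nearly) zero, so the fitted curve is essentially flat and misses the oscillation entirely. With 400 samples the same dumb method tracks it closely, without the programmer knowing anything about sines.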
That's totally fair. I'm not an AI researcher. From what I've heard from internal folks, a new competitor might be able to compete on one or two models and carve out a niche, but not across the entire market. The market is big enough that competitors can still be viable companies, but we are not resting on our laurels either, so it'll be interesting to see.