Open/free data sources are likely to become very important. AI hasn't yet played a big role in the open data world, but I'd expect it to gain a lot of prominence as time goes by.
Starting a dataset company would probably be a good idea. It would necessarily need some humans to label the data, but you could probably build a lot of tools around that to make the process as smooth as possible. TaskRabbit and Amazon Mechanical Turk workers could also be used for the labeling.
Yep, open data and models with state-of-the-art performance are popping up more and more. I expect companies to appear which will sell data and models as a service, too.