| HN Mirror

Companies do have a lot of data, and some of it might be useful for training AI, but more than 99% of it isn't. When a company releases a cool model or paper without open data (as you point out, for competitive reasons, or for other reasons such as privacy), people can then help build or collect similar open datasets. Unfortunately, companies generally don't owe you their data, and if they are in the business of making models, they probably won't share the model either; the situation is similar to source code for proprietary line-of-business (LoB) applications. Fortunately, the best AI researchers mostly like to share their knowledge, and because companies want to attract those researchers, they generally seem to allow them to publish if the work isn't too commercially sensitive. It could be worse: while competition has reduced some visibility into cutting-edge science, lots of datasets and papers are still published.