by liquid_thyme 64 days ago
The amount of data locked up inside private internal databases is huge. This is especially true of regulated industries. There is a wealth of data: financial data showing how to budget for things, pricing data on various B2B products, standard operating procedures at mature companies that have gone through various revisions, designs for manufacturing plants so people don't keep reinventing them and making the same mistakes again, and on and on.

I think it's implied that they're not talking about private data when they say they've run out.
Fair. I want to +1 the fact that there is a large amount of data unseen by LLMs.
I think there are post-training tweaks that can be done with corporate data to help fit an AI to a specific corporation. But I don't think that private data will deliver us AGI. The knowledge for AGI is out in the world, not hidden inside corporations. Private data brings us knowledge of the XYZ project status, the division ABC budget, and whether Bob wants a chocolate cake for his going-away dinner or not.
I'm not seeing it the same way. Businesses in various industries have several types of moats: money, knowledge, experience, skills, etc. There is a ton of competitive intelligence hidden in private data.

It's one of the reasons you can't use ChatGPT and start manufacturing chips, vaccines, or anti-cancer medication. There is a gap between the publicly available data that informs academic "core science" research and the specific product-based knowledge that shows you how to make a successful drug candidate that can withstand regulatory scrutiny or be a safe and effective drug for the world's population.

We could iterate so quickly if this private data set was democratized.