by nathan_compton 1282 days ago
It can reproduce a statistically plausible paragraph, certainly. But there is a great deal more to research than producing statistically plausible paragraphs. It doesn't _understand_ anything!

I've actually worked on a project where there were attempts to use GPT-like models to summarize scientific results, and the problem is that it gets shit wrong all the time! You have to be an expert to separate the wheat from the chaff. It operates like a mendacious search engine pretending to be a person.


The problem is that we need to pair generative models with verification systems. We have the models, but no verification yet. Fortunately, code and math are easier to verify. Some things require simulation. In other cases you can substitute consistency-based verification: sample an ensemble of solutions and pick the most frequent answer. But we need to create verifiers for each domain, and that will take some time.
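
To make the consistency idea concrete, here is a minimal sketch of majority voting over sampled answers. The function name `majority_vote` and the `min_agreement` threshold are my own illustrative choices, not taken from any particular paper or library, and the sampled answers are hard-coded rather than coming from a real model.

```python
from collections import Counter

def majority_vote(answers, min_agreement=0.5):
    """Pick the most frequent answer from an ensemble of samples.

    Returns (answer, agreement), where agreement is the fraction of
    samples that match the winner. If the winner falls below
    min_agreement (a hypothetical threshold), the answer is None so
    the caller can treat the result as unverified.
    """
    if not answers:
        return None, 0.0
    answer, count = Counter(answers).most_common(1)[0]
    agreement = count / len(answers)
    if agreement < min_agreement:
        return None, agreement
    return answer, agreement

# Stand-in for five independent samples from a model on one question.
samples = ["42", "42", "41", "42", "40"]
best, agreement = majority_vote(samples)
```

Note that agreement only tells you the model is consistent, not that it is right, which is why this is a substitute for a real verifier rather than a replacement.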

The good thing is that we'll be able to generate training data with our models by filtering out the junk with the verifiers, and then retrain the models on what survives. This matters because we are approaching the limit of available training data. We need to generate more, but generated data is worthless unless we verify it. If we succeed, we can train GPT-5. Human data will be just 1%; the race is on to generate the master dataset of the future. I read in a recent paper that such a method was used to improve text captions in the LAION dataset. https://laion.ai/blog/laion-5b/
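
The generate-then-filter loop might look roughly like this. It is a toy sketch under heavy assumptions: the "generated" examples are hard-coded addition problems, and `verify_sum` and `filter_verified` are hypothetical names standing in for a real domain verifier and data pipeline.

```python
def verify_sum(example):
    """Toy verifier: recompute the sum and compare it to the claimed answer."""
    return example["a"] + example["b"] == example["answer"]

def filter_verified(candidates, verifier):
    """Keep only the candidates that the verifier accepts."""
    return [c for c in candidates if verifier(c)]

# Stand-in for model output; the second example is deliberately wrong.
generated = [
    {"a": 2, "b": 3, "answer": 5},
    {"a": 10, "b": 7, "answer": 16},
    {"a": -4, "b": 4, "answer": 0},
]

# Only the verified examples would go into the next training set.
training_data = filter_verified(generated, verify_sum)
```

The hard part is obviously the verifier, not the filter: for arithmetic it is one line, for most other domains it does not exist yet.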

>we need to pair generative models with verification systems

>code and math are easier to verify

I would love to see a two-stage pipeline that uses an LLM to convert natural-language specifications into formal specifications for something like Dafny, followed by another model, something like AlphaZero, that generates code & assertions to help the verifier. This seems like something that a major group like DeepMind or OpenAI could pull off in a few years.