

by Oxidation 1255 days ago

> actually it is a much harder problem than everything they've done so far.

Impressive as it is, this kind of AI still seems to rest on what looks to me like a possibly flawed premise: that training quantity has a sufficient quality of its own.

I can't prove that it's impossible with a clever enough system, but I simply don't see how you can get a right answer out of a statistical system trained on input that might contain incorrect information, conflicting versions of it, or nothing at all on the subject, in which case it just makes up something statistically plausible.

For example, it can give quite good answers about a well-known event (e.g. a big earthquake), presumably because there are enough mentions of it in the training data. Ask about a footnote earthquake with few mentions and it will invent details that could be right, but aren't. A magnitude in the single digits, say, "seems about right" and passes a sniff test, but has no factual basis.

That said, I wonder if welding a large structured data store like Wolfram Alpha or Wikidata to the language model might resolve that issue: don't rely on statistics when the answer exists.
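To make the "don't rely on statistics when the answer exists" idea concrete, here is a rough sketch of the routing I have in mind, not a claim about how any real system works. Everything in it is an assumption for illustration: `FACTS` is a toy stand-in for a store like Wikidata or Wolfram Alpha, the "example quake" entry is a placeholder rather than real data, and `generate` is a hypothetical callable wrapping the language model.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical structured store: (entity, attribute) -> verified value.
# A real system would query an external knowledge base; this dict is a stand-in
# and the entry below is placeholder data, not a real earthquake.
FACTS: Dict[Tuple[str, str], str] = {
    ("example quake", "magnitude"): "6.1",
}


def lookup_fact(entity: str, attribute: str) -> Optional[str]:
    """Return the stored value, or None when the store has no entry."""
    return FACTS.get((entity.lower(), attribute.lower()))


def answer(
    entity: str,
    attribute: str,
    generate: Callable[[str], str],
) -> Tuple[str, str]:
    """Prefer the structured store; fall back to the model only on a miss.

    Returns (answer_text, source), where source is "store" or "model",
    so the caller can tell a looked-up fact from a statistical guess.
    """
    fact = lookup_fact(entity, attribute)
    if fact is not None:
        return fact, "store"
    # No stored answer: whatever comes back here is only statistically plausible.
    return generate(f"What is the {attribute} of {entity}?"), "model"
```

The point of returning the source alongside the answer is that a miss stays visible: the model can still guess about the footnote earthquake, but the guess is tagged as one instead of being passed off as a fact.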