by tossandthrow 539 days ago
I do think it is time to start questioning whether the utility of AI can be reduced solely to the quality of the training data.

This might be a dogma that needs to die.


If not, bad training data shouldn’t be a problem.
There can be more than one problem. The history of computing (or even just the history of AI) is full of things that worked better and better right until they hit a wall. We get diminishing returns adding more and more training data. It’s really not hard to imagine a series of breakthroughs bringing us way ahead of LLMs.
I tried. I don't have the time to formulate and scrutinise adequate arguments, though.

Do you? Anything anywhere you could point me to?

The algorithms live entirely off the training data. They consistently fail to "abduct" (i.e., perform abductive inference) beyond information specific to the language in, or of, the training data.

The best way to predict the next word is to accurately model the underlying system that is being described.
It is a gradual thing. Presumably the models are inferring things at runtime that were not part of their training data.

Anyhow, philosophically speaking, you are also only exposed to what your senses pick up, but presumably you are able to infer things?

As written: this is a dogma that stems from a limited understanding of what algorithmic processes are, and from the insistence that emergence cannot happen in algorithmic systems.