Hacker News
by tysam_and 1138 days ago
"Reasoning" and "hallucinating" are fairly shallow terms that come up often in discussions of this topic, but they ultimately don't cover where and how the model is fitting the underlying manifold of the data, which information theory in fact describes rather well. That's why I referenced Shannon entropy, which is important as an interpretive framework. It provides mathematical guarantees and ties in nicely with the other information-compression measures, which I feel do answer some of the questions you note seem more ambiguous to you.
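To make the Shannon entropy reference concrete, here is a minimal sketch in Python. It computes the empirical entropy of a string's character distribution, in bits per character. The function name `shannon_entropy` and the choice of characters as symbols are just assumptions for illustration; a language model works over tokens and conditional distributions, which this toy does not capture.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Empirical Shannon entropy of the character distribution, in bits per character.

    Illustrative helper (hypothetical name): H = -sum(p * log2(p)) over observed symbols.
    """
    counts = Counter(text)
    total = len(text)
    # Each symbol contributes -p * log2(p); an empty string yields an empty sum.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    for sample in ["aaaa", "abab", "abcd"]:
        print(sample, shannon_entropy(sample))
```

A string with one repeated symbol carries zero bits per character, while a uniform spread over four symbols carries two. That is the sense in which entropy measures how much there is to predict.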

That is the trouble with sometimes mixing inductive reasoning into a problem that has mathematical roots. It can be intractable to measure directly how much of something is happening, but we have a clean mathematical framework that answers these questions well, so using it can be helpful.

The easiest example of yours that I can tie back to the math is the arithmetic in the structure of language. You can use information theory to show this pretty easily; you might appreciate looking into Kolmogorov complexity as a fun side topic. I'm still learning it (heck, any of these topics goes a mile deep), but it's been useful.
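As a rough, hedged illustration of the Kolmogorov complexity angle: the true quantity is uncomputable, but the length of a compressed encoding gives an upper bound on it (up to a constant). The sketch below uses Python's `zlib` as that stand-in and compares arithmetic-structured text against random bytes of the same length. The names `compressed_ratio`, `structured`, and `noise` are made up for this example, and this is not a claim about how any particular model represents arithmetic.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more regularity found)."""
    return len(zlib.compress(data, 9)) / len(data)


# Text with arithmetic structure: lines like "3+4=7".
structured = "".join(
    f"{a}+{b}={a + b}\n" for a in range(50) for b in range(50)
).encode()

# Random bytes of the same length, seeded for reproducibility.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(structured)))

if __name__ == "__main__":
    print("structured:", compressed_ratio(structured))
    print("noise:     ", compressed_ratio(noise))
```

The structured text compresses to a fraction of its size because a short description (the rule plus the ranges) generates it, while the random bytes barely compress at all.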

Reasoning, on the other hand, I find to be a much harder topic in terms of measuring it. It can be learned, like any other piece of information.

If I could recommend one piece of literature for this, it would be the paper linked here; I feel you would appreciate it most as a starting point for diving into the meat of this. It's a crazy cool field of study, and this paper in particular is quite accessible and friendly to most backgrounds: https://arxiv.org/abs/2304.12482