Hacker News
by sbierwagen 1777 days ago
>In my mind, understanding a thing means you can justify an answer.

Sure, but how does that work with superhuman AI? Consider some kind of math bot that proves theorems about formal systems which are just flat out too large to fit into human working memory. Even if it could explain its answers, there would just be too many moving parts to keep in your head at once.

We already see something like this in quant funds. The stock trading robot finds a price signal, and trades on it. You can look at it, but it's nonsensical: if rainfall in the Amazon basin is above this amount, and cobalt price is below this amount, then buy municipal bonds in Topeka. The price signal is durable and causal. If you could hold the entire global economy in your head, you could see the chain of actions that produce the effect, but your brain isn't that big.

Or you just take it on faith. Why do bond prices in Topeka go up, but not in Wichita? "It just does." Okay, then what was the point of the explanation? A machine can't justify something you physically don't have enough neurons to comprehend.

4 comments

It's not about our being able to interpret the answer or the justification, but about the reasoner's ability to justify. If a superhuman AI can justify its answers in terms of first-order logic, for example, it could be defined as understanding the answers with respect to FOL. Whether we as humans are able to check that this specific bot in fact meets that definition is a separate empirical question.

If that quant algo you mentioned just says "it'll go up tomorrow", that's different from "it'll go up tomorrow" with an attached "it's positively correlated with Y, which is up today", which is different from having a full causal DAG model of the world attached, which is again different from those same things being expressible in English. But again, those are definitions, which are separate from our ability to check whether they're met.

Luckily, we're not in the realm of bots spitting out infeasible-to-check proofs, except for a few niche areas like theorem proving (e.g. the four color theorem). For language models like the one in the article, the best I'm aware of is finding passages relevant to an answer and classifying entailments.

> A machine can't justify something you physically don't have enough neurons to comprehend.

We can't always verify its justification, but it either can or can't justify an answer with respect to a given justification system.

Also, note that the memory and capabilities required to reach a conclusion might be much greater than what is needed to show it's true. Showing a needle may be easy; finding it in the haystack is very hard. In this sense the hope for explainability is expanded. Still, I guess the real world is really messy, and "the full explanation" may be too large. When you explain a human intuition, the "full explanation" might be your entire brain and your entire set of experiences up to that point, yet we can give partial explanations that should be satisfactory.
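To make the needle/haystack point concrete, here is a toy sketch using integer factoring as a stand-in (my choice of example, not something from the thread; the function names are made up). Checking a claimed answer is one multiplication, while finding it by trial division takes a loop whose length grows with the size of the number.

```python
def verify_factors(n, p, q):
    """Check a claimed factorization: one multiplication plus range checks."""
    return 1 < p < n and 1 < q < n and p * q == n


def find_factors(n):
    """Search for a factorization by trial division.

    Returns (p, q, steps), where steps counts the candidate divisors tried,
    or None if n has no nontrivial factor.
    """
    steps = 0
    d = 2
    while d * d <= n:
        steps += 1
        if n % d == 0:
            return d, n // d, steps
        d += 1
    return None


# 10403 = 101 * 103: the search tries every candidate from 2 up to 101,
# while checking the answer afterwards is a single multiplication.
p, q, steps = find_factors(10403)
assert verify_factors(10403, p, q)
```

The gap here is tiny, but it widens quickly as the numbers get larger, which is the sense in which a conclusion can be cheap to check even when it was expensive to reach.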

I have a hypothesis that, to occur effectively, reasoning inevitably needs to 'funnel' through explicit, logical representations (as we do with mathematics, language, etc.). Or at least (quasi-)formalization is an important element of reasoning. This formal subset can be communicated.

> Even if it could explain its answers, there would just be too many moving parts to keep in your head at once.

While this is possible in practice, consider the (universal) Turing machine principle: in principle, you can simulate any system given enough memory. We may not have it in our brains, but we have pen and paper, or simply a digital text scratchpad (both of which we use extensively in our lives).

We can also build another system that we fully understand, one that can process the justification and check whether it is correct and makes sense.
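A rough sketch of what such a checking system could look like, under heavy assumptions: the justification is a list of steps, each step is either a stated premise or follows from two earlier steps by modus ponens, and formulas are plain strings or ("->", A, B) tuples. The representation and the name check_proof are invented for illustration. The point is only that the checker is small enough to understand fully, however long the proof it is handed.

```python
def check_proof(premises, steps, goal):
    """Return True if every step is justified and the last step is the goal.

    Each step is (formula, rule), where rule is either "premise" or
    ("mp", i, j): step i proved A and step j proved ("->", A, formula).
    """
    proved = []
    for formula, rule in steps:
        if rule == "premise":
            if formula not in premises:
                return False
        elif isinstance(rule, tuple) and len(rule) == 3 and rule[0] == "mp":
            _, i, j = rule
            # Both cited steps must already have been checked.
            if not (0 <= i < len(proved) and 0 <= j < len(proved)):
                return False
            if proved[j] != ("->", proved[i], formula):
                return False
        else:
            return False  # unknown rule
        proved.append(formula)
    return bool(proved) and proved[-1] == goal


premises = ["rain", ("->", "rain", "wet"), ("->", "wet", "slippery")]
good = [
    ("rain", "premise"),
    (("->", "rain", "wet"), "premise"),
    ("wet", ("mp", 0, 1)),
    (("->", "wet", "slippery"), "premise"),
    ("slippery", ("mp", 2, 3)),
]
bad = [("rain", "premise"), ("slippery", ("mp", 0, 0))]

assert check_proof(premises, good, "slippery")
assert not check_proof(premises, bad, "slippery")
```

The checker never needs to know how the steps were found; it only confirms that each one follows from what came before.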