| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Diogenesian 22 hours ago
	These proof checkers all have bugs, every single one, and since AI is still 100% incapable of understanding simple mathematics we should assume agents are likely to cheat by exploiting a kernel bug. So a human really does have to be able to read and understand the proof. There's no difference between blindly trusting Lean and blindly trusting Grigori Perelman: yes you can be reasonably confident the proof is correct. But you gotta check. This future of "the AI built the compiler and it's totally incomprehensible spaghetti, but don't worry, it verified the compiler works using this AI-generated proof assistant whose codebase is also pure spaghetti" terrifies me.

1 comments

bckgrndrdtn 19 hours ago

> These proof checkers all have bugs, every single one

Please show me a proof of `False` in Lean.

link

Diogenesian 17 hours ago

That's not what I mean. I am talking about quirks and bugs that really are pretty subtle, and which really might turn up in e.g. a verified systems programming context: https://github.com/James-Hanson/junk-theorems-in-lean/blob/m... (there is a proof of 0=1 at the end - it is easy to understand where it comes from, but low-level compiler stuff like this is always a possibility.)

LLM agents will a) discover dumb counterintuitive stuff like this and b) exploit it to satisfy the kernel with c) the questionable lines being buried behind millions of lines of math slop. Humans have to check this stuff.

link

cdetrio 16 hours ago

I'd be very curious whether such junk lean code could be used to solve Kevin Buzzard's Jacobian challenge: https://gist.github.com/kbuzzard/778bc714030b3e974ab5f403878...

The idea behind the challenge to provide a few lines of human-written/checked scaffolding that defines an "API surface". The API surface is resistant to junk/slop filling out the sorry's. This is supposed to reduce the number of lines that humans have to read to only a handful, while the full valid AI-generated proof is tens of thousands of lines. The challenge was just solved a couple weeks, here's the discussion about it: https://leanprover.zulipchat.com/#narrow/channel/583336-Auto...

link