|
|
|
|
|
by wild_egg
275 days ago
|
|
The thing I'm stuck on is that, yes, the proof is correct. But what guarantees do I have that the proof or theorem is actually representative of the program I'm trying to specify? And beyond that, what guarantees are there that the resulting end program actually adheres to either the specs or what has been proven? Like it seems the goal here is `English -> Lean -> Code in whatever language` instead of just `English -> Code` but there's no way to be certain the translation step on either side of Lean has actually been done correctly. I've done a lot of experiments around this so far this year and theorem provers can certainly help provide LLMs with more context but there is no tooling to actually verify that the resulting Python fully covers the corresponding Lean that was fed in. |
|
The first problem is just hard, the second can be approached by using a language like F* or Dafny that compiles to executable code. IIRC Amazon has had some success by using Lean for the specification and using fuzzing to test that the implementation matches.