Hacker News new | ask | show | jobs
by mkl 591 days ago
Did you try asking them to write a program to do it?
1 comments

GP is trying to test the ability of LLMs to perform mathematical tasks, not their ability to store geeks4geeks pages.
Not sure why you're being downvoted that is exactly why I'm using that simple problem to benchmark LLMs. If an LLM can't figure out how to traverse a graph in its working memory then it has no hope of figuring out how to structure a proof.

Under natural deduction all proofs are sub trees of the graph which is induced by the inference rules from the premise. Right now LLMs can't even do a linear proof if it gets too long when given all the induced vertices.