Hacker News new | ask | show | jobs
by andrepd 592 days ago
GP is trying to test the ability of LLMs to perform mathematical tasks, not their ability to store geeks4geeks pages.
1 comments

Not sure why you're being downvoted that is exactly why I'm using that simple problem to benchmark LLMs. If an LLM can't figure out how to traverse a graph in its working memory then it has no hope of figuring out how to structure a proof.

Under natural deduction all proofs are sub trees of the graph which is induced by the inference rules from the premise. Right now LLMs can't even do a linear proof if it gets too long when given all the induced vertices.