HN Mirror



	by pton_xd 85 days ago
	"in this paper we primarily evaluate the LLM itself without external tool calls." Maybe this is a factor?

1 comment

No tools were used.

IIRC, web chat often uses tools / code without surfacing this information in any obvious way.