| HN Mirror



	by MallocVoidstar 68 days ago
	No discussion on problem difficulty, or on result quality besides "the Edgee run generated slightly more output tokens than the baseline".

2 comments

sachamorard 68 days ago

More info in the GitHub repo, in the reports folder (sorry, I'm not sure I can add the link here without being flagged).

"Codex + Edgee consumes roughly half the fresh tokens of the normal Codex baseline. Output tokens are marginally higher (+3,312, +19.5%), suggesting the Edgee scenario produces slightly more verbose responses but dramatically reduces context ingestion."


kokakiwi 68 days ago

I think the problem given to Codex for the benchmark is the one in the attached video, where two Codex instances run side by side, working on a "standard" dev thingy.
