by MallocVoidstar 68 days ago
No discussion on problem difficulty, or on result quality besides "the Edgee run generated slightly more output tokens than the baseline".
2 comments

More info in the GitHub repo, in the reports folder (sorry, I'm not sure I can add the link here without being flagged).

"Codex + Edgee consumes roughly half the fresh tokens of the normal Codex baseline. Output tokens are marginally higher (+3,312, +19.5%), suggesting the Edgee scenario produces slightly more verbose responses but dramatically reduces context ingestion."

I think the problem given to Codex for the benchmark is the one in the attached video, where two Codex instances run side-by-side, working on a "standard" dev thingy.
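
For what it's worth, the quoted output-token figures (+3,312, +19.5%) are enough to back out rough absolute counts, even though the quote doesn't state them. A quick sketch of that arithmetic, assuming the percentage is measured relative to the baseline run and is rounded to one decimal place (the function name is just for illustration, and the results are estimates, not numbers from the report):

```python
def implied_baseline(delta_tokens: int, pct_increase: float) -> float:
    """Baseline count implied by an absolute delta and a relative increase in percent."""
    return delta_tokens / (pct_increase / 100.0)


# Figures taken from the quoted report sentence.
delta = 3312   # extra output tokens in the Edgee run
pct = 19.5     # stated relative increase, assumed to be vs. the baseline

baseline = implied_baseline(delta, pct)  # estimated baseline output tokens
edgee = baseline + delta                 # estimated Edgee output tokens

print(f"implied baseline output tokens: ~{baseline:,.0f}")
print(f"implied Edgee output tokens:    ~{edgee:,.0f}")
```

That puts the baseline at about 17k output tokens and the Edgee run at about 20k, so the whole benchmark looks like a fairly small task, which doesn't say much about difficulty or result quality either way.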