| Betteridge's Law of Headlines strikes again. (Well, Hacker News' abbreviated headlines, in this case.) "Professors Staffed a Fake Company with AI Agents. Guess What Happened?"
"No." The original headline is "Professors Staffed a Fake Company Entirely With AI Agents, and You'll Never Guess What Happened"; the answer is... uh... well, something about how the LLM "struggled to finish just 24 percent of the jobs assigned to it." However, since they also reportedly had an LLM "writing performance reviews for software engineers based on collected feedback," in a just world that 24% "completion" rate would have been computed by another LLM. Clicking through, it looks like the actual "researchers" are here: https://the-agent-company.com/ and their project is here: https://github.com/TheAgentCompany/TheAgentCompany/blob/main... At first glance, the project looks like a plain old task-based benchmark, i.e. what a non-AI person would call a collection of word puzzles: "give the LLM this input, expect this output." These word puzzles are themed around office jobs. Here's an example input: https://github.com/TheAgentCompany/TheAgentCompany/blob/main... |