by Bjorkbat
819 days ago
Reminds me of this paper where some researchers had AIs role-play as employees at a startup and tasked them with building various kinds of software. It was pretty interesting. They managed to build Pong. Thing is, though, they neglected to compare this against a control, and the examples they tested it on were ones that GPT already had no problem building. No idea if this actually improved LLM performance. I think comments like these are worthwhile because, frankly, I can’t trust AI researchers to run good experiments or evaluate their models properly, for a variety of reasons. I mean, most scientific papers in general are hard to replicate and have flaws concerning sample size and what have you. (Related: I still remember my disillusionment at finding out that the average Hacker News commenter was an idiot incapable of critical thinking when the LK-99 hype reached a fever pitch.) In any other context we would be deeply suspicious of results sponsored by a corporate party, yet in the context of AI we don’t seem to care that most AI researchers work for Microsoft.