Hacker News
by ZoomerCretin 1005 days ago
>For example, in a comparison of three NPM modules, it claimed each of the packages was "fastest" in the three different "drafts" it produced.

That's an odd benchmark, no? LLMs aren't omniscient; you shouldn't expect them to be experts on something so specific, especially when there is no widely available material on the internet that clearly indicates which choice is correct. They can only repeat what they've been trained on and reason about that information.