Hacker News
by softmodeling 823 days ago
For additional context:

- More details on building the leaderboard (and the challenges involved): https://livablesoftware.com/biases-llm-leaderboard/

- The tests used in the backend: https://github.com/SOM-Research/LangBiTe