| HN Mirror

Agreed! FWIW I am attempting to create an open-source wiki/watchdog eval platform -- weval.org -- , so we can all keep an eye on LLMs, their biases, and their general competencies without relyong in the AI providers marking their own homework. I really believe this needs to exist to express our needs and hold model creators to account. Especially as model drift and manipulation becomes a risk.