Hacker News
by xilinx_guy 1233 days ago
We obviously need a new test. The new benchmark for large language models should be "Truth" with a numeric score defined as -Log( Percentage_of_Lies_Told ). This way, a perfectly truthful model will have a numeric score of +infinity.
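A minimal sketch of the proposed score. It assumes the "percentage" is the fraction of statements that are lies, in [0, 1], and uses the natural log. The comment fixes neither choice, and `truth_score` is a hypothetical name:

```python
import math

def truth_score(lies_told: int, statements: int) -> float:
    """Hypothetical 'Truth' score: -log(fraction of statements that are lies).

    Assumes a fraction in [0, 1] and the natural log; neither is fixed by the comment.
    """
    if statements <= 0 or not 0 <= lies_told <= statements:
        raise ValueError("need 0 <= lies_told <= statements and statements > 0")
    if lies_told == 0:
        return math.inf  # a perfectly truthful model scores +infinity
    # log(statements / lies_told) == -log(lies_told / statements)
    return math.log(statements / lies_told)

print(truth_score(0, 100))    # inf
print(truth_score(100, 100))  # 0.0
```

Under this reading the score never goes negative: a model that lies every time scores 0. If the input were instead a percentage on a 0 to 100 scale, a model that lies every time would score -log(100), which is below zero.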