| HN Mirror

	by andy99 787 days ago
	Not sure I understand your example. It's not an offensiveness benchmark; in fact, I can imagine a model trained to be inoffensive would do worse on a truth benchmark. I wouldn't go so far as to say TruthfulQA actually tests how truthful a model is, or how well it reasons. But it's one of the benchmarks least correlated with the others, which makes it one of the most interesting. Much more so than most other tests that are highly correlated with MMLU performance. https://twitter.com/gblazex/status/1746295870792847562