Ask HN: What is "response-level error rate" and how is it measured?
2 points
by myyke
319 days ago
There's a chart of GPT-5's hallucination and error rates: https://api.wandb.ai/files/byyoung3/images/projects/37269171/0da61431.png (from https://wandb.ai/byyoung3/ml-news/reports/GPT-5-Benchmark-Scores---VmlldzoxMzkwMTYyMg). I'm wondering what exactly "response-level error rate" is and how it is measured. GPT-4.1 says it's based on sampled production prompts, rated by humans. Is that it?