by thelastparadise 637 days ago
Can someone explain simply how these benchmarks work?

What exactly is a "failure rate" and how is it computed?

1 comment

They ask the AI a question about a large document (or set of documents). Each answer is graded as either right or wrong, and they count the hits and misses. The failure rate is then the number of misses divided by the total number of questions asked.
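
To make the counting concrete, here's a rough sketch in Python. It assumes each question has already been graded as right or wrong; how that grading happens (exact match, human review, another model) isn't something I know for these benchmarks. The function name and the sample results are made up for illustration.

```python
def failure_rate(results):
    """Return the fraction of wrong answers.

    `results` is a list of booleans, one per question:
    True means the model got it right (a hit), False means a miss.
    """
    if not results:
        raise ValueError("need at least one graded question")
    misses = sum(1 for correct in results if not correct)
    return misses / len(results)


# Hypothetical graded outcomes for five questions (not real benchmark data).
graded = [True, True, False, True, False]
print(failure_rate(graded))  # 2 misses out of 5 questions -> 0.4
```

So 2 misses out of 5 questions would be reported as a 40% failure rate.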