by andai 201 days ago
Yeah, that's a great point.

ArtificialAnalysis has an "intelligence per token" metric on which all of Anthropic's models are outliers.

For some reason, they need way fewer output tokens than everyone else's models to pass the benchmarks.

(There are of course many issues with benchmarks, but I thought that was really interesting.)