Hacker News new | ask | show | jobs
by peyton 929 days ago
The appropriations bill example also looks right—the insertion doesn’t stylistically match the rest of the document. I’m much more skeptical of evaluations if this is how the sausage gets made. Feels like bullshit artistry.
1 comments

These are not actual tests they used for themselves.

Some third party did these tests first (in article and spread on social) to which the makers of Claude are responding.

I knew it’s a weird test right when I first encountered it.

Interesting that the Claude team felt like it’s worth responding to.