by revel 408 days ago
They used RFT, and there are only so many benchmarks out there, so I would be very surprised if they didn't train on the tests.