by acuozzo 261 days ago
I was provided with a battery of externally produced tests, benchmark scripts, etc. I was told to assume that the tests were comprehensive.

Independent of this, I used competing models produced by different organizations (e.g. OpenAI vs. Google) to test & verify each other's work.

I could also follow along with the math itself, to some extent.