
by acuozzo 261 days ago

I was provided with a battery of externally-produced tests, benchmark scripts, etc. I was told to assume that the tests were comprehensive.

Independent of this, I used competing models produced by different organizations (e.g. OpenAI vs. Google) to test & verify each other's work.

I could also follow along with the math itself, to some extent.