|
|
|
|
|
by zamadatix
459 days ago
|
|
Sorry, I probably phrased the question poorly. My question is more along the lines of "when you already scored e.g. OpenAI's o3 on ARC AGI 2 how did you guarantee OpenAI can't just look at its server logs to see question set 4"? |
|
1. We had a no-data retention agreement with them. We were assured by the highest level of their company + security division that the box our test was run on would be wiped after testing
2. We only tested o3 against the semi-private set. We didn't test it with the private eval.