Hacker News new | ask | show | jobs
by SchemaLoad 81 days ago
Once the model has seen the questions and answers in the training stage, the questions are worthless. Only a test using previously unseen questions has merit.
1 comments

They aren't training new models for this. This is an agent harness for Opus 4.6.
All traffic is monitored, all signal sources are eventually incorporated into the training set in one way or another. The person you're responding to is correct, even a single API call to any AI provider is sufficient to discount future results from the same provider.
ok! So if someone uses an existing, checkpointed, open source model then the answer is yes the results are valid and it doesn't matter that the tests are public.
Yes, assuming the checkpoint was before the announcement & public availability of the test set.
You live in a conspiracy world. Those AI providers don't update the models that fast. You can try ask them solve ARC-AGI-3 without harness and see them struggle as yesterday yourself.
Which part is the conspiracy? Be as concrete as possible.