| HN Mirror


	by SchemaLoad 81 days ago
	Once the model has seen the questions and answers in the training stage, the questions are worthless. Only a test using previously unseen questions has merit.

lambda 81 days ago

They aren't training new models for this. This is an agent harness for Opus 4.6.

measurablefunc 81 days ago

All traffic is monitored, all signal sources are eventually incorporated into the training set in one way or another. The person you're responding to is correct, even a single API call to any AI provider is sufficient to discount future results from the same provider.

stale2002 81 days ago

ok! So if someone uses an existing, checkpointed, open source model, then the answer is yes: the results are valid, and it doesn't matter that the tests are public.

measurablefunc 81 days ago

Yes, assuming the checkpoint was before the announcement & public availability of the test set.

raincole 81 days ago

You live in a conspiracy world. Those AI providers don't update the models that fast. You can try asking them to solve ARC-AGI-3 without a harness and see for yourself that they struggle just as they did yesterday.

measurablefunc 80 days ago

Which part is the conspiracy? Be as concrete as possible.