| HN Mirror


by njndtu 684 days ago

When I read that line I was very confused too, lol. I interpreted it as them taking other contestants' submissions and letting the model see these "solutions" as part of its context, then having the model generate its own "solution" to be used for the benchmark. I fail to see how that counts as "solving" an IOI-level question.

What is interesting is the following paragraph in the post: "With a relaxed submission constraint, we found that model performance improved significantly. When allowed 10,000 submissions per problem, the model achieved a score of 362.14 – above the gold medal threshold – even without any test-time selection strategy." So they didn't allow sampling from other contestants' solutions here? If so, that is quite interesting, since imo the model is effectively able to brute-force questions, provided you have some form of validator that can tell it when to halt.
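To make concrete what I mean by "brute force with a validator", here is a rough sketch. It is not how the post says the evaluation was run; `generate`, `validate`, and `brute_force_submit` are names I made up, and a real IOI grader gives partial scores per subtask instead of a plain accept/reject.

```python
import random
from typing import Callable, Optional, Tuple


def brute_force_submit(
    generate: Callable[[], int],
    validate: Callable[[int], bool],
    max_submissions: int = 10_000,
) -> Tuple[Optional[int], int]:
    """Sample candidates until the validator accepts one or the budget runs out.

    Returns the accepted candidate (or None) and the number of submissions used.
    """
    for attempt in range(1, max_submissions + 1):
        candidate = generate()       # hypothetical stand-in for one model sample
        if validate(candidate):      # hypothetical stand-in for the grader
            return candidate, attempt
    return None, max_submissions


# Toy usage: the "model" guesses integers and the "grader" accepts only 42.
rng = random.Random(0)
answer, attempts = brute_force_submit(
    generate=lambda: rng.randint(1, 1000),
    validate=lambda x: x == 42,
)
print(answer, attempts)
```

The point is that no selection strategy is needed on the model side: the validator does all the picking, so the score mostly reflects whether a correct sample shows up anywhere within the budget.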

I came across one of this year's IOI questions that I had trouble solving (I am pretty noob tho), which made me curious about how it was reflected in the reported results. The question is https://github.com/ioi-2024/tasks/blob/main/day2/hieroglyphs... Apparently, the model was able to get it partially correct: https://x.com/markchen90/status/1834358725676572777