by AtlasBarfed
306 days ago
My personal test question keeps bombing, and I think it's something they should be capable of doing. And those math contests? Are their questions and answers in the training set? Let's say these things really did win a math Olympiad by thinking. OK, then I would like them to write parsers based on a well-defined expression grammar or language spec. Nothing as bad as near-unparseable C++ or JavaScript. Despite the prompt, the AIs refuse to write a complete parser, hallucinate tests, do things like just call the already-working compiler on the CLI, and force repeated reprompts that still don't complete the task. To me, this is a good example of a task I would give AI as a service to see whether it will reliably do something that's well specified, moderately annoying, and most definitely in the training set if they are pulling data from "the internet".
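For concreteness, here is a minimal sketch of the kind of task described above: a complete hand-written parser derived from a small, well-defined grammar. The grammar, the tuple-based tree, and every name in it (`tokenize`, `Parser`, `parse`) are assumptions made up for illustration; they are not the actual test question from the comment.

```python
# Hypothetical toy grammar (an assumption, not the commenter's actual spec):
#   expr   := term   (("+" | "-") term)*
#   term   := factor (("*" | "/") factor)*
#   factor := NUMBER | "-" factor | "(" expr ")"

DIGITS = "0123456789"


def tokenize(text):
    """Split the input into ("NUM", int) and ("OP", str) tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in DIGITS:
            j = i
            while j < len(text) and text[j] in DIGITS:
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            i = j
        elif ch in "+-*/()":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at {i}")
    return tokens


class Parser:
    """Recursive-descent parser: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", None)

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        node = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            op = self.take()[1]
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            op = self.take()[1]
            node = (op, node, self.factor())  # left-associative
        return node

    def factor(self):
        kind, value = self.take()
        if kind == "NUM":
            return value
        if (kind, value) == ("OP", "-"):
            return ("neg", self.factor())
        if (kind, value) == ("OP", "("):
            node = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    """Parse a full expression; reject any leftover input."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] != "EOF":
        raise SyntaxError("trailing input")
    return tree
```

With this sketch, `parse("1 + 2 * 3")` yields the nested tuple `("+", 1, ("*", 2, 3))`. A real language spec would be much larger, but the shape of the work is the same: one function per grammar rule, no shelling out to an existing compiler.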
The problem is that "they" isn't a monolith. How much compute went into your tests? GPT-5 Thinking in ChatGPT Plus uses less compute than GPT-5 Thinking in ChatGPT Pro, which uses less compute than the "high" reasoning effort when "gpt-5" is called via the API, which uses less compute than GPT-5 Pro in ChatGPT Pro, which uses less compute than custom scaffolds, which use less compute than what went into the IMO/IOI solutions. This is not just idle speculation on my part; it's publicly available information.