Hacker News
by oli5679 624 days ago
GPT-o1 preview solves this puzzle correctly in 13 seconds and gives a thorough logical deduction in its code comments and explanation.

I think it’s a bit unfair on the LLM to ask it to retrieve the puzzle definition from its training data, so I gave it the puzzle information from his notebook.

https://chatgpt.com/share/670103ae-1c18-8011-8068-dd21793727...

The question is whether it solved the puzzle correctly before Norvig's article appeared. It could have been trained on the article or on HN comments (in every Llama discussion I am told that existing models can be modified and augmented).

There could even be an added routine that special-cases trick questions and high-profile criticisms.

While this is technically possible, it is not remotely practical: the downside risk of pushing out a borked model far outweighs the upside.

Training the model is expensive (obviously), but even if you are only training it slightly, running evaluations to determine whether the particular training checkpoint is at or above the quality bar is expensive, too.
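
To make the cost argument concrete, here is a minimal sketch of a release gate, assuming a simple rule: a patched checkpoint is promoted only if it does not regress on any eval the current model was scored on. The eval names, scores, and the `passes_quality_bar` helper are hypothetical, not OpenAI's actual process. The gate itself is cheap; producing the `candidate` scores means running every eval suite against the new checkpoint, which is the expensive part.

```python
def passes_quality_bar(candidate: dict[str, float],
                       baseline: dict[str, float],
                       tolerance: float = 0.0) -> bool:
    """Hypothetical gate: True only if the candidate checkpoint scores at least
    (baseline - tolerance) on every eval the baseline model was scored on."""
    for eval_name, base_score in baseline.items():
        score = candidate.get(eval_name)
        if score is None:
            # An eval that was never run on the candidate counts as a failure.
            return False
        if score < base_score - tolerance:
            # Any single regression blocks the release.
            return False
    return True


# Hypothetical scores (fraction of eval items passed); illustrative, not real data.
baseline = {"reasoning": 0.82, "coding": 0.74, "safety": 0.97}
patched = {"reasoning": 0.90, "coding": 0.71, "safety": 0.97}

print(passes_quality_bar(patched, baseline))                  # False: coding regressed
print(passes_quality_bar(patched, baseline, tolerance=0.05))  # True
```

In this made-up example, patching the model to handle one trick question (reasoning goes up) still fails the strict gate because something else slipped (coding goes down), which is the borked-model risk in a nutshell.
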

> The question is if it solved the puzzle correctly before Norvig's article appeared. It could have been trained...

This caught me by surprise. Is there any suggestion or evidence that, despite the "knowledge cutoff", OpenAI is continuously retraining GPT-4o's chat-backing model(s) on day-over-day updates to the web?

Sure. I guess the best way to test this is to compose a new question in a similar format.

I am not sure "of a similar format" suffices here; the new question should bear no resemblance or similarity to this riddle at all.
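
One rough way to check that a freshly composed question does not resemble the published one is word n-gram overlap, a common heuristic for spotting training-data contamination. This is a minimal sketch under that assumption; the `word_ngrams` and `jaccard_overlap` helpers and any rejection threshold you might pick are hypothetical. It only catches surface-level copying: a reworded puzzle with the same logical structure would score near zero, which is why "of a similar format" may not be enough.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into words, and collect every run of n consecutive words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_overlap(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two texts' word n-gram sets (0.0 = disjoint, 1.0 = same set)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        # Neither text is long enough to form a single n-gram.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


print(jaccard_overlap("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(jaccard_overlap("a b c d", "a b c e"))  # 0.3333333333333333
```

A new test question would be rejected if its overlap with the published puzzle text exceeded some chosen threshold; a semantic check would still be needed to rule out structurally identical riddles.
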
The task is to get it to write generic code.

It's disappointing that Norvig didn’t use the model OpenAI describes as its best for programming.

Also, using himself as the programmer seemed like a convenient choice. I’d much rather see him grab a random professional programmer for the task.

GPT-o1 was released on Sept. 12th and Norvig ran his tests on Sept. 25th. I don't understand how Norvig didn't think to test GPT-o1; it actually irritates me, lol.

Not everybody follows GPT releases so closely. I work on implementing software with LLMs, and this is the first I've heard of it.