by oli5679 624 days ago

GPT-o1 preview solves this puzzle correctly in 13 seconds and gives a thorough logical deduction in the comments and explanation.

I think it’s a bit unfair to the LLM to ask it to retrieve the puzzle definition from its training data. I posted the puzzle information from Norvig's notebook.

https://chatgpt.com/share/670103ae-1c18-8011-8068-dd21793727...

3 comments

lagmg05 624 days ago

The question is whether it solved the puzzle correctly before Norvig's article appeared. It could have been trained on the article or on HN comments (I am told in any Llama discussion that existing models can be modified and augmented).

There could even be an added routine that special cases trick questions and high profile criticisms.

Fripplebubby 624 days ago

While this is technically possible, it is not remotely practical and the downside risk of pushing out a borked model is much higher than the upside.

Training the model is expensive (obviously), but even if you are only training it slightly, running evaluations to determine whether the particular training checkpoint is at or above the quality bar is expensive, too.

Terretta 624 days ago

> The question is if it solved the puzzle correctly before Norvig's article appeared. It could have been trained...

This caught me by surprise: is there a suggestion or evidence that, despite the "knowledge cutoff", OpenAI is continuously retraining GPT-4o's chat-backing model(s) on day-over-day updates to the web?

oli5679 624 days ago

Sure,

I guess the best way to test this is to compose a new question, of a similar format.

johnisgood 623 days ago

I am not sure "of a similar format" suffices here; the new question or riddle should bear no resemblance or similarity to the original one.

godelski 624 days ago

The question is to get it to write generic code

kenjackson 624 days ago

Disappointing that Norvig didn’t use the model that OpenAI states is their best model for programming.

Also, using himself as the programmer seemed like a convenient choice. I’d much rather see him grab a random professional programmer for the task.

drhouse_md 622 days ago

gpt-o1 was released Sept. 12th and Norvig ran his tests Sept 25th... I don't understand how Norvig didn't think to test gpt-o1, it actually irritates me lol

RevEng 621 days ago

Not everybody follows GPT releases so closely. I work implementing software using LLMs and this is the first I've heard of this.
