| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ACCount37 208 days ago

That looks like something that can be solved by autoregressive models of today, no architectural changes needed.

What you need is: good image understanding, at least GPT-5 tier, general purpose reasoning over images training, and then some domain-specific training, or at least some few-shot guidance to get it to adopt the correct reasoning patterns.

If I had to guess which model would be able to do it best out of the box, few-shot, I'd say Gemini 3 Pro.

There is nothing preventing an autoregressive LLM from revisiting images and rewriting the texts as new clues come in. This is how they can solve puzzles like sudoku.

1 comments

vintermann 208 days ago

Try for yourself, if you want to:

https://urn.digitalarkivet.no/URN:NBN:no-a1450-rg60085808000...

link