|
|
|
|
|
by ACCount37
208 days ago
|
|
That looks like something that can be solved by autoregressive models of today, no architectural changes needed. What you need is: good image understanding, at least GPT-5 tier, general purpose reasoning over images training, and then some domain-specific training, or at least some few-shot guidance to get it to adopt the correct reasoning patterns. If I had to guess which model would be able to do it best out of the box, few-shot, I'd say Gemini 3 Pro. There is nothing preventing an autoregressive LLM from revisiting images and rewriting the texts as new clues come in. This is how they can solve puzzles like sudoku. |
|
https://urn.digitalarkivet.no/URN:NBN:no-a1450-rg60085808000...