by krackers
340 days ago
Interesting, I thought one of the whole points of o3 was mixed multimodal reasoning (e.g. everyone doing those GeoGuessr challenges). But maybe that's just a parlor trick and it's not actually implemented that way. I wonder when they're going to extend chain-of-thought to work with image tokens; it seems like that would help with spatial challenges like this.

I expect that for much larger images (e.g., 300x300 grids) and for problems simpler than ARC, o3's image processing would give it a lead over o3 processing a very long character stream.