| HN Mirror



	by mhl47 192 days ago
	We are currently working on some Christmas puzzles that are, I would say, a bit more difficult on the visual side. GPT5.1 completely failed at all of them, while Gemini 3 has solved two so far, which I consider rather impressive. One was two screenshots of a phone screen showing timestamped chats, and it had to take the nth letter of the mth word based on the timestamp. While this type of riddle could be in the training data, the ability to OCR this so well and understand the spatial relation of each object perfectly is something I have not seen from other models yet.
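
For readers unfamiliar with this kind of riddle, here is a rough sketch of the decoding step once the chats have been read off the screenshots. The comment does not say how the timestamp maps to n and m, so the mapping below (hour selects the word, minute selects the letter, both 1-indexed) is purely an assumption, as are the function names and the sample messages.

```python
def pick_letter(message: str, m: int, n: int) -> str:
    """Return the n-th letter of the m-th word, both 1-indexed."""
    words = message.split()
    return words[m - 1][n - 1]


def decode(chats: list[tuple[str, str]]) -> str:
    """Decode a list of (timestamp, message) pairs into a string.

    ASSUMPTION (not from the original puzzle): for a timestamp "HH:MM",
    the hour is the word index m and the minute is the letter index n.
    """
    letters = []
    for timestamp, message in chats:
        hour_text, minute_text = timestamp.split(":")
        letters.append(pick_letter(message, int(hour_text), int(minute_text)))
    return "".join(letters)


# Hypothetical chat messages, invented for illustration only.
sample_chats = [
    ("01:03", "see you soon"),      # word 1 "see",  letter 3
    ("03:01", "are you late"),      # word 3 "late", letter 1
    ("02:03", "the gift arrived"),  # word 2 "gift", letter 3
]

solution = decode(sample_chats)
```

The decoding itself is trivial; the hard part described above is the step before it, reading each timestamp and message correctly from the image and pairing them up.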

1 comment

devttyeu 192 days ago

Visual puzzle solving is a pretty easy problem to train on because the answers are simple to verify, so that skill getting really good is just a matter of time.