Hacker News

by simonw 210 days ago

To cover the risk of mistakes, I suggest considering ways of "visually quoting" the documents.

If the summary says "closing timeline: X" but there's an icon I can click that pops open an overlay with a cropped screenshot of that part of the original PDF (maybe even with a red circle around that detail), I can trust those summaries a whole lot more.
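A minimal sketch of the geometry behind such an overlay, assuming you already know the detail's pixel rectangle on a rendered page image: pad the rectangle for context, clamp it to the page, then work out where the red circle goes inside the cropped image. `Rect`, `crop_with_context`, `circle_in_crop` and the padding values are hypothetical names and defaults chosen for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def crop_with_context(detail: Rect, page_w: int, page_h: int, pad: int = 40) -> Rect:
    """Expand the detail's rectangle by `pad` pixels per side, clamped to the page."""
    return Rect(
        max(0, detail.left - pad),
        max(0, detail.top - pad),
        min(page_w, detail.right + pad),
        min(page_h, detail.bottom + pad),
    )


def circle_in_crop(detail: Rect, crop: Rect, margin: int = 8) -> Rect:
    """Bounding rectangle of the red ellipse, relative to the crop's top-left.

    A small margin keeps the outline from touching the highlighted text;
    the result is clamped so the ellipse stays inside the cropped image.
    """
    return Rect(
        max(0, detail.left - crop.left - margin),
        max(0, detail.top - crop.top - margin),
        min(crop.right - crop.left, detail.right - crop.left + margin),
        min(crop.bottom - crop.top, detail.bottom - crop.top + margin),
    )


detail = Rect(100, 500, 400, 540)
crop = crop_with_context(detail, page_w=1000, page_h=1400)
print(crop, circle_in_crop(detail, crop))
```

With an imaging library such as Pillow, the two rectangles would feed `Image.crop(...)` and `ImageDraw.Draw(...).ellipse(..., outline="red")` respectively.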

Gemini 2.5 has image bounding box and masking features that can help with this (sadly missing from Gemini 3).
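To turn a model-returned bounding box into something you can crop, you need pixel coordinates. This sketch assumes the convention described in Gemini's image-understanding docs, where a `box_2d` is `[ymin, xmin, ymax, xmax]` normalized to a 0 to 1000 range; check that against the model version you actually call. The function name is hypothetical.

```python
def box_2d_to_pixels(box_2d, width, height):
    """Convert a [ymin, xmin, ymax, xmax] box normalized to 0-1000 into
    pixel (left, top, right, bottom) for an image of the given size.

    Assumption: the 0-1000 normalization from Gemini's docs.
    """
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing to keep the arithmetic as exact as possible.
    left = round(xmin * width / 1000)
    top = round(ymin * height / 1000)
    right = round(xmax * width / 1000)
    bottom = round(ymax * height / 1000)
    return left, top, right, bottom


print(box_2d_to_pixels([100, 200, 300, 400], width=1000, height=2000))
```

Note the axis order: the normalized box lists y before x, while most imaging libraries expect (left, top, right, bottom).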


lysecret 210 days ago

Oh, I didn't know about the visual bounding boxes; this is super cool!

Quick question: are you talking about this feature?

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/b...

Because it's just using structured responses, so shouldn't it be doable with Gemini 3? (We are using Gemini 3 for some document processing, and its visual understanding is just incredible.)
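If bounding boxes come back as structured JSON, it is worth validating them before drawing anything, since a malformed box would point the reader at the wrong part of the page. A minimal sketch, assuming the model returns a JSON array of objects with `label` and `box_2d` keys (the field names follow examples in Gemini's docs; the validation rules are an assumption):

```python
import json


def parse_boxes(response_text):
    """Parse a JSON array of {"label": str, "box_2d": [ymin, xmin, ymax, xmax]}.

    Entries that are not objects, or whose box is not four numbers in the
    0-1000 range with min < max on both axes, are skipped rather than trusted.
    """
    results = []
    for item in json.loads(response_text):
        if not isinstance(item, dict):
            continue
        box = item.get("box_2d")
        label = item.get("label", "")
        if (
            isinstance(box, list)
            and len(box) == 4
            and all(isinstance(v, (int, float)) and 0 <= v <= 1000 for v in box)
            and box[0] < box[2]
            and box[1] < box[3]
        ):
            results.append((label, box))
    return results


sample = '[{"label": "closing timeline", "box_2d": [100, 200, 150, 600]}, {"label": "bad", "box_2d": [500, 500, 400, 600]}]'
print(parse_boxes(sample))
```

This check is useful whether or not the request enforces a response schema, because it guards against values that are well-formed JSON but geometrically nonsensical.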


simonw 210 days ago

No, I'm talking about the image segmentation feature: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...

But the bounding box stuff might work well enough in Gemini 3 to handle this case as well.


lysecret 210 days ago

Hmm, so that post also links back to segmentation done via structured outputs? (Though here it isn't even enforcing the structure.)

https://ai.google.dev/gemini-api/docs/image-understanding#se...
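A segmentation mask gives a tighter highlight than the box it sits in. Gemini's docs describe the mask as a base64-encoded PNG probability map covering the region inside `box_2d`; decoding and resizing that PNG needs an image library, so this sketch starts from an already decoded 2D list of 0 to 255 values resized to the box. The threshold of 127 and the function name are assumptions.

```python
def mask_extent(prob_map, box_left, box_top, threshold=127):
    """Return the tight (left, top, right, bottom) page-pixel extent of mask
    pixels above `threshold`, or None if no pixel passes.

    Assumes `prob_map` is a decoded 2D list of 0-255 values already resized
    to the bounding box, whose top-left corner is (box_left, box_top).
    Right/bottom are exclusive, matching common crop-box conventions.
    """
    rows = [y for y, row in enumerate(prob_map) if any(v > threshold for v in row)]
    if not rows:
        return None
    cols = [x for row in prob_map for x, v in enumerate(row) if v > threshold]
    return (
        box_left + min(cols),
        box_top + rows[0],
        box_left + max(cols) + 1,
        box_top + rows[-1] + 1,
    )


mask = [
    [0, 0, 0, 0],
    [0, 200, 255, 0],
    [0, 0, 180, 0],
    [0, 0, 0, 0],
]
print(mask_extent(mask, box_left=10, box_top=20))
```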


simonw 210 days ago

It's not supported by Gemini 3: https://ai.google.dev/gemini-api/docs/gemini-3#migrating_fro...

> Image segmentation: Image segmentation capabilities (returning pixel-level masks for objects) are not supported in Gemini 3 Pro or Gemini 3 Flash. For workloads requiring native image segmentation, we recommend continuing to utilize Gemini 2.5 Flash with thinking turned off or Gemini Robotics-ER 1.5.


beechwood 210 days ago

Ok, gotcha. I think this is doable. Show the excerpt from the original document so the user has confidence the data is correct.

Thank you for the feedback.
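One way to make "show the excerpt" part of the data model is to store, with each summary field, where in the original document it came from, so the UI can open the right crop when the icon is clicked. A minimal sketch; `Evidence`, `SummaryField` and `render_line` are hypothetical names, and flagging fields that have no excerpt is a design choice, not something the thread specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    page: int                       # 1-based page number in the original PDF
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


@dataclass(frozen=True)
class SummaryField:
    name: str
    value: str
    evidence: Optional[Evidence] = None


def render_line(field: SummaryField) -> str:
    """Render one summary line; fields without a source excerpt are flagged
    instead of being shown as if they had been checked against the document."""
    if field.evidence is None:
        return f"{field.name}: {field.value} (no source excerpt)"
    return f"{field.name}: {field.value} [p. {field.evidence.page}]"


print(render_line(SummaryField("closing timeline", "X", Evidence(3, (60, 460, 440, 580)))))
```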
