Hacker News new | ask | show | jobs
by VladVladikoff 261 days ago
Interesting. I have in the past tried to get bounding boxes of property boundaries on satellite maps estimated by VLLM models but had no success. Do you have any tips on how to improve the results?
4 comments

With Qwen I went as stupid as I could: please provide the bounding box metadata for pytesseract for the above image.

And it spat it out.

It’s funny that many of us say please. I don’t think it impacts the output, but it also feels wrong without it sometimes.
Depends on the model, but e.g. [1] found many models perform better if you are more polite. Though interestingly being rude can also sometimes improve performance at the cost of higher bias

Intuitively it makes sense. The best sources tend to be either of moderately high politeness (professional language) or 4chan-like (rude, biased but honest)

1: https://arxiv.org/pdf/2402.14531

When I want an LLM to be be brief, I will say things like "be brief", "don't ramble", etc.

When that fails, "shut the fuck up" always seems to do the trick.

I ripped into cursor today. It didn't change anything but I felt better lmao
Bevore GPT5 was released I already had the feeling like the webui response was declining and I started to try to get more out of the responses and dissing it and saying how useless their response was did actually improve the output (I think).
The way I think of it, talking to an LLM is a bit like talking to myself or listening to an echo, since what I get back depends only on what I put in. If it senses that I'm frustrated, it will be inclined to make even more stuff up in an attempt to appease me, so that gets me nowhere.

I've found it more useful to keep it polite and "professional" and restart the conversation if we've begun going around in circles.

And besides, if I make a habit of behaving badly with LLMs, there's a good chance that I'll do it without thinking at some point and get in trouble.

It's a good habit to build now in case AGI actually happens out of the blue.
Gemini has purpose post training for bounding boxes if you haven't tried it.

The latest update on Gemini live does real time bounding boxes on objects it's talking about, it's pretty neat.

shameless plug here for AMD's AI Dev Day - registration is open and they want feedback on what to focus on: https://www.amd.com/en/corporate/events/amd-ai-dev-day.html
Do you have some example images and the prompt you tried?
also documented stack setup if could.