Hacker News new | ask | show | jobs
by defytonofficial 11 hours ago
This matches my experience. I've been using OpenRouter with GPT-4o for an image verification service, and the prompt engineering choices have a measurable impact on cost.

One thing I found: asking the model to respond in structured JSON (with a strict schema) vs free-form text cuts token output by ~40% on average. The model stops "explaining itself" and just gives you the answer.

Also noticed that including a reference image in vision calls roughly doubles the input cost but improves accuracy enough that you save on retries. Net cost ended up lower for my use case.

Curious if you've measured the difference between asking for "concise" output vs actually constraining the response format.

2 comments

That's an excellent idea I plan to try, thanks—re using structured JSON with schema. The most success I've had is saying "be brief" or an explicit size, like one line, or do not explain, etc. I haven't measured other instructions so extensively. They do work but the more specific the better. Other strategies around outputs that are more natural language seem to be hands-down the direction to take, and get away from the machine language habits we've used in the past. It's super interesting seeing this new practice emerging and more or less inventing parts of it along the way. Right now I'm at the place where my brute force and elaborate explanations were reaching their limit and in the frustration just realized I need to take a few days and try to figure out the tool. Across all these the pattern seems entirely that the constraints bound the probability space, whether it's the format like you suggested, or the instruction we give, including the space we point it toward (Web APIs, runtime, schema, etc). In all instances where it's not working the solution seems to be what does the pattern reduce to, and what specifics are the do/don't to go with that, and most of the time the results improve immediately. Your tip seems excellent for this. An easy-button.
why still use gpt-4o?