| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mucle6 740 days ago
	It looks like the vision costs the same for GPT-4o vs mini. Both start with 150x150px and if you click the (i) it says mini uses way more base tokens and way more tile tokens, it still costs the same...

2 comments

MasterScrat 740 days ago

It almost sounds shady... "it's 30x cheaper per token but you now need 30x more tokens per image"?

Has anyone already validated this based on billed cost? running a batch myself to check

EDIT:

Ok so I captioned 500 images in "low resolution" mode with GPT-4o-mini

Each one took approximately: "completion_tokens=84, prompt_tokens=2989, total_tokens=3073"

Reported GPT-4o-mini cost is $0.25

Using GPT-4o this would cost me $1.33 (also in "low resolution" mode), with this breakdown:

"completion_tokens=98, prompt_tokens=239, total_tokens=337"

link

MasterScrat 740 days ago

Ok I now understand better what happened:

The price for using images as part of your prompt has indeed not changed between GPT-4o-mini and GPT-4o

Yet overall, captioning 500 images now costs me 5x less. This is because when I'm captioning an image, I'm providing both an image and a text prompt. The cost of using the image in the prompt stays the same, but the cost of the text dramatically dropped.

link

minimaxir 740 days ago

Good catch: the calculators here are bizarre. For GPT-4o, a 512x512 image uses 170 tile tokens. For GPT-4o mini, a 512x512 image uses 5,667 tile tokens. How does that even work in the context of a ViT? The patches and its image encoder should be the same size/output.

Since the base token counts increase proportionally (which makes even less sense) I have a hunch there's a JavaScript bug instead.

link

bryanh 740 days ago

Confirmed that mini uses ~30x more tokens than base gpt-4o using same image/same prompt: { completionTokens: 46, promptTokens: 14207, totalTokens: 14253 } vs. { completionTokens: 82, promptTokens: 465, totalTokens: 547 }.

link

minimaxir 740 days ago

Huh. I am so confused.

link