| I mean, this is difficult to calculate because of prompt caching, the input/output token ratio, etc., but if you just do some napkin math, I find it hard to believe people are getting this many tokens on a $20 plan. Here's the napkin math: gpt-oss-120b is priced at $0.039 in / $0.18 out per million tokens on OpenRouter. Assumptions: (1) the input/output ratio is about 25:1 (coding is mostly grep, with fairly low output); (2) you are getting 75% prompt cache reads. Case B, 50% prompt caching discount (standard provider rate), at 75% prompt caching: total tokens obtained for $20 is 658,749,010 (approx. 659 million). Input: ~633 million, of which ~475 million cached at 50% of input pricing = ~$9.25 and ~158 million uncached = ~$6.15. Output: ~25 million tokens (~$4.50). This doesn't even account for profit margins at inference providers, or the fact that OpenAI probably has a much more efficient inference stack. It's really hard to know what these companies are actually paying, but from everything I'm hearing, people are reporting that API inference pricing carries a 50% margin. |
I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.
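For anyone who wants to check the quoted napkin math, here is a rough sketch that reproduces it. The prices, the 25:1 ratio, the 75% cache-hit rate, and the 50% cache discount all come from the quoted post. The function name `tokens_for_budget` and its parameters are made up for illustration, and the model ignores margins, rate limits, and anything else a real plan would involve.

```python
# Prices quoted in the post, in USD per million tokens (gpt-oss-120b on OpenRouter).
INPUT_PRICE = 0.039
OUTPUT_PRICE = 0.18


def tokens_for_budget(budget, in_out_ratio=25.0, cache_hit=0.75, cache_discount=0.5):
    """Return (input_millions, output_millions) that `budget` dollars buys.

    Hypothetical helper: assumes cached input is billed at
    (1 - cache_discount) times the normal input price.
    """
    # Blend cached and uncached input into one effective price per million tokens.
    effective_input = INPUT_PRICE * (cache_hit * (1 - cache_discount) + (1 - cache_hit))
    # Cost of one "unit": in_out_ratio million input tokens plus 1 million output tokens.
    unit_cost = in_out_ratio * effective_input + OUTPUT_PRICE
    output_millions = budget / unit_cost
    return in_out_ratio * output_millions, output_millions


inp, out = tokens_for_budget(20.0)
print(f"input: {inp:.1f}M, output: {out:.1f}M, total: {inp + out:.1f}M")
```

With the defaults this lands on roughly 659 million tokens for $20, matching the figure in the quote. Changing `cache_hit` or `in_out_ratio` shows how sensitive the estimate is to those two guesses.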