| HN Mirror


by Doohickey-d 129 days ago

OpenAI has a "flex" processing tier, which works like the normal API, except that you accept higher latency and higher error rates in exchange for 50% off (same as batch pricing). It also supports prompt caching for further savings.

For me, it works quite well for low-priority things, without the hassle of using the batch API. Usually the added latency is just a few extra seconds, so it would still work in an agent loop (and you can retry failed requests at the "normal" priority tier).
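A rough sketch of that retry-then-fall-back idea, in case it helps. Everything here is an assumption for illustration: `call_with_flex_fallback` and `send` are hypothetical names, the tier strings "flex" and "default" are placeholders for whatever your client expects, and treating every exception as retryable is a simplification.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_flex_fallback(
    send: Callable[[str], T],
    flex_attempts: int = 2,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try the cheaper tier a few times, then fall back to the normal tier.

    `send` is a hypothetical callable that takes a tier name and performs
    one request at that tier, raising an exception on failure.
    """
    for attempt in range(flex_attempts):
        try:
            return send("flex")
        except Exception:
            # Simplification: any error counts as retryable. A real client
            # would probably only retry on capacity/timeout style errors.
            if attempt < flex_attempts - 1:
                # Exponential backoff between flex attempts only.
                sleep(backoff_seconds * (2 ** attempt))
    # All flex attempts failed: pay full price so the agent loop keeps going.
    return send("default")
```

With the OpenAI Python SDK, `send` would presumably be a small wrapper that passes the tier through as the request's `service_tier` argument; check the flex docs for the exact values and for recommended timeouts before relying on this.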

https://developers.openai.com/api/docs/guides/flex-processin...

1 comment

zozbot234 129 days ago

That's interesting, but it's a beta feature, so it could go away at any time. It's also not available for Codex agentic models (or Pro models, for that matter).
