Throwing a feature request in here just in case someone from OpenAI sees it. I'd really like it if the streaming versions of their APIs could return a token usage count at the end. The non-streaming APIs do this right now:

curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" -d '{
"model": "gpt-3.5-turbo",
"messages": [
{
"role": "user",
"content": "A short fun fact about pigeons"
}
]
}'
Returns:

{
"id": "chatcmpl-92UiIWQaf442wq7Eyp7kF8ge0e3fE",
"object": "chat.completion",
"created": 1710381746,
"model": "gpt-3.5-turbo-0125",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Pigeons are one of the few bird species that can drink water by sucking it up through their beaks, rather than tilting their heads back to swallow."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 14,
"completion_tokens": 33,
"total_tokens": 47
},
"system_fingerprint": "fp_4f0b692a78"
}

Note the "usage" block there telling me how many tokens were used (which tells me how much this cost).

But if I add "stream": true I get back an SSE stream that looks like this:

...
data: {"id":"chatcmpl-92Uk81oNjrcUJQnPX8fSNqFINLfSI","object":"chat.completion.chunk","created":1710381860,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"."},"logprobs":null,"finish_reason":null}]}
data: {"id":"chatcmpl-92Uk81oNjrcUJQnPX8fSNqFINLfSI","object":"chat.completion.chunk","created":1710381860,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
data: [DONE]

There's no "usage" block, which means I have to try and account for the tokens myself. This is really inconvenient!

I noticed the other day that the Claude streaming API returns a "usage" block with the last message. I'd love it if OpenAI's API did the same thing.

I need this right now because I'm starting to build features for end users of my own software, and I want to be able to give them X,000 tokens "free" before starting to charge them for extras. Counting those tokens myself (probably using tiktoken) is code I'd rather not have to write - especially since features like tools/functions or images make counting tokens a lot less obvious.
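
To make the gap concrete, here's a minimal sketch of what consuming that stream looks like on my side. It only uses the Python standard library and works on already-decoded SSE lines. The function names `read_stream` and `charge_allowance` are hypothetical, the sample lines are abbreviated versions of the chunks above, and the check for a "usage" key on a chunk is an assumption about what a future response might look like, not something the streaming API returns today.

```python
import json


def read_stream(lines):
    """Collect streamed text and any usage block from decoded SSE lines.

    Returns (text, usage). usage is None when no chunk carried a
    "usage" key, which is the situation described above.
    """
    parts = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
        # Assumption: a chunk *might* include "usage" some day.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage


def charge_allowance(remaining, usage):
    """Deduct a response's total_tokens from a user's free allowance."""
    if usage is None:
        # No count from the API: tokens would have to be counted locally.
        raise ValueError("no usage reported for this response")
    return max(0, remaining - usage["total_tokens"])


# Abbreviated versions of the streamed chunks shown above.
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

text, usage = read_stream(sample)
print(repr(text), usage)

# The non-streaming response above reported total_tokens of 47.
non_streaming_usage = {"prompt_tokens": 14, "completion_tokens": 33, "total_tokens": 47}
print(charge_allowance(1000, non_streaming_usage))
```

With the non-streaming response the allowance bookkeeping is a one-liner. With the streamed chunks `usage` comes back as `None`, so that same bookkeeping needs a separate token count from somewhere else.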