by simonw 833 days ago
Throwing a feature request in here just in case someone from OpenAI sees it.

I'd really like it if the streaming versions of their APIs could return a token usage count at the end.

The non-streaming APIs do this right now:

    curl https://api.openai.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" -d '{
        "model": "gpt-3.5-turbo",
        "messages": [
          {
            "role": "user",
            "content": "A short fun fact about pigeons"
          }
        ]
      }'
Returns:

    {
      "id": "chatcmpl-92UiIWQaf442wq7Eyp7kF8ge0e3fE",
      "object": "chat.completion",
      "created": 1710381746,
      "model": "gpt-3.5-turbo-0125",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Pigeons are one of the few bird species that can drink water by sucking it up through their beaks, rather than tilting their heads back to swallow."
          },
          "logprobs": null,
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 14,
        "completion_tokens": 33,
        "total_tokens": 47
      },
      "system_fingerprint": "fp_4f0b692a78"
    }
Note the "usage" block there telling me how many tokens were used (which tells me how much this cost).

But if I add "stream": true I get back an SSE stream that looks like this:

    ...
    data: {"id":"chatcmpl-92Uk81oNjrcUJQnPX8fSNqFINLfSI","object":"chat.completion.chunk","created":1710381860,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{"content":"."},"logprobs":null,"finish_reason":null}]}
    
    data: {"id":"chatcmpl-92Uk81oNjrcUJQnPX8fSNqFINLfSI","object":"chat.completion.chunk","created":1710381860,"model":"gpt-3.5-turbo-0125","system_fingerprint":"fp_4f0b692a78","choices":[{"index":0,"delta":{},"logprobs":null,"finish_reason":"stop"}]}
    
    data: [DONE]
There's no "usage" block, which means I have to try and account for the tokens myself. This is really inconvenient!
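For a sense of what doing this by hand involves, here is a minimal Python sketch that reads the `data:` lines, stops at `[DONE]`, and collects the `delta.content` pieces. The function name `collect_stream_text` is made up for illustration, and it assumes the chunk shape shown above. It only recovers the text, so the token count still has to be worked out separately.

```python
import json


def collect_stream_text(lines):
    """Join the delta.content pieces from an iterable of SSE lines."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        # Skip blank separator lines and anything that is not a data field.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # The stream ends with a literal [DONE] sentinel, not JSON.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                pieces.append(content)
    return "".join(pieces)


# Trimmed-down sample lines in the same shape as the stream above.
sample = [
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))
```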

I noticed the other day that the Claude streaming API returns a "usage" block with the last message. I'd love it if OpenAI's API did the same thing.

I need this right now because I'm starting to build features for end users of my own software, and I want to be able to give them X,000 tokens "free" before starting to charge them for extras. Counting those tokens myself (probably using tiktoken) is code I'd rather not have to write - especially since features like tools/functions or images make counting tokens a lot less obvious.
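To illustrate why the "usage" block matters here, a rough sketch of the free-allowance bookkeeping, assuming each response comes with a usage dict like the one in the non-streaming example. `TokenAllowance` and its field names are hypothetical, and the allowance of 50 is just a small number for the demo.

```python
from dataclasses import dataclass


@dataclass
class TokenAllowance:
    free_tokens: int      # hypothetical free allowance for one user
    used_tokens: int = 0  # running total across all of that user's calls

    def record(self, usage):
        """Add a response's total_tokens; return the billable tokens for this call."""
        billable_before = max(self.used_tokens - self.free_tokens, 0)
        self.used_tokens += usage["total_tokens"]
        billable_after = max(self.used_tokens - self.free_tokens, 0)
        return billable_after - billable_before


allowance = TokenAllowance(free_tokens=50)
usage = {"prompt_tokens": 14, "completion_tokens": 33, "total_tokens": 47}
print(allowance.record(usage))  # first call fits inside the free allowance
print(allowance.record(usage))  # second call goes past it
```

With a "usage" block on every response this is all the code needed; without one, the `usage` dict has to be reconstructed by counting tokens locally.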

2 comments

We do the token counting on our end by literally just running tiktoken on the content chunks (although I think it's usually one token per chunk). It's a bit annoying, and I too expected they'd have the usage block, but it's one line of code if you already have tiktoken available. I've found the accounting on my side lines up well with what we see on our usage dashboard.
As an FYI, this is fine for rough usage, but it's not accurate. The OpenAI APIs inject various tokens you are unaware of into the input for things like function calling.
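A sketch of the chunk-counting approach described above, written so it runs without tiktoken installed. `count_completion_tokens` and `whitespace_encode` are made-up names; the whitespace splitter is only a stand-in, not a real tokenizer. With tiktoken available you would pass something like `tiktoken.encoding_for_model("gpt-3.5-turbo").encode` instead. As noted in the reply, this is an estimate: it only covers the streamed output, and it knows nothing about tokens the API adds to the input.

```python
def count_completion_tokens(chunks, encode):
    """Sum len(encode(text)) over streamed content chunks.

    `encode` is any callable mapping a string to a list of tokens,
    for example the `encode` method of a tiktoken encoding.
    """
    return sum(len(encode(text)) for text in chunks if text)


def whitespace_encode(text):
    # Stand-in encoder so the sketch runs anywhere: NOT a real tokenizer.
    return text.split()


chunks = ["Pigeons", " are", " birds", "."]
print(count_completion_tokens(chunks, whitespace_encode))
```

Encoding each chunk separately can give a slightly different total than encoding the joined text, which is another reason to treat the result as approximate.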
This, and/or being able to fetch the responses with their token usage by ID. What is that ID for without a way to retrieve the completions with it?