
by fgfm 916 days ago

We'll do our best to report it consistently, since it can indeed influence developers' financial decisions, especially if they go through paid third-party LLM APIs. In our early experiments, we've seen about 200-250 tokens per request (one request is roughly one autocompletion), of which about 40-50 tokens are generated.
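To make those numbers concrete, here is a rough sketch of how the per-request cost could be estimated from them. The prices are placeholders, not quotes from any provider, and pairing the low total with the low generated count (and high with high) is an assumption; the function name is hypothetical.

```python
def cost_per_request(total_tokens, generated_tokens,
                     input_price_per_1k, output_price_per_1k):
    """Estimate the cost of one request when input and output tokens are priced separately."""
    # Everything that was not generated is assumed to be prompt (input) tokens.
    prompt_tokens = total_tokens - generated_tokens
    return (prompt_tokens * input_price_per_1k
            + generated_tokens * output_price_per_1k) / 1000


# Placeholder prices per 1,000 tokens; substitute your provider's actual rates.
INPUT_PRICE = 0.001
OUTPUT_PRICE = 0.002

# Low and high ends of the range reported above (200-250 total, 40-50 generated).
low = cost_per_request(200, 40, INPUT_PRICE, OUTPUT_PRICE)
high = cost_per_request(250, 50, INPUT_PRICE, OUTPUT_PRICE)
print(f"estimated cost per request: {low:.6f} to {high:.6f}")
```

Multiplying the result by the number of autocompletions a developer triggers per day gives a ballpark daily figure, which is why both the call count and the tokens per call matter.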

Two things we're doing about this:

- Right now our API response contains more than autocompletion requires, so there is room to trim it. And since we focus on team alignment, the goal is to raise the suggestion acceptance rate compared to alternatives. The end result should be fewer calls and lower token consumption.

- Since we're working on fully migrating to hostable OSS models of reasonable size, the financial side of token consumption should mostly drop out of the picture, leaving latency as the main concern.