by huac 957 days ago
I think this is one of the most important possible works for open source LLMs, really glad y'all pushed this forward!

That's not hyperbole. Why is OpenAI able to charge so little for their APIs? I have heard CEOs of rival mega LLM companies complain that matching OpenAI's prices would mean selling at a loss. But I think OpenAI's margin is still positive, and that they can charge low API prices partly because they've invested more in managing their infra, but most importantly because they get the best utilization out of their existing hardware.

If it costs everyone $X/gpu/hr to serve models, the company that has the most throughput wins on price. In a world without finetunes, the most capable model, the one that can zero- or few-shot the most tasks, will have the most usage. Finetuned open models can reach parity with GPT on narrow tasks, but until now, having public providers serve those models was expensive. Your private finetune is only going to be queried by you, not everyone, so it's super expensive to serve on a per-token basis. With hot-swappable LoRA adapters, which let many finetunes share one served base model, that calculus changes, and the cost per token can go way down. Super, super exciting!
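The utilization argument above can be made concrete with a back-of-the-envelope sketch. Every number here is a hypothetical placeholder, not a measurement: the GPU price, the peak throughput, and both utilization fractions are assumptions chosen only to show the shape of the calculation.

```python
def cost_per_million_tokens(gpu_cost_per_hour, peak_tokens_per_sec, utilization):
    """Dollars per one million served tokens on a single GPU.

    utilization is the fraction of peak throughput actually used (0 < u <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


# Hypothetical numbers, for illustration only.
GPU_COST = 2.0      # $/gpu/hr (assumed)
PEAK_TPS = 1000.0   # tokens/sec at full batch (assumed)

# A private finetune on its own GPU, queried by one user: mostly idle.
dedicated = cost_per_million_tokens(GPU_COST, PEAK_TPS, utilization=0.02)

# Many finetunes sharing one base model via swappable adapters: mostly busy.
shared = cost_per_million_tokens(GPU_COST, PEAK_TPS, utilization=0.80)

print(f"dedicated: ${dedicated:.2f} per 1M tokens")
print(f"shared:    ${shared:.2f} per 1M tokens")
```

With the hourly cost fixed, cost per token scales inversely with utilization, so under these assumed figures the shared setup is 40 times cheaper per token. The sketch ignores any throughput overhead from swapping adapters.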

2 comments

Doesn’t OpenAI still operate at a significant loss, relying on massive infusions of capital from Microsoft and other investors? If they are giving away half their product, it’s not surprising that they undercut the competition. Not a new strategy.

Underprice to avoid or drive out competition and encourage lock-in, then increase prices when you no longer have competitors or your user base is large enough and reliant enough that your attrition is manageable. Then you sell to a bigger company that grinds it up and integrates it into its own products. Same as always. Bonus points if you claim to be open source for the free marketing and/or free development/testing in the form of user contributions before switching to a proprietary model.

Shouldn’t we have a standardized corporate strategy bingo card by now?

I don't have any access to their financials, so this is speculative, but while they do 'give away' GPT-3.5-turbo in the free ChatGPT, the rest of the business is likely extremely profitable. If you want to estimate the cost of serving those free requests, consider how much it costs to do that via the API. A 10-message conversation, where ChatGPT outputs 200 tokens each time, is $0.002, or two-tenths of a cent. I believe their API usage is still positive margin for them. (Of course, now consider how much markup there is in ChatGPT pro!!)
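The arithmetic behind that estimate can be sketched as follows. It assumes the $1/million tokens price quoted elsewhere in this thread for gpt-3.5-turbo, and it counts output tokens only; prompt tokens and the growing conversation history are ignored, so a real bill would be somewhat higher.

```python
def conversation_cost(messages, output_tokens_per_message, price_per_million_tokens):
    """Dollar cost of the output tokens in one conversation."""
    total_tokens = messages * output_tokens_per_message
    return total_tokens * price_per_million_tokens / 1_000_000


# Price is an assumption taken from the $1/million tokens figure in this thread.
cost = conversation_cost(messages=10,
                         output_tokens_per_message=200,
                         price_per_million_tokens=1.00)
print(f"${cost:.4f}")  # prints $0.0020
```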

There is a difference between pricing aggressively and pricing at a loss. Their pricing for gpt-3.5-turbo now matches leading public providers for Llama-70B ($1/million tokens). Rumors are that 3.5-turbo is actually a 20B model, but even if we assume that it is larger than 70B, OpenAI can still price more aggressively than Llama-70B providers because they have better throughput and utilization of the same hardware.

Interesting. I'm not so sure I really 'got' that part of finetunes / LoRA adapters before reading this comment. Makes me want to make one to take it for a spin, see what comes out the other side.

The nice thing too is that because you freeze almost all the parameters, and generally load the base model in lower precision (e.g. QLoRA loads the full model in 4-bit), GPU memory usage is super low. A free Colab will definitely suffice for finetuning a 7B model, and renting a 3090 is less than 50 cents an hour. Pretty low barrier to entry to try something!
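A rough sketch of why the memory footprint is so small. It counts weight storage only and ignores activations, optimizer state, and quantization overhead, so real usage will be higher. The 4096 x 4096 layer size and rank 8 are illustrative assumptions, not values from any particular model or config.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix:
    two low-rank factors of shape (rank x d_in) and (d_out x rank)."""
    return rank * (d_in + d_out)


full_16bit = weight_memory_gb(7e9, 16)  # 7B parameters in 16-bit
quant_4bit = weight_memory_gb(7e9, 4)   # same model loaded in 4-bit

# Assumed layer shape and rank, for illustration only.
frozen = 4096 * 4096
adapter = lora_params(4096, 4096, rank=8)

print(f"16-bit weights: {full_16bit:.1f} GB, 4-bit weights: {quant_4bit:.1f} GB")
print(f"trainable fraction for this layer: {adapter / frozen:.4%}")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit footprint, and the adapter trains well under 1% of the parameters in the layer it wraps.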