| HN Mirror



	by falloutx 137 days ago
	That's also called slowing down the default experience so users have to pay more for the fast mode. I think it's the first time we are seeing blatant speed ransoms in LLMs.

2 comments

Aurornis 137 days ago

That's not how this works. LLM serving at scale processes multiple requests in parallel for efficiency. Reduce the parallelism and you can process individual requests faster, but the overall number of tokens processed is lower.
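Here is a toy model of that trade-off. It assumes, hypothetically, that each decode step emits one token for every request in the batch and that step latency grows linearly with batch size. The constants are made up for illustration and are not measured from any provider:

```python
# Toy model of batched LLM decoding. All constants are hypothetical.
BASE_STEP_MS = 20.0    # assumed fixed cost per decode step (e.g. streaming weights)
PER_REQUEST_MS = 0.5   # assumed extra cost per request sharing the step


def step_ms(batch_size: int) -> float:
    """Latency of one decode step that serves `batch_size` requests."""
    return BASE_STEP_MS + PER_REQUEST_MS * batch_size


def per_request_tps(batch_size: int) -> float:
    """Tokens per second seen by one user: one token per step."""
    return 1000.0 / step_ms(batch_size)


def aggregate_tps(batch_size: int) -> float:
    """Tokens per second across the whole batch, i.e. total served capacity."""
    return batch_size * per_request_tps(batch_size)


if __name__ == "__main__":
    for b in (1, 8, 32, 128):
        print(f"batch={b:4d}  per-user={per_request_tps(b):6.1f} tok/s  "
              f"total={aggregate_tps(b):8.1f} tok/s")
```

In this toy model, shrinking the batch makes each user's stream faster but cuts total throughput. A faster tier would therefore plausibly cost more per token even without any artificial slowdown.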

falloutx 137 days ago

They can now easily decrease the speed for the normal mode, and then users will have to pay more for fast mode.

Aurornis 137 days ago

Do you have any evidence that this is happening? Or is it just a hypothetical threat you're proposing?

These companies aren't operating in a vacuum. Most of their users could change providers quickly if they started degrading their service.

falloutx 137 days ago

They have contracts with companies, and those companies won't be able to switch quickly. By the time those contracts come up for renewal it will already be too late; their code will have become completely unreadable by humans. Individual devs can move quickly, but companies can't.

kolinko 137 days ago

Are you at all familiar with the architecture of systems like theirs?

The reason people don't jump to your conclusion here (and why you get downvoted) is that for anyone familiar with how this is orchestrated on the backend it's obvious that they don't need to do artificial slowdowns.

falloutx 137 days ago

I am familiar with the business model. This is clear indication of what their future plan is.

Also, I was just pointing out the business issue, raising a point that hadn't been raised here. I just want people to be more cautious.

blackqueeriroh 137 days ago

So you are not familiar with the system architecture. Okay.

throw310822 137 days ago

Slowing down with respect to what?

falloutx 137 days ago

Slowing down with respect to the original response speed. Basically, what we used to get a few months back, which is the best possible experience.

throw310822 137 days ago

There is no "original speed of response". The more resources you pour in, the faster it goes.

falloutx 137 days ago

Watch them decrease resources for normal mode so people are pressured into paying for fast mode.

throw310822 137 days ago

Seriously, thinking about the price structure of this (6x the price for 2.5x the speed, if that's correct), it seems to target something like real-time applications with very small context. Maybe voice assistants? I guess that if you're doing development, it makes more sense to parallelize over more agents than to pay that much for a modest increase in speed.
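The arithmetic behind that comparison takes the quoted 6x price / 2.5x speed figures at face value, although they are unconfirmed in the thread. It also assumes the work splits evenly across independent agents. The token count, base speed, and price below are hypothetical:

```python
# Compare fast mode with running several standard-mode agents in parallel.
# Multipliers are the unconfirmed figures quoted above; everything else is hypothetical.
FAST_PRICE_MULT = 6.0
FAST_SPEED_MULT = 2.5


def fast_mode(tokens: int, base_tps: float, usd_per_mtok: float) -> tuple[float, float]:
    """Return (wall-clock seconds, cost in USD) for one fast-mode stream."""
    seconds = tokens / (base_tps * FAST_SPEED_MULT)
    cost = tokens / 1e6 * usd_per_mtok * FAST_PRICE_MULT
    return seconds, cost


def parallel_standard(tokens: int, base_tps: float, usd_per_mtok: float,
                      agents: int) -> tuple[float, float]:
    """Return (seconds, cost) when the work is split evenly across `agents`
    standard-mode streams. Assumes perfect parallelism."""
    seconds = tokens / (base_tps * agents)
    cost = tokens / 1e6 * usd_per_mtok  # same tokens, standard price
    return seconds, cost


if __name__ == "__main__":
    tokens, base_tps, price = 100_000, 50.0, 10.0  # hypothetical workload
    print("fast mode:         %.0f s, $%.2f" % fast_mode(tokens, base_tps, price))
    print("3 standard agents: %.0f s, $%.2f" % parallel_standard(tokens, base_tps, price, 3))
```

Under these assumptions, three standard agents finish sooner than one fast stream at one sixth of the cost. Fast mode only comes out ahead when the task cannot be split, such as a single latency-sensitive conversation.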