by keeda
200 days ago
I dunno... Consider:

1. Token prices keep plummeting even as models are getting stronger.

2. Most models are being offered for free at a significant loss, so reducing costs would be critical to maintain some path to sustainability.

3. Every hyperscaler has been consistently saying for the past several quarters that they are severely constrained on capacity and in fact have billions in booked backlogs. That is, if they had more capacity they would actually be making even more billions.

I can totally imagine the smaller players renting these cloud resources for their private model uses to be rather inefficient (which is where the 50% utilization number comes from), probably because they are prioritizing time-to-market over other aspects. But I would wager that resource efficiency, at least for inference, is absolutely a top priority for all the big players.