by xyzzy123 865 days ago
Ok I guess the team with the largest LLM workload in the world and billions in funding won't understand how to optimise a chip for the exact workload they have and near future ones.

Exactly. Present success means the ability to forecast what’s needed for future success — see the Pierce-Arrow Motor Car Company and their dominance in the market to this very day
This person is not saying success -> more success. I think they’re just pointing out that Altman is smart and is surrounded by smart people and a company that understands the demand because they make up the majority of the demand (and they have a strong thesis).
Is he raising for OpenAI or for another venture? If he is using deep knowledge from OpenAI to raise money for another venture, this sounds wrong.
He is rich and powerful, of course it isn’t wrong

/s

Or broke and powerful? Because of spending a fortune on WorldCoin, working at a nonprofit and heavily investing into early AI startups?
No way OpenAI makes up even a plurality of chip demand
Not OpenAI itself, but Microsoft is.

For 2022 and 2023, Microsoft bought a significant portion of NVIDIA's available hardware. They spent quite a bit of 2023 trying to figure out how to even power the multiple fleets of GPUs. Only now, with adoption of Azure OpenAI going from mild to the expected wild, are they getting around to servicing all their (potential) customers.

[citation needed]

Seriously, this is an outlandish claim just from looking at Microsoft's and Nvidia's market caps.

I am sure that Microsoft is gonna be one of Nvidia's largest customers, but I sincerely doubt it's even a double digit percentage of their revenue.

All of this is public information: older reports estimate Microsoft bought ~150k H100s, and we also know today that Meta actually bought 500k units.

To reach a double digit share of NVIDIA's 2023 revenue of $26.97 billion, you'd only need to hit ~$2.7B in sales.

H100s are priced anywhere between $20k and $35k, so that would require purchasing ~77k to ~135k units.
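
For anyone who wants to check the arithmetic, here is a rough sketch. It only reuses the figures quoted in this comment (the $26.97B revenue number, the 10% threshold, and the $20k to $35k price range); none of them are verified here, and the function name is just made up for illustration.

```python
# Back-of-the-envelope check using only the numbers quoted in the comment.
# These are the commenter's figures, not independently verified data.
NVIDIA_2023_REVENUE_USD = 26.97e9   # revenue figure as quoted above
TARGET_SHARE = 0.10                 # "double digit" threshold, i.e. 10%
H100_PRICE_LOW_USD = 20_000         # low end of the quoted price range
H100_PRICE_HIGH_USD = 35_000        # high end of the quoted price range


def units_for_share(revenue_usd: float, share: float, unit_price_usd: float) -> float:
    """Units a buyer must purchase to account for `share` of `revenue_usd`."""
    return revenue_usd * share / unit_price_usd


target_sales_usd = NVIDIA_2023_REVENUE_USD * TARGET_SHARE
# Fewer units are needed when each unit is more expensive.
fewest_units = units_for_share(NVIDIA_2023_REVENUE_USD, TARGET_SHARE, H100_PRICE_HIGH_USD)
most_units = units_for_share(NVIDIA_2023_REVENUE_USD, TARGET_SHARE, H100_PRICE_LOW_USD)

print(f"Sales needed: ${target_sales_usd / 1e9:.2f}B")
print(f"Units needed: {fewest_units:,.0f} to {most_units:,.0f}")
```

The range comes out close to the ~77k to ~135k above; the small difference is from rounding the sales target up to $2.7B.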

That is singularly H100s, Microsoft also offers lower compute, and they have the rest of Azure to service with a variety of solutions.

Being at #1 or #2 market cap worldwide is not a farfetched position to be a significant controller of chips, especially since they directly work in the space as a platform.

This ignores Google's in house chips and their internal usage. They've been at this much longer. I doubt we have the visibility to know how they compare in terms of available flops and the unit costs
> They've been at this much longer.

.. but is that true?

MSR has been putting out research in all derivatives of modern large neural network architectures (NLP, CV, etc.) for the same amount of time that Google has. If there was a drift between timelines, it's not large IMO.

What you could argue is that Google historically was more successful in their research outputs.

However, historical consumption of resources may not compare to current resource consumption.

> I doubt we have the visibility to know how they compare in terms of available flops and the unit costs

Completely agreed, unfortunately, this is all guesswork at best

Perhaps. I have no idea and am not purporting to know.
Can you elaborate on this? Per ChatGPT:

> Using Pierce-Arrow Motor Car Company as an example of such success is historically inaccurate. Pierce-Arrow was an American automobile manufacturer based in Buffalo, New York, which was known for producing luxury cars. It was indeed a dominant and prestigious brand in the early 20th century. However, the company did not manage to maintain its success and ultimately failed to adapt to changing market conditions. It faced financial difficulties during the Great Depression and eventually went bankrupt in 1938. Pierce-Arrow's inability to forecast and adapt to the economic changes and shifts in consumer preferences of the time led to its decline.

From the very answer ChatGPT gave you, it's evident that GP is saying that current success does not imply future success, using that company as an example. What needs elaboration?
It's pretty clear he is trying to make the opposite point, see "dominance in this market to this very day"
While vertical integration is a great boon for a company, it's hard to pull off. Being an expert in industry X doesn't mean you'll do great in industry Y, even if they are complementary.

Training and designing LLMs doesn't mean you understand the semiconductors business.

Vertical Integration? It may not be an OpenAI project, going by the reporting when he was ousted. I won't be surprised if the plans are for a Muskian incestuous/I-swear-it's-not-self-dealing setup, with Altman being the CEO of both entities
Correct. They're an LLM team, not chip designers.
Yeah, it's not even like they're running the datacenters where the training and tuning are happening. I would hope some of the people understand what current compute requirements are and perhaps they know better than most what future requirements will be. However, MS has been doing most of the backend for OpenAI and they've been in discussions with actual silicon architecture people (not just NVidia), but those are the folks who would do any implementation.

Perhaps they'll pull off an Apple (for ARM) and do their own architecture (either for training/tuning or inference) that will have a significant effect on the industry, but it seems unlikely. They haven't hired the right people.

The real advantage they might have is insight into how the algorithms can be adapted to reduce power consumption/latency while improving performance. It would seem odd to me if there weren't more than an order of magnitude of improvement left in new algorithms for LLMs. You're not going to get 10x the transistors or speed from silicon, but you might get an efficient architecture for a significant algorithmic improvement (that might not just be CUDA).

"I know how machine learning and statistical computing works, therefore I am an expert in hardware design" fallacy.
> "I know how machine learning and statistical computing works, therefore I am an expert in hardware design" fallacy.

A typical case of engineer's disease.

I am guessing an incredibly talented team that is incredibly networked and incredibly well funded and proven agile in the tech hub of the world can find hardware experts. Don’t know why anyone would bet against that.
We would have heard if they had hired/bought the size of team necessary to design a system large enough to have a significant impact. Modern (even sub-28nm, much less 2nm) design is hugely complex and the range of things that an AI compute engine needs to do is very broad.

Perhaps they could design a core and license it out? I'm trying to come up with a way they can do something significant without 100 people. Just the memory and serial connections are complex enough ignoring the GPU or heat/power issues.

It took Apple like 10 years to go from their first chips to actually using them in laptops, and they are literally the most well capitalized company on the planet. Sorry if I'm skeptical that some relative upstarts with a billion in compute from Microsoft can compete with trillion dollar companies that have been around for decades.
Nobody can even define what AI is, why we need it, or how to achieve it. Usually it makes sense to seek funding to execute on a plan. Making a fancy chat bot that scrapes the web to synthesize sometimes accurate and sometimes useful information is not worth trillions of dollars.

What is essentially happening in my opinion is technical innovation has slowed so silicon valley is seeking money to prop up a house of cards that doesn't make much new that is useful or needed.

Can anyone specifically say what trillions of dollars invested in "AI" would buy for society?

It seems to me there are so many higher priorities.

I wouldn't bet against it but that approach has a remarkably low rate of success. We hear about the winners - survivorship bias is real.
How about something along the lines of AWS and their Graviton?
Graviton - you mean the poorly performing solution that only has a space in the market because amazon sells it as a subsidized cost as part of a larger effort to put pricing pressure on amd/intel? That Graviton?
Was Google a chip designer before the first TPU?
Yes. Google had a number of chip products before that. Some made it to A1 and worked. Just cause they don’t advertise it doesn’t make it not so.
> Yes. Google had a number of chip products before that.

Is that true? I can't find anything suggesting it is. In fact, the little I can find suggests you are incorrect. I'll link them for the sake of referencing sources but they're both pretty awful ad-ridden sites...

A 2016 Tech Radar interview [0] with Norm Jouppi has him quoted as saying:

> [The] Tensor Processing Unit (TPU) is our first custom accelerator ASIC [application-specific integrated circuit] for machine learning [ML], and it fits in the same footprint as a hard drive.

And a 2023 Tom's hardware post [1] begins:

> Google has made significant progress in its endeavor to develop its own data center chips, according to a new report. The Information says that a key milestone has just been reached, which means that Google can plan to roll out server systems powered by the new chips starting from 2025. This is not the first processor that Google has successfully put through R&D - the company has previously made an ASIC for servers and an SoC for mobile devices. The search giant started using its internally developed Tensor Processing Unit (TPU) as far back as 2015.

[0]: https://www.techradar.com/news/computing-components/processo...

[1]: https://www.tomshardware.com/news/google-reaches-self-develo...

I guess it depends on what you are defining as a chip and what you are defining as "Google" -- as in if they have contractors design/build to their needs does that count.

1/ https://www.wired.com/2012/03/google-microsoft-network-gear/

2/ I believe they had a few custom chips designed for the youtube workloads that predate the TPU.

I remember in 2010 there was a building in MV that focused on custom chips.

Said the horse factory when automobiles were being built.
I don't remember LLMs claiming to replace GPUs. This is more like arguing with a landowner why your assembly line is so innovative and needs to be built on their land for free. They need the land, the land doesn't necessarily need them yet.
Pullman Company will disagree with you.
Absolutely terrible analogy.
An LLM might "believe" that horses are built in factories.
It makes sense to ASIC-ify the thing to get lower latencies and make the whole thing cheaper, so MS can run GPT-(n+1) cheaper. But this bet only pays off if the LLM industry gets into the mature stage where costs dominate, not innovation.
The workload they have is already optimized for something like an Nvidia GPU.
I apologise if my response was a little snarky.

Even granted that OpenAI are not able to build a chip that is competitive with NVidia's latest GPUs for running LLMs right away (which is an opinion - not backed by any direct evidence, but I agree that it is plausible as they are going up against a lot of prior R&D) is it not possible that:

a) The unit economics could be so much better that the result is still a major win, e.g. 50% of the performance at 20% of the price.

b) OpenAI is decoupled from existing supply constraints and is able to grow faster and deliver more value. A "worse" chip that you can actually get (in insane volume) may be strategically better than a "superior" chip that is limiting your growth.

c) That the plan might include some elements you are not expecting - at the $trillions investment level they might be looking at doing some surprising things e.g. (I am just making this up but there are a lot of possibilities) buy a memory manufacturer and work directly on increasing memory bandwidth.

From a lay observer point of view of the semiconductor industry of the last two decades, it seems entirely implausible they could do that quickly without just buying a company that was already working on it. And then, unless that company was big enough to already have a significant defensive patent portfolio, it's likely their efforts would be stymied in court for years if it was remotely successful.

The idea that, even with that expertise, the wins would be so large compared to what other companies have been designing for the last 10 years against very similar requirements (companies that hired/bought such teams, and that wrote so much of the foundational research) also seems implausible.

c) It's not actually possible to plan investments at that level with anything more than a very vague direction you're aiming. If it is long term, then everything is changing in unpredictable ways before you get even 25% there, but if you throw so much money at the problem in order to try to solve it much more quickly you are disrupting global economic and geopolitical forces in ways that also can't be planned for.

"50% of the performance at 20% of the price" is wildly implausible even if you can somehow start fabbing perfect chips for OpenAI's workloads tomorrow. Especially if they don't have access to the fabrication processes that Nvidia, AMD etc. are using, since more modern (read: expensive) processes reduce power draw and enable higher clocks. 80% of Nvidia's datacenter die space is not wasted, not close to that much.

It seems more likely to me they'd get 20% of the performance at 50% of the price, and that might still work out for them if it allows them to scale faster without being bottlenecked on supply of existing GPUs. But there's no magic bullet here.
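
To put the two scenarios side by side, here is a quick sketch of performance per dollar. It assumes the incumbent GPU is normalized to 1.0 performance at 1.0 price, and it ignores power, memory, networking and supply, so treat it as illustration only. The function name is hypothetical.

```python
# Compare the two hypothetical scenarios from this thread against a baseline
# GPU normalized to performance 1.0 and price 1.0 (an assumption for
# illustration; real comparisons also involve power, memory, networking).

def perf_per_dollar(relative_perf: float, relative_price: float) -> float:
    """Performance per unit of spend, relative to the baseline chip (1.0)."""
    return relative_perf / relative_price


# "50% of the performance at 20% of the price"
optimistic = perf_per_dollar(0.5, 0.2)
# "20% of the performance at 50% of the price"
pessimistic = perf_per_dollar(0.2, 0.5)

print(f"Optimistic scenario:  {optimistic:.1f}x baseline perf per dollar")
print(f"Pessimistic scenario: {pessimistic:.1f}x baseline perf per dollar")
```

In the second case the chip is worse per dollar than the baseline, so it would only make sense if supply, not cost, is the binding constraint.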

They also still need to source a bunch of other stuff, like RAM, even if they can source their own processors.

Nobody is able to build a chip that is competitive with NVidia's latest GPUs, not even AMD who would be next in line. Look at Google's TPU for a glimpse at a likely outcome of such an endeavor.

What it tells me is that Altman seems to believe that OpenAI can only make the next step if they can throw even more compute at the problem but that that isn't feasible at today's prices.