by adrian_b 206 days ago
Because those who have money to invest nowadays do not invest it in the research problems whose solutions are urgently needed for the survival of humanity, e.g. developing technologies for using all substances in closed cycles (as the biosphere did before humans). Instead, they invest all their money in research toward the dream of developing AGI, which even if successful will benefit only a small number of humans, not all of mankind.

The fp64 and fp32 performance is needed for physical simulations required by the former goal, while fp16 and fp8 performance is useful only for the latter goal.

So AMD's choice logically follows the choice of those who control the investment money.


> The fp64 and fp32 performance is needed for physical simulations

In the very unlikely case where

1) You need fp64 Matrix-Matrix products for physical simulations

2) You bought the MI355X accelerator instead of hardware better suited for the task

you can still emulate it with the Ozaki scheme.
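
For anyone who has not seen the idea, here is a rough sketch of the splitting trick in NumPy. It is only an illustration in the spirit of the Ozaki scheme, not a tuned implementation: each fp64 matrix is cut into slices of small integers, the slice products are computed exactly in integer arithmetic, and the partial results are summed in fp64. The names `split_rows` and `ozaki_matmul` and the parameters `num_slices` and `bits` are made up for this example, and an int32 NumPy matmul stands in for the low-precision matrix unit.

```python
import numpy as np

def split_rows(M, num_slices, bits):
    """Split each row of M into small-integer slices (hypothetical helper).

    Returns (scale, slices) with
        M ~= scale[:, None] * sum_k 2**(-bits*(k+1)) * slices[k]
    where every slice entry fits in int8 (requires bits <= 7).
    """
    max_abs = np.max(np.abs(M), axis=1)
    _, exp = np.frexp(max_abs)          # max_abs = m * 2**exp, 0.5 <= m < 1
    scale = np.ldexp(1.0, exp)          # per-row power of two, so |M/scale| < 1
    R = M / scale[:, None]              # exact: division by a power of two
    slices = []
    for _ in range(num_slices):
        R = R * 2.0**bits               # exact shift of the next `bits` bits
        S = np.trunc(R)                 # integer part, |S| < 2**bits
        slices.append(S.astype(np.int8))
        R = R - S                       # exact remainder, |R| < 1
    return scale, slices

def ozaki_matmul(A, B, num_slices=10, bits=6):
    """Approximate the fp64 product A @ B using only small-integer matmuls."""
    sa, a_slices = split_rows(A, num_slices, bits)      # row-wise scaling of A
    sb, b_slices = split_rows(B.T, num_slices, bits)    # column-wise scaling of B
    C = np.zeros((A.shape[0], B.shape[1]))
    # Add the smallest contributions first to limit fp64 rounding in the sum.
    for k in reversed(range(num_slices)):
        for l in reversed(range(num_slices)):
            # Exact as long as inner_dim * (2**bits - 1)**2 fits in int32.
            P = a_slices[k].astype(np.int32) @ b_slices[l].T.astype(np.int32)
            C += np.ldexp(P.astype(np.float64), -bits * (k + l + 2))
    return C * sa[:, None] * sb[None, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 48))
B = rng.standard_normal((48, 24))
err = np.max(np.abs(ozaki_matmul(A, B) - A @ B)) / np.max(np.abs(A @ B))
print(f"max error relative to largest entry: {err:.2e}")
```

The cost is the catch: this sketch does `num_slices**2` low-precision products per fp64 product, so it only pays off when the low-precision units are much faster than the fp64 ones. Real implementations skip the slice pairs whose contribution falls below fp64 precision.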

What hardware is better suited for the task? In FLOPS per dollar, NVIDIA is in retreat just as much as AMD when it comes to fp64.
ARMv9 Scalable Matrix Extension (SME). Apple has had outer-product matrix hardware (AMX) since 2019, but you cannot buy the chips by themselves.
Yeah, I saw the presentations at SC25, but I wasn't able to get anyone to commit to being able to buy them in the next year or three. Right now I have two open RFPs and nobody is bidding ARM.
Expanding (I think) on your point, perhaps it's just a fork into two product lines for different uses?
Will there be future hardware optimized for physical simulations, or should existing/faster hardware be stockpiled now?
I am still using ancient AMD GPUs, bought between 2015 and 2019, because all later GPUs have much worse FP64 throughput per dollar.

So I was never able to upgrade them, because all newer GPUs are worse.

There was a little hope when the latest generation of Intel discrete desktop GPUs, Battlemage, improved their FP64 throughput. While their throughput is relatively modest, i.e. half that of a Zen 5 desktop Ryzen, they are extremely cheap, so their performance per dollar is very good. Therefore they can be used to multiply the throughput of a desktop computer at a modest additional cost.

Unfortunately, with the new Intel CEO the future of the Intel GPUs is very unclear, so it is unknown whether they will be followed by better GPUs or be canceled. If Intel stupidly chooses to no longer compete in the GPU market, the last source of GPUs with good FP64 throughput will disappear.

The datacenter GPUs that still have good FP64 throughput have huge prices that cannot be justified for any small business or individual. In order to recover the cost of such GPUs you must have a workload that keeps them busy continuously, day and night. Such workloads must be aggregated from a large number of users. So we have regressed to the time-shared mainframes of the early 1970s, backwards from the freedom of personal computers.

I see no hope for the future availability of any computing devices with better FP64 throughput per dollar than desktop CPUs. Technically, it would be trivial to make such devices, but companies like AMD and NVIDIA do not care about small businesses or individual customers, only about selling to other equally huge companies. So they dimension their devices accordingly, and they also set fictitious retail prices many times greater than the actual price that will be negotiated with the big companies. While the big companies pay much less, small businesses and individuals cannot buy at anything other than list price, which means that they must give up on buying such devices, as they are not worth those prices.

It took about 25 years for the cycle from Napster -> MP3 players -> flash memory -> smartphones -> big data -> big GPUs -> LLMs and generative AI -> OpenAI buying 100% of remaining memory wafer capacity from SK Hynix and Samsung = little left for the edge, with 100% price hikes for consumer DIMMs.

https://openai.com/index/samsung-and-sk-join-stargate/

> Samsung Electronics and SK hynix plan to scale up production of advanced memory chips, targeting 900,000 DRAM wafer starts per month at an accelerated capacity rollout, critical for powering OpenAI’s advanced AI models.

We need a new "Napster moment" to restart supply chain investment and business models at the edge. Humanoid robotics might qualify, since robots will need low-latency responses to local sensor input.

Another factor in edge vs. mainframe economics is the cost of energy in each location.