by buildbot 1492 days ago
This is very interesting, since the M1 Mac Studio supports up to 128GB of unified memory: training a large, memory-heavy model slowly on a single device could be worthwhile, as could running inference on a very large model.
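
For a rough sense of scale, here is a back-of-the-envelope sketch in Python of how many parameters might fit in 128GB. The per-parameter byte counts are assumptions, not figures from this thread: 2 bytes per parameter for fp16 inference weights, and roughly 16 bytes per parameter for mixed-precision Adam training (fp16 weights and gradients plus fp32 master weights and two optimizer moments). Activations, the OS, and other apps sharing the unified memory are ignored, so real limits would be lower.

```python
# Back-of-the-envelope estimate: parameters that fit in a memory budget.
# Assumed byte counts (not from the thread):
#   - inference, fp16 weights only:      2 bytes/param
#   - mixed-precision Adam training:    16 bytes/param
#     (2 fp16 weights + 2 fp16 grads + 4 fp32 master weights
#      + 4 fp32 momentum + 4 fp32 variance)
# Activations and OS overhead are ignored.

GB = 10**9  # decimal gigabytes


def max_params(memory_bytes: int, bytes_per_param: float) -> float:
    """Upper bound on parameter count for a given memory budget."""
    return memory_bytes / bytes_per_param


unified_memory = 128 * GB

inference_fp16 = max_params(unified_memory, 2)   # weights only
training_adam = max_params(unified_memory, 16)   # weights + grads + optimizer

print(f"fp16 inference: ~{inference_fp16 / 1e9:.0f}B parameters")
print(f"Adam training:  ~{training_adam / 1e9:.0f}B parameters")
```

Under these assumptions the ceiling is on the order of tens of billions of parameters for inference and single-digit billions for training, which is why "slowly on a single device" is the interesting part.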

Everything old is new again - the M1 Mac Studio's unified memory echoes the SGI O2, which had similar unified CPU/GPU memory back in the '90s.

In both cases the unified memory machines outperformed much larger machines in specific use cases.

...specific use cases being the key phrase here. Unified memory is cool, but there are reasons we don't use it at scale:

- It needs extremely high-bandwidth controllers, which severely limits the amount of memory you can use (Intel Macs with server chips could be configured with an order of magnitude more RAM)

- ECC is apparently still off the table on M1

- Most workloads aren't really constrained by memory access in modern programs/kernels/compilers. Problems only show up when you want to run a GPU off the same memory, which is what these new Macs account for.

- Most of the so-called "specific workloads" that you're outlining aren't very general applications. So far I've only seen ARM outrun x86 in some low-precision physics demos, which is... fine, I guess? I still don't foresee meteorologists dropping their Intel rigs to buy a Mac Studio anytime soon.

> Most workloads aren't really constrained by memory access in modern programs/kernels/compilers. Problems only show up when you want to run a GPU off the same memory, which is what these new Macs account for.

For sure, but I expect this is different for the apps Apple _wants_ to write. It’s easy to imagine the next version of Logic or whatever doing fine-tuning everywhere.

What is there to fine-tune in a program like Logic? I've often heard that term associated with using extended instruction sets and leveraging accelerators, but where would the M1 have "untapped power", so to speak? I don't think the "upgrade" from a CISC architecture to a RISC one can yield much opportunity for optimization, at least not beyond what the compiler already does for you.

> - It needs extremely high-bandwidth controllers, which severely limits the amount of memory you can use (Intel Macs with server chips could be configured with an order of magnitude more RAM)

In the first half of 2023, NVIDIA Grace Superchip will ship with a 1TB memory config (930GB usable because of ECC bits) on a 1024-bit wide LPDDR5X-8533 interface (same width as M1 Ultra, which uses LPDDR5-6400).

So it's going to become much less of an issue really soon.
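
To put numbers on that comparison, here is a quick Python sketch of theoretical peak bandwidth, computed as bus width in bytes times transfer rate. The bus widths and memory speeds are the ones quoted above; the helper name is made up for illustration, and real-world throughput will be lower than these peaks.

```python
# Theoretical peak memory bandwidth = (bus width in bytes) * (transfers/second).
# Inputs are the figures quoted in the comment above; peak_bandwidth_gb_s is
# a hypothetical helper, and sustained bandwidth will be lower in practice.


def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Peak bandwidth in decimal GB/s for a given bus width and MT/s rating."""
    bytes_per_transfer = bus_width_bits / 8
    megabytes_per_second = bytes_per_transfer * transfer_rate_mt_s
    return megabytes_per_second / 1000  # MB/s -> GB/s


m1_ultra = peak_bandwidth_gb_s(1024, 6400)  # LPDDR5-6400
grace = peak_bandwidth_gb_s(1024, 8533)     # LPDDR5X-8533

print(f"M1 Ultra: {m1_ultra:.1f} GB/s")
print(f"Grace:    {grace:.1f} GB/s")
print(f"Ratio:    {grace / m1_ultra:.2f}x")
```

With the same 1024-bit width, the gap comes entirely from the faster memory: roughly 819 GB/s versus roughly 1092 GB/s, or about a third more bandwidth.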

> So it's going to become much less of an issue really soon.

The main issue would be trying to purchase one of those, which are likely going to be both very rare and orders of magnitude more expensive than a Mac Studio.

The Mac Studio isn't some crazy exotic hardware like datacenter class GPUs, but definitely has some exotic capabilities.

> The Mac Studio isn't some crazy exotic hardware like datacenter class GPUs, but definitely has some exotic capabilities.

Datacenter class GPUs are expensive, yeah, but they're quite easy to buy, even as a single unit.

Example: https://www.dell.com/en-us/work/shop/nvidia-ampere-a100-pcie... is just the first random link, but there are other stores selling them for significantly less.

I wonder what their CPU pricing will be though... we'll see I guess.