by MBCook 5 days ago
Is that required or just a choice Apple made?

What do you mean by required? Apple's prices are notoriously disconnected from the cost of manufacturing.
I mean is it possible to make unified memory systems with good performance or is it not really feasible due to memory timing/trace length issues?

It's possible if you're willing to go with RAM that is much slower than what GPUs like but that CPUs often use. That's what integrated graphics laptops have done for a long time, right?

But can you get high end CPU and GPU performance with unified memory and maintain user upgradable memory in a reasonable way? That's what I don't know.

> I mean is it possible to make unified memory systems with good performance or is it not really feasible due to memory timing/trace length issues?

LPCAMM and similar solutions exist, but have never been demonstrated running at speeds that match what the leading soldered memory systems are using; there's always been some speed penalty. I'm not sure we've ever seen a system demonstrated using LPCAMM or similar for a 512-bit bus to match Apple's Max tier SoCs, so it's somewhat of an open question whether those solutions can offer upgradability at the high end of the market for unified memory systems.

> LPCAMM and similar solutions exist, but have never been demonstrated running at speeds that match what the leading soldered memory systems are using; there's always been some speed penalty.

LPCAMM2 supports up to 9600MT/s, which appears to be the same speed Apple is using.

> I'm not sure we've ever seen a system demonstrated using LPCAMM or similar for a 512-bit bus

Servers commonly use a 768-bit DDR5 memory bus per socket even without LPCAMM and LPCAMM allows shorter traces than traditional DIMMs. It's basically down to most existing DDR5 system boards/sockets having been designed before anyone was trying to run LLMs on consumer hardware, e.g. AM5 has a 128-bit memory bus and you're not changing that without a new socket. But every memory generation gets a new socket anyway, and the existing Threadripper Pro socket has a 512-bit memory bus as well.

Moreover, making the bus wider is "easy" -- the main problem with it is that it adds cost. Apple's least expensive machines use the same 128-bit memory bus as most PCs and the ones with the 512-bit bus cost as much as Threadripper if not more.
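
To put rough numbers on the bus widths mentioned above, here is a minimal sketch of the usual peak-bandwidth arithmetic (bytes per transfer times transfers per second). The function and table names are made up for illustration, and holding every bus at 9600MT/s is an assumption made only to isolate the effect of width; it is not a claim that any of these platforms ships at that speed.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s for a memory bus."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s * 1e6 / 1e9


# Bus widths taken from the comment above; the labels are just dictionary keys.
BUS_WIDTHS_BITS = {"128-bit": 128, "512-bit": 512, "768-bit": 768}


def bandwidth_table(transfer_rate_mt_s: float) -> dict:
    """Peak bandwidth for each bus width at one common (assumed) transfer rate."""
    return {
        label: peak_bandwidth_gb_s(bits, transfer_rate_mt_s)
        for label, bits in BUS_WIDTHS_BITS.items()
    }


if __name__ == "__main__":
    # 9600 MT/s is an assumed common rate, used only for comparison.
    for label, gb_s in bandwidth_table(9600).items():
        print(f"{label}: {gb_s:.1f} GB/s")
    # 128-bit: 153.6 GB/s
    # 512-bit: 614.4 GB/s
    # 768-bit: 921.6 GB/s
```

Peak bandwidth scales linearly with width, which is why the width itself is mostly a cost question rather than a technical one.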

> LPCAMM2 supports up to 9600MT/s, which appears to be the same speed Apple is using.

The difference here is in what the standard defines on paper vs what is actually shipping in products and readily available off the shelf. Who's selling a whole system with LPCAMM2 certified for 9600MT/s? Intel's current-gen Panther Lake top of the line laptop chips are rated for 9600MT/s when using soldered LPDDR5x but only 7467MT/s when using LPCAMM2, according to their current datasheet: https://www.intel.com/content/www/us/en/content-details/8721...

That puts the speed Intel currently supports with LPCAMM2 1.5 years and counting behind Apple's shipping memory speeds. Intel's own shipping memory speed moved past 7467MT/s a few months earlier than even Apple's.
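
For a sense of scale, a rough sketch of what the 7467MT/s vs 9600MT/s gap means in peak bandwidth. It assumes a 128-bit bus (one LPCAMM2 module's worth) for both configurations; the helper names are hypothetical, and real sustained bandwidth will be lower than these theoretical peaks.

```python
def bandwidth_gb_s(bus_width_bits: int, rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s."""
    return bus_width_bits / 8 * rate_mt_s / 1000


def shortfall_percent(slower_mt_s: float, faster_mt_s: float) -> float:
    """How far the slower rate falls short of the faster one, in percent."""
    return (1 - slower_mt_s / faster_mt_s) * 100


if __name__ == "__main__":
    BUS_BITS = 128        # assumption: same 128-bit bus for both cases
    SOLDERED = 9600       # MT/s, soldered LPDDR5x rating cited above
    LPCAMM2 = 7467        # MT/s, LPCAMM2 rating cited above

    print(f"soldered: {bandwidth_gb_s(BUS_BITS, SOLDERED):.1f} GB/s")   # 153.6 GB/s
    print(f"LPCAMM2:  {bandwidth_gb_s(BUS_BITS, LPCAMM2):.1f} GB/s")    # 119.5 GB/s
    print(f"shortfall: {shortfall_percent(LPCAMM2, SOLDERED):.1f}%")    # 22.2%
```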

> Servers commonly use a 768-bit DDR5 memory bus per socket even without LPCAMM and LPCAMM allows shorter traces than traditional DIMMs.

> Moreover, making the bus wider is "easy"

Citations needed. Servers aren't anywhere close to 9600MT/s yet; Intel and AMD are at 6400MT/s. The trace length advantages offered by LPCAMM2 don't necessarily mean the traces for the sixth or eighth channel would be short enough for 9600MT/s (which again, is not yet available even in a 128-bit configuration in shipping hardware). Adding more channels to even an LPCAMM2 configuration means adding more trace length, because only two modules can actually be adjacent to the CPU socket. (Maybe you could get to 512-bit with modules on the front and back of the board while maintaining trace lengths short enough to reach meaningfully higher speeds than regular DDR5, but so far nobody is doing that or even talking about it.)

> Who's selling a whole system with LPCAMM2 certified for 9600MT/s?

The 9600MT/s modules are new and will probably be found at some point this year. Framework already sells LPCAMM2 at 8533MT/s with full validation:

https://knowledgebase.frame.work/what-drammemory-is-supporte...

> That puts the current Intel-with-LPCAMM2 supported memory speed at 1.5 years and counting lag behind Apple's shipping memory speeds.

It turns out Apple isn't getting 9600MT/s either. I assumed that soldering would be getting them at least what LPCAMM2 is rated for, but if you actually do the math, they're getting ~8500MT/s for their most expensive systems and ~7500MT/s for the others.
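
"Doing the math" here presumably means dividing an advertised bandwidth figure by the bus width. A minimal sketch of that back-calculation follows; the bandwidth inputs (546 GB/s on a 512-bit bus, 120 GB/s on a 128-bit bus) are illustrative assumptions rather than figures quoted in this thread, so substitute the vendor's published numbers before drawing conclusions.

```python
def implied_transfer_rate_mt_s(bandwidth_gb_s: float, bus_width_bits: int) -> float:
    """Transfer rate (MT/s) implied by a peak bandwidth and a bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_transfer / 1e6


if __name__ == "__main__":
    # Assumed example inputs: (advertised GB/s, bus width in bits).
    examples = {
        "512-bit system": (546.0, 512),
        "128-bit system": (120.0, 128),
    }
    for label, (gb_s, bits) in examples.items():
        print(f"{label}: {implied_transfer_rate_mt_s(gb_s, bits):.0f} MT/s")
    # 512-bit system: 8531 MT/s
    # 128-bit system: 7500 MT/s
```

With those assumed inputs the results land on the ~8500MT/s and ~7500MT/s figures given above.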

> Servers aren't anywhere close to 9600MT/s yet; Intel and AMD are at 6400MT/s.

Servers use conservative timings. EXPO memory kits above 6400MT/s are available for Threadripper with 8 channels. And again, these are using traditional DIMMs with longer traces rather than CAMM, but they're still managing an extremely wide bus with close to the same performance.

> The trace length advantages offered by LPCAMM2 don't necessarily mean the traces for the sixth or eighth channel would be short enough for 9600MT/s

CAMM modules use a compression fitting to attach the chips to the system board using approximately the same amount of space as the solder pads would for soldered chips. If you get to the point of having so many channels that the chips are in the way of the other chips then the soldered ones have the same problem.

> (which again, is not yet available even in a 128-bit configuration in shipping hardware).

A single LPCAMM2 module is a 128-bit bus. Every system that uses it has at least that.

> Maybe you could get to 512-bit with modules on the front and back of the board while maintaining trace lengths short enough to reach meaningfully higher speeds than regular DDR5, but so far nobody is doing that or even talking about it.

Nobody is really using a bus that wide with soldered memory either though, outside of the couple of Macs that start at ~$3500 and are getting the same speed Framework does with LPCAMM2.

Multiplexed DDR (MRDIMM) can go faster.

But for throughput, servers with 12 channels have pretty high theoretical bandwidth even with slower memory.
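
A quick sketch of the channel-count arithmetic, assuming 64-bit DDR5 channels (so 12 channels is the 768-bit bus mentioned earlier) and comparing against an assumed 512-bit bus at 8533MT/s. The names are hypothetical and the numbers are theoretical peaks only.

```python
CHANNEL_BITS = 64  # assumption: one 64-bit DDR5 channel per DIMM


def channels_bandwidth_gb_s(channels: int, rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in decimal GB/s for N 64-bit channels."""
    return channels * CHANNEL_BITS / 8 * rate_mt_s / 1000


def rate_needed_mt_s(target_gb_s: float, channels: int) -> float:
    """Transfer rate N channels need in order to reach a target bandwidth."""
    return target_gb_s * 1000 / (channels * CHANNEL_BITS / 8)


if __name__ == "__main__":
    server = channels_bandwidth_gb_s(12, 6400)   # 12 channels at 6400 MT/s
    wide_lp = channels_bandwidth_gb_s(8, 8533)   # 512 bits at 8533 MT/s (assumed)
    print(f"12 channels @ 6400: {server:.1f} GB/s")    # 614.4 GB/s
    print(f"512-bit @ 8533:     {wide_lp:.1f} GB/s")   # 546.1 GB/s
    print(f"12 channels need {rate_needed_mt_s(wide_lp, 12):.0f} MT/s to match")  # 5689 MT/s
```

On paper a wide bus of slower memory can match or beat a narrower bus of faster memory, though latency and cost are a separate matter.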

> LPCAMM and similar solutions exist, but have never been demonstrated running at speeds that match what the leading soldered memory systems are using;

Does it need to be leading, though? Being median is just fine for what high-RAM systems are intended to be used for.

You mean Apple prices are notoriously overpriced, overhyped, underpowered, and

"Abdul Jabar, couldn't have made these prices, with a sky hook."

Both. Soldered RAM is faster. Also, Apple doesn't want to offer upgradability after purchase.
Don't I/you wish. The mechanical junction adds no delay, only manufacturing expense, and the delay of purchasing new systems to keep up with OS bloat.

Actually the opposite is true. Socketed RAM can be overclocked and have its timings adjusted, while soldered RAM cannot. Two Lenovos: one with soldered RAM (X1 Carbon), and one T590 with one slot holding a Crucial 16GB 260-pin SODIMM, DDR4 PC4-19200. Exact same processor. The X1 has soldered DDR3 at 532.0 MHz (PC3-1066). The T590 has DDR4 PC4-19200 at 1200 MHz.

Both have a Core i7 8665U... and the T590, with socketed RAM, is much faster.

I think you'll find that in the current day, high speed LP(?)DDR5 requires a better signal path than what the SODIMM can provide. Which is why laptop makers initially moved to soldered RAM before moving to CAMM (probably only for the high end ones).