'Unified memory' access on M1 is allegedly almost as fast as CPU cache, and I believe the SSDs are extremely close to the SoC as well. Swap might be faster than some computers' actual RAM.
This is the exact kind of PR bamboozlement I was alluding to - literally none of the above is correct:
M1 memory latency is about 100 ns, which wasn't really competitive with AMD/Intel at the time (70-something ns). Any SSD read will be an order of magnitude slower than reading from RAM: something to the tune of 5 GB/s versus 70 GB/s for the SoC RAM. For comparison, the Intel trashcan Mac Pro clocked in at 60 GB/s a decade prior.
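To put the bandwidth gap in concrete terms, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above (5, 70 and 60 GB/s), which are ballpark numbers rather than anything measured here; the 8 GB working-set size and the helper name `seconds_to_read` are made up for illustration, and the calculation ignores latency entirely.

```python
# Ballpark sequential bandwidths as quoted in the comment above (GB/s).
# These are assumptions taken from the thread, not benchmark results.
SSD_GB_PER_S = 5.0
M1_RAM_GB_PER_S = 70.0
TRASHCAN_MAC_PRO_GB_PER_S = 60.0


def seconds_to_read(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Idealized time to stream size_gb at a given bandwidth (no latency)."""
    return size_gb / bandwidth_gb_per_s


# Hypothetical working set: 8 GB that either sits in RAM or has to come from swap.
WORKING_SET_GB = 8.0

ram_vs_ssd = M1_RAM_GB_PER_S / SSD_GB_PER_S
from_ssd = seconds_to_read(WORKING_SET_GB, SSD_GB_PER_S)
from_ram = seconds_to_read(WORKING_SET_GB, M1_RAM_GB_PER_S)

print(f"RAM is ~{ram_vs_ssd:.0f}x the SSD bandwidth")
print(f"8 GB from SSD: {from_ssd:.2f} s, from RAM: {from_ram:.2f} s")
```

Even in this best case for the SSD (pure sequential streaming), swap comes out roughly 14x slower than RAM, so 'swap might be faster than some computers' actual RAM' doesn't hold up against a machine that did 60 GB/s a decade earlier.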
See, this is the thing about M1: I don’t care about your numbers (which I have no doubt are true), I care about my personal experience with the magic cold $999 aluminum slab that runs circles around everything I owned before.
Reading these always feels like winning an F1 race, then being gaslit about how that’s not possible because of inferior cylinder design.
> This is the kind of PR bamboozlement I was alluding to - literally none of the above is correct
I'm basing my knowledge on discussions with other developers on rwkv.cpp because we were talking about how performance scales with the number of tokens per iteration. Memory speed/bandwidth came up and some things about M1 were said. Sorry about that.
That is not the problem. You can find comments here where people literally said that 'it was running better than x86 Macs with 16+ GB'.
ADD: and 'nobody needs more than 8GB, because it works so fine'. Just to clarify.