by LoganDark
802 days ago
> It doesn't beat RTX 4090 when it comes to actual LLM inference speed

Sure, whisper.cpp is not an LLM. The 4090 can't even do inference at all on anything over 24GB, while ASi can chug through it even if slightly slower. I wonder if with https://github.com/tinygrad/open-gpu-kernel-modules (the 4090 P2P patches) it might become a lot faster to split a too-large model across multiple 4090s and still outperform ASi (at least until someone at Apple does an MLX LLM).
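
For a rough sense of the 24GB limit, here is a back-of-the-envelope sketch. It only counts model weights (parameters times bytes per parameter) and ignores KV cache, activations, and runtime overhead, so real requirements are higher. The model sizes and quantization levels below are illustrative assumptions, not figures from this thread.

```python
import math

GPU_VRAM_GB = 24  # RTX 4090 VRAM, the limit mentioned above


def weights_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes). Weights only."""
    return params_billion * bits_per_param / 8


def gpus_needed(params_billion: float, bits_per_param: int,
                vram_gb: float = GPU_VRAM_GB) -> int:
    """Minimum number of cards whose combined VRAM holds the weights."""
    return math.ceil(weights_gb(params_billion, bits_per_param) / vram_gb)


# Hypothetical model sizes (billions of parameters) and precisions (bits).
for params, bits in [(7, 16), (70, 4), (70, 16)]:
    size = weights_gb(params, bits)
    fits = "fits" if size <= GPU_VRAM_GB else "does not fit"
    print(f"{params}B @ {bits}-bit: ~{size:.0f} GB, {fits} on one card, "
          f"needs at least {gpus_needed(params, bits)} card(s)")
```

Anything that does not fit on one card has to be split across several, which is where P2P transfer speed between cards would start to matter.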
Benchmark data show that the Apple M2 Ultra is 47% slower than a Xeon W9 in the Geekbench 5 multi-threaded test and 60% slower than an RTX 4090 in the OpenCL Compute test. Against an i9-13900K and an RTX 4060 Ti, it is 0.35% and 2% slower in those same tests, respectively.
Apple Silicon Macs are NOT faster than competing desktop computers, nor was the M1 massively faster than the NVIDIA 3070, for that matter (the desktop 3070 is 2x faster than the laptop variant the M1 was compared against). They just offer up to 128GB of shared RAM/VRAM in slim desktops and laptops, which is handy for LLMs. That's it.
Please stop taking Apple marketing materials at full face value or above. Thank you.