|
|
|
|
|
by woadwarrior01
105 days ago
|
|
> Apple M3 or later required. MetalRT uses Metal 3.1 GPU features available on M3, M3 Pro, M3 Max, M4, and later chips. M1/M2 support is coming soon. On M1/M2, RCLI automatically falls back to the open-source llama.cpp engine. So, no support for M5 Neural Accelerators, eh? (Requires Metal 4) ¯\_(ツ)_/¯ |
|
MetalRT currently targets Metal 3.1 GPU compute because that's where we get the most control over the decode pipeline. Neural Engine / ANE is powerful for fixed-shape inference (vision, classification) but autoregressive LLM decode, where you're generating one token at a time with dynamic KV cache, doesn't map as cleanly to ANE today.
That said, if Metal 4 opens up new capabilities that help with sequential token generation or gives better programmable access to the neural accelerator, we'll absolutely look at it. The M5 will be a fun chip to benchmark on.