Apple has the silicon, the frameworks (MLX, CoreML), and the models. The gap is putting it all together into a fast, unified on-device pipeline. That's what we're focused on, and honestly, we think Apple will eventually ship something similar natively. Until then, we're trying to show what's possible today on their hardware.
Respectfully, the benchmarks show it is different.
MetalRT and mlx-lm use the exact same model files, identical 4-bit MLX weights. That makes it a pure engine-to-engine comparison:
LLM decode: MetalRT is 1.10-1.19x faster across all models tested
STT: 70s audio in 101ms vs 463ms (4.6x faster)
TTS: 178ms vs 493ms (2.8x faster)
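For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the ratios from the latencies quoted above. The `speedup` helper and variable names are just for illustration, and I'm assuming the larger number in each pair is the baseline latency and the smaller one is MetalRT's:

```python
# Recompute the speedup ratios from the latencies quoted above.
# Assumption: the larger figure in each pair is the baseline,
# the smaller one is MetalRT.

def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms

stt = speedup(463, 101)   # STT: 70s of audio
tts = speedup(493, 178)   # TTS

# Real-time factor: milliseconds of audio processed per millisecond of compute.
stt_rtf = 70_000 / 101

print(f"STT speedup: {stt:.1f}x")
print(f"TTS speedup: {tts:.1f}x")
print(f"STT real-time factor: {stt_rtf:.0f}x")
```

The real-time factor line is derived only from the "70s audio in 101ms" figure; it is not a separately measured result.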
mlx-lm is built on MLX, a general-purpose array computation framework that also supports inference. MetalRT is purpose-built for inference only. That focus is where the performance gap comes from.