


	by focusgroup0 97 days ago
	The fact that Apple hasn't shipped this in all the years since the Siri acquisition is an indictment of its product leadership.

2 comments

sanchitmonga22 96 days ago

Apple has the silicon, the frameworks (MLX, CoreML), and the models. The gap is putting it all together into a fast, unified on-device pipeline. That's what we're focused on, and honestly, we think Apple will eventually ship something similar natively. Until then, we're trying to show what's possible today on their hardware.


liuliu 97 days ago

This is no different from mlx-lm, other than that it uses a closed-source inference engine.


sanchitmonga22 96 days ago

Respectfully, the benchmarks show it is different.

MetalRT and mlx-lm use the exact same model files, identical 4-bit MLX weights. That makes it a pure engine-to-engine comparison:

LLM decode: MetalRT is 1.10-1.19x faster across all models tested

STT: 70s audio in 101ms vs 463ms (4.6x faster)

TTS: 178ms vs 493ms (2.8x faster)
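
For anyone checking the arithmetic, the ratios above can be recomputed from the quoted latencies. This is only a sketch: it assumes the first number in each pair is MetalRT and the second is mlx-lm, and the function names are made up for illustration (they are not part of rcli or MetalRT).

```python
def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def real_time_factor(audio_s: float, latency_ms: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return (latency_ms / 1000.0) / audio_s


# Latencies quoted in the comment; which engine is which is an assumption.
stt_speedup = speedup(baseline_ms=463, candidate_ms=101)
tts_speedup = speedup(baseline_ms=493, candidate_ms=178)
stt_rtf = real_time_factor(audio_s=70, latency_ms=101)

print(f"STT speedup: {stt_speedup:.1f}x")
print(f"TTS speedup: {tts_speedup:.1f}x")
print(f"STT real-time factor: {stt_rtf:.4f}")
```

The rounded ratios match the 4.6x and 2.8x figures quoted above.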

mlx-lm runs on MLX, a general-purpose array computation framework that also supports inference. MetalRT is purpose-built for inference only. That focus is where the performance gap comes from.

You can reproduce these numbers yourself: rcli bench runs the same benchmarks we published. Full methodology: https://www.runanywhere.ai/blog/metalrt-fastest-llm-decode-e...

Yes, MetalRT is closed-source. We're transparent about that. The performance difference is the reason it exists.
