| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by terhechte 188 days ago
	Is there a way to run these Omni models on a Macbook quantized via GGUF or MLX? I know I can run it in LMStudio or Llama.cpp but they don't have streaming microphone support or streaming webcam support. Qwen usually provides example code in Python that requires Cuda and a non-quantized model. I wonder if there is by now a good open source project to support this use case?

2 comments

You can probably follow the vLLM instructions for omni here, then use the included voice demo html to interface with it:

Whisper and Qwen Omni models have completely different architectures as far as I know