Hacker News
Show HN: Xybrid – run LLM and speech locally in your app (no back end, Rust) (github.com)
6 points by theGlenn 92 days ago
Hi HN,

We built Xybrid, a Rust library for running LLM + speech pipelines directly inside your app: no server, no daemon, just one binary.

We started building it while working on a privacy-focused LLM app with Tauri and realized there wasn’t a straightforward way to embed models directly into shipped applications without relying on a separate server process.

Xybrid links into your process like any other library. It supports GGUF / ONNX / CoreML and integrates with Flutter, Swift, Kotlin, Unity, and Tauri, letting you run pipelines like speech → LLM → speech in a single call.
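To make the "single call" idea concrete, here is a rough sketch of how a speech → LLM → speech pipeline can be composed in-process. This is not Xybrid's actual API: every name below (`SpeechToText`, `LanguageModel`, `TextToSpeech`, `VoicePipeline`, and the fake stages) is hypothetical and exists only for illustration. The stand-in stages do no real inference, so the sketch runs without any model files.

```rust
// Hypothetical sketch: none of these names come from Xybrid.
// Each stage is a trait, so a real backend could sit behind it.

/// Speech-to-text stage: raw audio samples in, transcript out.
trait SpeechToText {
    fn transcribe(&self, audio: &[f32]) -> String;
}

/// Language-model stage: prompt in, reply out.
trait LanguageModel {
    fn generate(&self, prompt: &str) -> String;
}

/// Text-to-speech stage: text in, audio samples out.
trait TextToSpeech {
    fn synthesize(&self, text: &str) -> Vec<f32>;
}

/// Owns all three stages so the caller makes one in-process call.
struct VoicePipeline<S, L, T> {
    stt: S,
    llm: L,
    tts: T,
}

impl<S: SpeechToText, L: LanguageModel, T: TextToSpeech> VoicePipeline<S, L, T> {
    /// speech -> LLM -> speech, with no network hop between stages.
    fn run(&self, audio_in: &[f32]) -> Vec<f32> {
        let transcript = self.stt.transcribe(audio_in);
        let reply = self.llm.generate(&transcript);
        self.tts.synthesize(&reply)
    }
}

// Stand-in stages (assumptions for the demo, no real models involved).
struct FakeStt;
impl SpeechToText for FakeStt {
    fn transcribe(&self, audio: &[f32]) -> String {
        format!("{} samples heard", audio.len())
    }
}

struct EchoLlm;
impl LanguageModel for EchoLlm {
    fn generate(&self, prompt: &str) -> String {
        format!("You said: {}", prompt)
    }
}

struct FakeTts;
impl TextToSpeech for FakeTts {
    // Emits one silent sample per character of text.
    fn synthesize(&self, text: &str) -> Vec<f32> {
        vec![0.0; text.chars().count()]
    }
}
```

With these pieces, `VoicePipeline { stt: FakeStt, llm: EchoLlm, tts: FakeTts }.run(&samples)` goes from input audio to output audio in one function call inside the host process. Swapping a fake stage for a real model would not change the call site.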

On recent phones, we’re seeing ~20 tok/s on Android and ~40 tok/s on iOS for small (~3B) quantized models (varies by device, backend, and thermals).
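For a feel of what those rates mean in practice, a back-of-the-envelope calculation helps. The sketch below assumes a steady decode rate and ignores prompt processing and model load time. The function name `generation_secs` and the 60-token reply length are illustrative assumptions, not measurements.

```rust
/// Seconds to generate `tokens` at a steady decode rate.
/// Assumption: ignores prompt processing, model load, and thermal throttling.
fn generation_secs(tokens: u32, tok_per_sec: f64) -> f64 {
    f64::from(tokens) / tok_per_sec
}
```

Under those assumptions, a 60-token reply takes `generation_secs(60, 20.0)` = 3.0 s at ~20 tok/s and `generation_secs(60, 40.0)` = 1.5 s at ~40 tok/s.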

The demo that shows it best: a Unity tavern scene where 6 NPCs generate real-time dialogue fully on-device — no API key, no internet, no per-request cost.

Unity demo: https://youtu.be/vSPeTyeow6A
Desktop demo (Tauri): https://youtu.be/o83YShqV7O4

GitHub: https://github.com/xybrid-ai/xybrid

It’s still early — there are rough edges, especially around model support and performance tuning. Happy to answer questions about the architecture, backends, or integrations (Flutter, Swift, Kotlin, Unity, Tauri).

1 comment

Amazing!
Thanks nanark, let us know if you give it a try!