Hacker News new | ask | show | jobs
by codepixel 843 days ago
oooh thanks for the tip, will try this
1 comments

Stable LM 3B Zephyr, it's the only model below 7B that can handle RAG: i.e. understand "hey those are documents, use them to answer these questions"

It'll work too, it was quite delightful to open Test Flight, install my Flutter app not designed for Vision Pro at all, and everything "just worked".

https://stability.ai/news/stablelm-zephyr-3b-stability-llm works absolutely fine on the M2 processor, like 40 tok/s https://x.com/EMostaque/status/1732912442282312099?s=20

Stable LM 2 1.6b runs even faster but not as good at RAG, great multilingual though, we are seeing it matching 70b models on other languages (new version soon) https://x.com/EMostaque/status/1763269238347673796?s=20

Can fit a lot in a gigabyte file it seems.

Is this Flutter app something you created? If so, is it open source? I’m in that same space and I generally just like to learn from other people’s work.

If not, all good. I don’t have a Vision Pro myself but I got a similar app which runs on all platforms including iPadOS, thus I guess my app should work on that too. Thanks for the reminder!

Thanks for asking: Yes I did make it, but, no app tying it all together. At least, it isn't out yet.

The grunt work of getting it running on different platforms + nice easy OpenAI compatible interfaces x RAG x voice assistant is open source:

- FLLAMA: https://github.com/Telosnex/fllama llama.cpp at core, openai compatible API, function call support, multimodal model support, Metal support. All platforms incl. web, but WASM is slow, def. not worth it except as a proof of concept.

- FONNX: https://github.com/Telosnex/fonnx ONNX runtime at core, all platforms including web. Whisper, Silero VAD, Magika, and two embeddings models. (Mini LM L6 V3 is best for RAG)

EDIT: I knew I recognized your username! Aub.ai! Cheers, what you did with aub.ai convinced me it was possible to do llama.cpp in flutter with a high bar for engineering quality. Other stuff seemed a tad rushed, unstable, and not complete. Also congrats, just saw your recent update, been hoping something good came through and it did.