by woadwarrior01
790 days ago
Is this news? I've had an app out for nearly a year that supports over two dozen local LLMs and works with Siri and Shortcuts. I added support for Llama 3 8B the day after it came out, along with Eric Hartford's new Llama 3 8B based Dolphin model. All models in it are quantized with OmniQuant. On iOS, the 7B and 8B models are 3-bit quantized and smaller models are 4-bit quantized. On macOS, all models are 4-bit OmniQuant quantized. 3-bit OmniQuant quantization is quite comparable in perplexity to the 4-bit RTN quantization that all the llama.cpp based apps use. https://privatellm.app/ https://apps.apple.com/app/private-llm-local-ai-chatbot/id64...
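For anyone unfamiliar with the terms, here is a rough Python sketch of what plain round-to-nearest (RTN) quantization of one weight group looks like at 3 and 4 bits, plus the back-of-envelope weight size for an 8B model. It is only an illustration: the function names, the group size of 128, and the synthetic Gaussian weights are assumptions made for the example, and it is not OmniQuant, nor the code used by the app or by llama.cpp.

```python
import random


def rtn_quantize(weights, bits):
    """Asymmetric round-to-nearest quantization of one group of weights.

    Returns the integer codes, the dequantized values, and the step size.
    The min/max range is taken directly from the data (no learned clipping).
    """
    levels = (1 << bits) - 1              # 7 for 3-bit, 15 for 4-bit
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    dequant = [lo + c * scale for c in codes]
    return codes, dequant, scale


def mean_sq_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def weight_bytes(n_params, bits):
    """Approximate weight storage, ignoring per-group scale/offset overhead."""
    return n_params * bits / 8


random.seed(0)
# Hypothetical group of 128 weights; real layers are not exactly Gaussian.
group = [random.gauss(0.0, 0.02) for _ in range(128)]

for bits in (3, 4):
    codes, dequant, scale = rtn_quantize(group, bits)
    size_gb = weight_bytes(8e9, bits) / 1e9
    print(f"{bits}-bit RTN: mse={mean_sq_error(group, dequant):.3e}, "
          f"8B weights ~ {size_gb:.1f} GB")
```

With naive RTN the 3-bit error comes out noticeably larger than the 4-bit error, which is the gap a calibrated method has to close for 3-bit to be competitive. The size estimate leaves out the per-group metadata, so real files are somewhat larger.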