by coder543
898 days ago
I just checked, and MLC Chat is running the 3-bit quantized version of Mistral-7B. It works fine on the 14 Pro Max (6GB RAM) without crashing, and it stays resident in memory on the 15 Pro Max (8GB RAM) when switching to and from another not-too-heavy app. 2-bit quantization just feels like a step too far, but I'll give it a try. As for credit, I definitely don't need any. I'm just happy to see someone working on a better LLM app!
1. StableLM Zephyr 3b Q4_K_M is now the built-in model, replacing the Q6_K variant.
2. More aggressive RAM headroom calculation, with forced fallback to CPU rather than failing to load or crashing.
3. New status indicator for Metal when a model is loaded (a filled bolt when enabled, a slashed bolt when disabled).
4. Metal will now also be enabled for devices with 4GB RAM or less, but only when the selected model can comfortably fit in RAM. Previously, only devices with at least 6GB had Metal enabled.
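The reply doesn't say how the headroom calculation in items 2 and 4 actually works, so the following is only a minimal sketch of the idea, assuming a simple rule: use Metal only if the model fits within a fraction of total RAM minus a fixed overhead, and otherwise force CPU instead of attempting a load that might fail. `Device`, `choose_backend`, `headroom_fraction`, and `fixed_overhead_bytes` are hypothetical names, and the default values are illustrative guesses, not the app's real numbers.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB


@dataclass
class Device:
    # Hypothetical device description; only total RAM matters for this sketch.
    total_ram_bytes: int


def choose_backend(device: Device, model_size_bytes: int,
                   headroom_fraction: float = 0.5,
                   fixed_overhead_bytes: int = 512 * 1024 ** 2) -> str:
    """Return "metal" if the model comfortably fits in RAM, else "cpu".

    Assumed rule (not the app's actual formula): the budget is a fraction of
    total RAM minus a fixed overhead for the app and the OS. This applies to
    every device, including ones with 4GB RAM or less, instead of a hard 6GB
    cutoff for enabling Metal.
    """
    budget = device.total_ram_bytes * headroom_fraction - fixed_overhead_bytes
    if model_size_bytes <= budget:
        return "metal"
    # Forced fallback: pick CPU up front rather than risk a failed load or crash.
    return "cpu"


if __name__ == "__main__":
    # Small model on a 4GB device: fits within the hypothetical budget.
    print(choose_backend(Device(4 * GIB), int(1.2 * GIB)))
    # Larger model on the same device: falls back to CPU.
    print(choose_backend(Device(4 * GIB), 3 * GIB))
```

Under these assumed defaults, a 4 GiB device gets a 1.5 GiB budget, so a smaller model still runs on Metal while a larger one is forced onto the CPU.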
Thank you so much again for your time!