Hacker News new | ask | show | jobs
by brittlewis12 898 days ago
FYI, just submitted a new update for review with a few small but hopefully noticeable changes, thanks in no small part to your feedback:

1. StableLM Zephyr 3b Q4_K_M is now the built-in model, replacing the Q6_K variant.

2. More aggressive RAM headroom calculation, with forced fallback to CPU rather than failing to load or crashing.

3. New status indicator for Metal when model is loaded (filled bolt for enabled, vs slashed bolt for disabled.)

4. Metal will now also be enabled for devices with 4GB RAM or less, but only when the selected model can comfortably fit in RAM. Previously, only devices with at least 6GB had Metal enabled.

Thank you so much again for your time!

1 comments

The fallback does seem to work! Although the 4-bit 7B models only run at 1 token every several seconds.

I still wish Phi-2, Dolphin Phi-2, and TinyLlama-Chat-v1.0 were available, but I understand you have plans to make it easier to download any model in the future.