by pcf 664 days ago
In some brief testing, I found that the same models (Llama 3 8B and one more I can't remember) run MUCH slower in LM Studio than in Ollama on my MacBook Air M1 (2020).

Has anyone found the same thing, or was that a fluke and I should try LM Studio again?

5 comments

Just chiming in with others to help out:

By default LM Studio doesn't fully use your GPU. I have no idea why. Under the settings pane on the right, turn the slider under "GPU Offload" all the way to 100%.

That froze the whole computer. I couldn't even click with either the internal or the external trackpad.

The model is Dolphin 2.9.1 Llama 3 8B Q4_0.

I set it to 100% and wrote this: "hi, which model are you?"

The reply was a slow output of these characters, a mouse cursor that barely moved, and I couldn't click on the trackpads: "G06-5(D&?=4>,.))G?7E-5)GAG+2;BEB,%F=#+="6;?";/H/01#2%4F1"!F#E<6C9+#"5E-<!CGE;>;E(74F=')FE2=HC7#B87!#/C?!?,?-%-09."92G+!>E';'GAF?08<F5<:&%<831578',%9>.='"0&=6225A?.8,#8<H?.'%?)-<0&+,+D+<?0>3/;HG%-=D,+G4.C8#FE<%=4))22'*"EG-0&68</"G%(2("

Help?

Maybe so the web browser etc. still has some GPU without swapping from main memory? What % does it default to?
Two replies to parent immediately suggest tuning. Ironically, this release claims to feature auto-config for best performance:

“Some of us are well versed in the nitty gritty of LLM load and inference parameters. But many of us, understandably, can't be bothered. LM Studio 0.3.0 auto-configures everything based on the hardware you are running it on.”

So parent should expect it to work.

I see the same issue: on an MBP with 96GB (M2 Max with 38‑core GPU), it seems to tune by default for a base machine.

Make sure you turn on the use of the GPU using the slider. By default it does not leverage the full speed.
Yeah, me. Even without other applications running in the background and without any models loaded, the new 0.3 UI is stuttering and running like a couch-locked crusty after too many edibles on my MacBook Air 2021, 16GB. When I finally get even a 4B model loaded, inference is glacially slow. The previous versions worked just fine (they're still available for download).
Don’t forget to tune your num_batch
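
For anyone on the Ollama side of this comparison, here is a rough sketch of passing num_batch per request through Ollama's local REST API (the "options" field of /api/generate). The model tag "llama3:8b", the value 256, and the helper names build_request and generate are just illustrative assumptions, not recommendations. The right num_batch depends on your hardware, and this does not apply to LM Studio's own settings pane.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, num_batch=256, num_gpu=None):
    """Build the JSON body for /api/generate with per-request options.

    num_batch and num_gpu are passed through Ollama's "options" field.
    The default of 256 here is an arbitrary illustrative value.
    """
    options = {"num_batch": num_batch}
    if num_gpu is not None:
        # num_gpu is the number of layers to offload to the GPU.
        options["num_gpu"] = num_gpu
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": options,
    }


def generate(payload, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires a local Ollama server with the model already pulled.
    payload = build_request("llama3:8b", "hi, which model are you?", num_batch=256)
    print(generate(payload))
```

Try a couple of values and compare the reported tokens per second. On a memory-constrained machine like a base MacBook Air, a smaller batch may be the safer starting point.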