On my machine, I am able to create a prompt that suits my need and chat with the model in realtime. With 100% GPU offload, it replies within half a second. LM studio provides an OpenAI compatible api endpoint for my Dotnet software to use. This boosts my developer experience significantly. The Azure services are slow and if you want to regenerate a serie of responses (e.g part of conversation flow) it just takes too long. On my local machine I also do not worry about cloud costs.
As a bonus; I also use this for a personal project where I use prompts and Llama3 to control smart devices. JSON responses from the LLM are parsed and translated into the smart device commands from a raspberry pi. I control it using speech via my Apple Watch and Apple shortcuts to the raspberry pi’s api. It all works magically and fast. Way faster than pulling up the app on my phone. And yes the LLM is smart enough to control groups of devices using simple conversational AI.
As a bonus; I also use this for a personal project where I use prompts and Llama3 to control smart devices. JSON responses from the LLM are parsed and translated into the smart device commands from a raspberry pi. I control it using speech via my Apple Watch and Apple shortcuts to the raspberry pi’s api. It all works magically and fast. Way faster than pulling up the app on my phone. And yes the LLM is smart enough to control groups of devices using simple conversational AI.
edit; here's a demo https://www.youtube.com/watch?v=dCN1AnX8txM