Hacker News new | ask | show | jobs
by walrus01 58 days ago
As much as it's a fun gimmick to run a relatively good sized LLM like qwen 3.6 35B locally, I would much rather have the ability to run it remotely on a piece of hardware I control via VPN session. Much better on battery life and heat. If I'm on an airplane I care about having as much battery life as possible.

Let's say you have a basic setup like llama.cpp and llama-server on a remote server (even if it's just sitting under your home office desk) running a 35GB Q8 quantized model of qwen 3.6 35B, it's not difficult to make llama-server available to your laptop over just about any form of internet connection and VPN.

Having the ability to run that same model locally if you really need to because no internet connection whatsoever is available, but the times that you simultaneously have no internet and a serious need for something the model can output are fairly rare these days.

2 comments

This is what I am doing - it is rare that I'm in a situation with no Internet while traveling, but very often there is an intermittent connection. Using local models or even hosted foundation models is frustrating exercise in cancelled jobs and timeouts, but Tailscale + mosh + tmux is a really nice way to connect to a workstation and resume from where the session left off - or leave it running doing its thing and come back to it later.

Same with running my local dev environment's docker containers, now they run on that workstation and my battery life is far higher, treating my portable device as a dumb terminal.

Agreed. I got a beefy M5 MBP for local llms and for sustained inference it gets hot enough that I worry it may end up shortening its life.