I'm currently testing something like this just to see what happens. I have an old laptop with 4 GB of RAM. I attached a USB drive holding the Gemma 4 31B model (which is 32.6 GB). The laptop is running llama.cpp and trying to respond to a prompt by streaming the model from disk.
The USB drive light is flickering, showing something is happening. It's been about 8 hours since I entered the prompt and I've gotten about 10 tokens back so far. I'm going to leave it running overnight and see what happens.
Wow, that's a true worst-case scenario, especially if the USB port is plain old USB 2.0 (max 480 Mbps) and/or the drive is a spinning disk. How's the CPU doing, though? Is there any headroom given the USB bottleneck?
It has now spit out about 40 tokens after maybe 18 hours and has not finished the "thinking" stage of responding to the prompt. I'll let it keep running to see what happens.
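For a rough sanity check on those numbers, here is a back-of-the-envelope sketch in Python. It assumes the model is dense (every weight is read from the drive once per token), that nothing useful stays cached in the 4 GB of RAM, and that the link is USB 2.0 at its theoretical 480 Mbps. None of that is confirmed in the thread, and the function name is made up for illustration.

```python
def min_seconds_per_token(model_gb, cached_gb, link_mb_per_s):
    """Lower bound on time per token if all uncached weights are re-read for every token.

    Assumption: a dense model, so each token touches every weight once.
    """
    streamed_gb = max(model_gb - cached_gb, 0.0)
    return streamed_gb * 1000.0 / link_mb_per_s


USB2_MB_PER_S = 480 / 8   # 480 Mbps theoretical maximum, i.e. 60 MB/s
MODEL_GB = 32.6           # model size quoted in the thread

# Best case on USB 2.0 with nothing cached in RAM (assumption).
bound = min_seconds_per_token(MODEL_GB, cached_gb=0.0, link_mb_per_s=USB2_MB_PER_S)
print(f"USB 2.0 lower bound: {bound / 60:.1f} min/token")

# Observed pace reported in the thread: (hours elapsed, tokens produced).
for hours, tokens in [(8, 10), (18, 40)]:
    secs = hours * 3600 / tokens
    implied_mb_s = MODEL_GB * 1000.0 / secs
    print(f"{tokens} tokens in {hours} h: {secs / 60:.0f} min/token, "
          f"about {implied_mb_s:.0f} MB/s if the whole model is re-read per token")
```

If the observed pace turns out well below the theoretical bound, that would be consistent with real-world USB 2.0 throughput falling short of 60 MB/s, or with the drive itself being the slower part, but the sketch can't tell those apart.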
Not sure if this is exactly the scenario you envision, but I run ComfyUI on a four-year-old Acer Helios 300 laptop. It has 16 GB of RAM and an NVIDIA GeForce RTX 2060 with 6144 MiB of VRAM, and I have generated a few images using the "NetaYumev35_pretrained_all_in_one.safetensors" checkpoint at 10.6 GB (well beyond the 6 GB capacity of the RTX 2060). That said, it takes more than 10 minutes to complete the task. Of course, I have to close or hibernate all other apps and browser tabs. If I don't, the laptop's fans spin up like an airplane propeller. It's worth mentioning that I've tried to do this with other front ends and they all seem to fail with some error or another, usually an out-of-VRAM issue. I've only gotten it to work with ComfyUI.
I use an Anaconda environment on Linux (though I would have preferred a "uv" environment) and automate the startup sequence from the terminal with the following script (start_comfy.sh), rather than manually activating the environment in that terminal each time:
I'm not running locally for exactly the same reason: to avoid stressing my components. It seems we are in for a long haul due to this AI bubble (can't wait for it to pop), so I need to make sure I survive this madness, as I certainly can't afford to replace anything right now.
I don't know that any AI bubble will pop. AI can be used to accelerate therapies and cures and to make scientific advancements. Add to that quantum technology, which, if successful, should accelerate things further, depending on who's at the wheel. The problem is the gap between now and then (i.e., the age of abundance). It's going to be a difficult road for a good number of the population until that day comes. I'm scouting potential bridges to live under, so that I can find and claim one when homeless day arrives.
I can't help but feel that companies using AI to justify employee layoffs are shooting themselves in the foot. The endgame for them will be zero profits, since displaced workers have no money to pay for goods and services :|
I'm using a ROG Phantom laptop with a Strix Halo iGPU that has a whopping 128 GB of VRAM. Next year there will be the rumored Medusa Halo with 256 GB of VRAM, which is more than enough to run DeepSeek V4 Flash.
I don't think you're the odd one out. I would be very curious to try to run Opus 4.7 on a (high end) laptop. I'd also like to see how it runs on a high-end workstation rig built for it.
I mean, the inference engine might need some tweaks to support whatever compute is available. But then, if you set up a few terabytes of disk as swap, and swap the RAM for bigger sticks if possible, it should work? Slowly, of course, but there is no reason it shouldn't.
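To illustrate why disk-backed weights can work at all: llama.cpp memory-maps model files by default, so the OS pages weights in from disk on demand instead of loading everything up front. Below is a toy Python sketch of that idea using the standard `mmap` module. The tiny temporary file is only a stand-in for a weights file, and `read_slices` is a hypothetical helper, not anything from llama.cpp.

```python
import mmap
import os
import tempfile


def read_slices(path, offsets, length):
    """Read a few byte ranges through a read-only memory map.

    Only the pages backing the requested ranges need to be brought in from
    disk; the rest of the file is never read into RAM by this function.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [bytes(mm[o:o + length]) for o in offsets]


# Stand-in "weights" file: 1 MiB of a repeating 0..255 byte pattern.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 4096)
    path = tmp.name

file_size = os.path.getsize(path)
chunks = read_slices(path, [0, file_size // 2, file_size - 16], 16)
os.remove(path)

for chunk in chunks:
    print(chunk.hex())
```

The same mechanism is what makes the approach slow when the file is much larger than RAM: pages read for one token get evicted before the next token needs them again, so the drive is hit over and over.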