by hualapais
23 days ago
Went this route after hemming and hawing over a Mac Studio Pro for some time. Eventually bought and configured a headless HP Z620 with 192 GB of ECC RAM and dual Xeon E5-2680 v2 processors, an Optane AIC, two P102-100s with 10 GB of VRAM each, and a minimal bootable SSD running Debian 12.6 with an older, locked version of CUDA that supports the Pascal cards. Run it remotely from the basement via AMT/MeshCommander. Just fire up llama.cpp and its front end and connect over the local network. Currently playing with Talkie, Qwen 3.6 27b, and MedGemma, but have had good luck with GGUF performance in general after selecting an appropriate quant. Total cost was under $500, but I bought the server via eBay last year; things may be different now.

Details aside, the hope is that ternary LLMs blossom in the coming months and this old hardware can eventually host some very dense models full of factual information, perhaps even larger than the GPU RAM and spilling over to the Optane for IO. Speed would be less important than general factual knowledge.

The plan would be to configure and then mothball the machine in a Faraday trashcan in the basement, retaining it as a possible "rebuild civilization" oracle should the world fall apart. Of course, power would be an issue in such a scenario, but given how cheap this hardware is and how often AI seems to be practically useful in its latest iterations, why not...
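
For anyone wondering what "an appropriate quant" works out to against 2 x 10 GB of VRAM, here is a rough back-of-envelope sketch in Python. The helper name `weights_gib` and the 4.5 bits-per-weight figure are my own assumptions for illustration (real GGUF quants vary by type), and the estimate covers weights only, ignoring KV cache and runtime overhead. The ternary figure uses log2(3), roughly 1.58 bits per weight.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Ignores KV cache, activations, and runtime overhead, so treat the
    result as a lower bound on memory actually needed.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Two P102-100 cards at 10 GB each; treated as GiB for a rough check.
VRAM_GIB = 2 * 10

# Bits-per-weight values are illustrative assumptions, not measured figures.
CANDIDATES = [
    ("16-bit (unquantized)", 16.0),
    ("4-bit-class quant (assumed ~4.5 bpw)", 4.5),
    ("ternary (~1.58 bpw)", 1.58),
]

if __name__ == "__main__":
    for label, bpw in CANDIDATES:
        size = weights_gib(27, bpw)  # a 27B-parameter model
        verdict = "fits in VRAM" if size < VRAM_GIB else "spills past VRAM"
        print(f"{label}: {size:.1f} GiB of weights, {verdict}")
```

By this estimate a 27B model only fits on the two cards once quantized, and a ternary model of the same parameter count would leave most of the VRAM free, which is why ternary models look attractive for hardware like this.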