|
|
|
|
|
by api
366 days ago
|
|
Unless I’m missing some big numbers somewhere you could do that locally on a pi 5 with efficient code. Nothing heroic required, just a decently fast language like Go. My laptop can run 70B LLMs at usable speeds. I know. Doesn’t scale. No redundancy. No auto redeploy on failures. This is what I mean. Do we really have to sacrifice this much efficiency for those things or are we doing it wrong? Does the ability to redeploy on failures, cluster, and scale really require order of magnitude performance penalties across the whole stack? |
|