by a_conservative
398 days ago
My M4 Max MacBook can run local inference on a medium-ish Gemma model (32B IIRC). Power consumption spikes by about 120 watts over idle (with multiple Electron apps, Docker, etc. already running). It generates about 70 tokens/sec and usually finishes a response within 10 to 20 seconds. So, picking some numbers for the calculation: 4 answers per minute (15 seconds each) at 120 W is about 0.5 watt-hours per answer. ~200 responses would be enough to drain the (normally quite long-lasting) battery. How does that compare to the more common Nvidia GPUs? I don't know.
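The back-of-envelope math above can be checked with a few lines of Python. The 120 W delta, 70 tokens/sec and 4 answers per minute come from the comment; the 100 Wh battery capacity is an assumption (it's what ~200 answers at 0.5 Wh each implies). Joules per token is included because it's a handy unit for comparing against a GPU's power draw and throughput.

```python
def energy_per_answer_wh(extra_watts: float, seconds_per_answer: float) -> float:
    """Incremental energy for one answer in Wh: power above idle times duration."""
    return extra_watts * seconds_per_answer / 3600.0


def answers_per_battery(battery_wh: float, wh_per_answer: float) -> float:
    """How many answers a full battery covers, ignoring the idle baseline."""
    return battery_wh / wh_per_answer


def joules_per_token(extra_watts: float, tokens_per_sec: float) -> float:
    """Energy per generated token (W / (tokens/s) = J/token)."""
    return extra_watts / tokens_per_sec


# Figures from the comment.
EXTRA_WATTS = 120.0
TOKENS_PER_SEC = 70.0
SECONDS_PER_ANSWER = 60.0 / 4  # 4 answers per minute -> 15 s each

# Assumption: ~100 Wh battery (not stated in the comment).
BATTERY_WH = 100.0

wh = energy_per_answer_wh(EXTRA_WATTS, SECONDS_PER_ANSWER)
n = answers_per_battery(BATTERY_WH, wh)
jpt = joules_per_token(EXTRA_WATTS, TOKENS_PER_SEC)

print(f"{wh:.2f} Wh per answer, ~{n:.0f} answers per {BATTERY_WH:.0f} Wh battery")
print(f"{jpt:.2f} J per token")
```

To compare with an Nvidia card you'd plug in its measured power draw above idle and its tokens/sec for the same model; lower J/token means more efficient inference.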