by MoonGhost
382 days ago
> 16x 3090 system

That's about 5 kW of power.

> that gets 7 token/s in llama.cpp

Going by the electricity bill alone, it's cheaper to use the API of any major provider.

> If you aren't prompting Deepseek in Chinese, a lot of the experts don't activate.

That's interesting. It means the model could be cut down, with any tokens that would have gone to a removed expert routed to the closest remaining one, just in case they come up.
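To put a rough number on the electricity point, here is a back-of-the-envelope sketch. The 5 kW draw and 7 token/s come from the figures above; the electricity price is purely an assumption (`ASSUMED_PRICE_PER_KWH` is a placeholder, plug in your own rate), and the function names are made up for illustration. It also assumes the rig runs at full draw the whole time and ignores hardware cost.

```python
def kwh_per_million_tokens(power_kw: float, tokens_per_second: float) -> float:
    """Energy in kWh needed to generate one million tokens at a steady rate."""
    seconds = 1_000_000 / tokens_per_second
    hours = seconds / 3600
    return power_kw * hours


def cost_per_million_tokens(power_kw: float, tokens_per_second: float,
                            price_per_kwh: float) -> float:
    """Electricity cost of one million tokens at the given price per kWh."""
    return kwh_per_million_tokens(power_kw, tokens_per_second) * price_per_kwh


POWER_KW = 5.0                # from the thread: 16x 3090 at about 5 kW
TOKENS_PER_SECOND = 7.0       # from the thread: llama.cpp throughput
ASSUMED_PRICE_PER_KWH = 0.15  # assumption, not from the thread

if __name__ == "__main__":
    energy = kwh_per_million_tokens(POWER_KW, TOKENS_PER_SECOND)
    cost = cost_per_million_tokens(POWER_KW, TOKENS_PER_SECOND, ASSUMED_PRICE_PER_KWH)
    print(f"{energy:.1f} kWh per million tokens")
    print(f"{cost:.2f} per million tokens at {ASSUMED_PRICE_PER_KWH}/kWh")
```

With those inputs it works out to roughly 198 kWh per million tokens, or about 30 in whatever currency the assumed 0.15/kWh is in. Compare that with what your provider charges per million output tokens.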