|
|
|
|
|
by isoprophlex
261 days ago
|
|
Extremely impressive, but can one really run these >200B param models on prem in any cost effective way? Even if you get your hands on cards with 80GB ram, you still need to tie them together in a low-latency high-BW manner. It seems to me that small/medium sized players would still need a third party to get inference going on these frontier-quality models, and we're not in a fully self-owned self-hosted place yet. I'd love to be proven wrong though. |
|