Y
Hacker News
new
|
ask
|
show
|
jobs
by
latchkey
596 days ago
Exactly. Latency is less relevant if you have to have 4 literal servers (each taking up a whole rack) to push out one single 70B model and we don't know how many concurrent user requests that actually services (probably 1).