Hacker News
by
wongarsu
149 days ago
Which conveniently fits on one 8xH100 machine, with 100-200 GB left over for overhead, KV cache, etc.
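For anyone checking the arithmetic, here is a rough sketch of what that headroom implies. It assumes the 80 GB H100 variant (the comment doesn't say which one), and the function name is made up for illustration. It only back-solves the weight footprint from the quoted 100-200 GB of slack; it is not a measurement of any particular model.

```python
# Assumption: 80 GB of HBM per H100. The thread does not specify the variant.
H100_MEMORY_GB = 80
GPUS_PER_NODE = 8


def implied_weight_footprint_gb(leftover_gb, gpus=GPUS_PER_NODE, per_gpu_gb=H100_MEMORY_GB):
    """Memory left for weights once `leftover_gb` is reserved for overhead and KV cache."""
    return gpus * per_gpu_gb - leftover_gb


total_gb = GPUS_PER_NODE * H100_MEMORY_GB
low_gb = implied_weight_footprint_gb(200)   # the 200 GB headroom case
high_gb = implied_weight_footprint_gb(100)  # the 100 GB headroom case

print(f"node total: {total_gb} GB")
print(f"implied weights: {low_gb}-{high_gb} GB")
```

So under that assumption the weights would be taking up somewhere in the 440-540 GB range of a 640 GB node.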
storystarling
149 days ago
The unit economics seem pretty rough though. You're locking up 8xH100s for the compute of ~32B active parameters. I guess memory is the bottleneck but hard to see how the margins work on that.
kristianp
148 days ago
Yes, it only makes sense economically if you have batching over many users.
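A toy roofline-style model shows why batching matters here. Everything except the ~32B active parameters is an assumption: one byte per parameter, and round placeholder numbers for aggregate memory bandwidth and compute rather than vendor specs. It also ignores KV-cache reads, interconnect traffic, and the fact that in an MoE a larger batch touches more distinct experts, so treat it as an upper-bound sketch at best.

```python
# All constants below are illustrative assumptions, not vendor specs or measurements.
ACTIVE_PARAMS = 32e9        # ~32B active parameters, from the thread
BYTES_PER_PARAM = 1.0       # hypothetical: 8-bit weights
NODE_MEM_BW = 8 * 3e12      # hypothetical aggregate memory bandwidth, bytes/s
NODE_FLOPS = 8 * 1e15       # hypothetical aggregate compute, FLOP/s


def decode_step_seconds(batch):
    """Time for one decode step: limited by weight reads or by compute, whichever is slower."""
    weight_read = ACTIVE_PARAMS * BYTES_PER_PARAM / NODE_MEM_BW  # paid once per step
    compute = 2 * ACTIVE_PARAMS * batch / NODE_FLOPS             # ~2 FLOPs per param per token
    return max(weight_read, compute)


def tokens_per_second(batch):
    """Total tokens generated per second across all sequences in the batch."""
    return batch / decode_step_seconds(batch)


for b in (1, 8, 64, 256):
    print(f"batch {b:>3}: {tokens_per_second(b):,.0f} tokens/s")
```

In this model the step time stays flat while the weight read dominates, so throughput scales with batch size for free until compute becomes the limit. The same machine cost then gets spread over many more tokens, which is the "batching over many users" point.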