Hacker News
by
dmbaggett
720 days ago
For inference you could use a maxed-out Mac Ultra; the RAM is shared between the CPU and GPU.
1 comment
alecco
720 days ago
For single user (batch_size = 1), sure. But that is quite expensive in $/tok.
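To put rough numbers on the $/tok point, here is a minimal sketch of the amortized cost arithmetic. Everything in it is an assumption for illustration: the function name `dollars_per_token`, the hardware price, the amortization period, and the per-stream decode speed are all hypothetical, and it assumes aggregate throughput scales linearly with batch size, which real hardware only approximates. Power and hosting costs are ignored.

```python
def dollars_per_token(hardware_cost_usd, amortization_seconds,
                      tokens_per_second_per_stream, batch_size=1):
    """Amortized hardware cost per generated token.

    Idealized assumption: total throughput = per-stream speed * batch_size.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total_tokens = tokens_per_second_per_stream * batch_size * amortization_seconds
    return hardware_cost_usd / total_tokens


if __name__ == "__main__":
    three_years = 3 * 365 * 24 * 3600  # hypothetical amortization period, in seconds
    # Hypothetical machine: 6000 USD, 10 tok/s per stream, running flat out.
    for batch in (1, 8, 32):
        cost = dollars_per_token(6000, three_years, 10, batch_size=batch)
        print(f"batch_size={batch}: {cost * 1e6:.2f} USD per million tokens")
```

Under these assumptions the cost per token falls in proportion to batch size, so a box that only ever serves one stream at a time pays the full hardware cost on every token.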