by dmbaggett 720 days ago
For inference you could use a maxed-out Mac Ultra; the RAM is shared between the CPU and GPU.
1 comment

For a single user (batch_size = 1), sure. But that is quite expensive in $/tok, since the hardware cost isn't amortized across batched requests.
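
To put rough numbers on the $/tok point, here is a back-of-the-envelope sketch. Every input below (hardware price, lifetime, power draw, electricity rate, tokens/s, and the batch speedup) is a made-up placeholder, not a measurement of any Mac or GPU, and the function name `usd_per_million_tokens` is my own. It only shows that for a fixed box, cost per token scales inversely with aggregate throughput.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def usd_per_million_tokens(hardware_usd, years, watts, usd_per_kwh,
                           tokens_per_s, utilization=1.0):
    """Amortized hardware plus electricity cost per one million tokens.

    Simplification: power is only counted while the machine is busy.
    """
    busy_seconds = years * SECONDS_PER_YEAR * utilization
    total_tokens = tokens_per_s * busy_seconds
    energy_kwh = (watts / 1000) * (busy_seconds / 3600)
    total_usd = hardware_usd + energy_kwh * usd_per_kwh
    return total_usd / total_tokens * 1e6


# Hypothetical placeholder inputs, chosen only for illustration.
single = usd_per_million_tokens(
    hardware_usd=5000, years=3, watts=200, usd_per_kwh=0.15,
    tokens_per_s=10,          # assumed batch_size = 1 decode speed
)
batched = usd_per_million_tokens(
    hardware_usd=5000, years=3, watts=200, usd_per_kwh=0.15,
    tokens_per_s=10 * 16,     # assumed 16x aggregate throughput when batching
)

print(f"batch_size=1: ${single:.2f} per 1M tokens")
print(f"batched:      ${batched:.2f} per 1M tokens")
```

Real batching rarely scales perfectly linearly, so treat the 16x as an upper bound for that batch size. The point is just that the same fixed cost gets divided over far more tokens.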