Hacker News
by zozbot234, 70 days ago
10 or 15 minutes a day is what the inference workload looks like on fairly small models. Once you start streaming weights in from SSD, things slow down quite a bit and become quite power hungry.
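To put rough numbers on the slowdown, here is a back-of-the-envelope sketch in Python. It assumes decoding is bandwidth-bound, so each generated token requires reading every active weight once, and the token rate cannot exceed bandwidth divided by weight size. The model size and both bandwidth figures are illustrative assumptions, not measurements of any particular device, and the function name is made up for this example.

```python
# Rough upper bound on decode speed when every token requires reading
# all active weights once. All figures below are illustrative assumptions.

GB = 1e9  # bytes


def max_tokens_per_second(active_weight_bytes: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling: one full pass over the weights per token."""
    return bandwidth_bytes_per_s / active_weight_bytes


weights = 4 * GB          # assumed: a small quantized model
ram_bandwidth = 100 * GB  # assumed: weights resident in fast memory
ssd_bandwidth = 5 * GB    # assumed: weights streamed from SSD

ram_rate = max_tokens_per_second(weights, ram_bandwidth)
ssd_rate = max_tokens_per_second(weights, ssd_bandwidth)

print(f"in memory: at most {ram_rate:.2f} tokens/s")
print(f"from SSD:  at most {ssd_rate:.2f} tokens/s")
print(f"slowdown:  {ram_rate / ssd_rate:.0f}x")
```

Under these assumed figures the ceiling drops by the ratio of the two bandwidths, so the same number of tokens takes correspondingly longer to produce. Real systems will land below both ceilings, and caching part of the weights in memory changes the picture.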