Hacker News
by dkarras 929 days ago
No, it doesn't work that way. Experts can change per token, so for interactive speeds you need all of them in memory, unless you want to wait for model swaps between tokens.
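To make the per-token point concrete, here is a rough toy sketch, not how any real model is implemented. It assumes 8 experts with 2 picked per token, and it uses a uniform random choice as a stand-in for the learned router. The names `route_tokens` and `count_loads`, and the LRU eviction policy, are made up for illustration. It counts how many times an expert would have to be loaded when only a few fit in memory.

```python
import random


def route_tokens(num_tokens, num_experts=8, top_k=2, seed=0):
    """Pick top_k distinct experts per token.

    Assumption: uniform random choice stands in for a learned router.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_experts), top_k)) for _ in range(num_tokens)]


def count_loads(routes, resident_slots):
    """Count expert loads when only `resident_slots` experts fit in memory.

    Assumption: least-recently-used eviction (hypothetical policy).
    """
    resident = []  # least recently used first, most recently used last
    loads = 0
    for experts in routes:
        for expert in experts:
            if expert in resident:
                resident.remove(expert)  # already in memory, just refresh it
            else:
                loads += 1  # would have to be fetched from disk or host RAM
                if len(resident) >= resident_slots:
                    resident.pop(0)  # evict the least recently used expert
            resident.append(expert)
    return loads


if __name__ == "__main__":
    routes = route_tokens(num_tokens=200)
    distinct = {expert for token in routes for expert in token}
    print("distinct experts touched:", len(distinct))
    print("loads with 2 resident slots:", count_loads(routes, 2))
    print("loads with 8 resident slots:", count_loads(routes, 8))
```

With room for every expert, each one is loaded once and then stays put. With room for only the two that are active, almost every token forces at least one load, which is the "model swap between tokens" cost.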