Hacker News
by akx 916 days ago
You don't necessarily need to fit the whole model in memory: llama.cpp can, in some cases, mmap the model file directly from disk. Naturally, inference will be slower, since any weights not already resident in RAM have to be paged in from disk when they are touched.
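The general OS mechanism looks roughly like the sketch below. It is written in Python's standard `mmap` module, not taken from llama.cpp. The function name `read_slice_mmap` and the temporary file standing in for a model file are hypothetical. The file is mapped read-only, and only the pages that are actually accessed get loaded from disk.

```python
import mmap
import os
import tempfile


def read_slice_mmap(path: str, offset: int, length: int) -> bytes:
    """Map a file read-only and copy out one slice of it.

    The whole file is mapped into the address space, but the OS only
    pages in the regions that are touched; the rest stays on disk.
    """
    with open(path, "rb") as f:
        # length=0 maps the entire file; ACCESS_READ keeps it read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    # Hypothetical stand-in for a large model file: 1 MiB of patterned bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4096)
        path = tmp.name
    try:
        print(read_slice_mmap(path, 1000, 4))
    finally:
        os.remove(path)
```

This is also why speed suffers. The first access to each page of the mapping can trigger a read from disk. If the model is larger than free RAM, the OS may evict pages and then have to fetch them again on later accesses.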