by halJordan
60 days ago
Meaningless question: fit will put everything on the GPU if it fits. FA is on by default. No-mmap is not an inference tradeoff, and if you do turn it off you need to turn on direct I/O via -dio. What he should actually do is enable speculative decoding.