by mzubairtahir
4 hours ago
I think you didn't check the app properly. It actually takes the required context window from the user, calculates the KV cache size for it, and counts that along with the size of the model itself. It also reserves some extra memory on top to avoid OOM...
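For anyone who wants the arithmetic spelled out, here is a rough sketch of that kind of estimate. It is not the app's actual code: the function names, the parameter names, and the 10% reserve are all assumptions for illustration, and the KV cache formula is the usual one for a standard transformer (keys plus values, per layer, per KV head).

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Approximate KV cache size in bytes for one sequence.

    The leading 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes an fp16/bf16 cache (an assumption, not from the app).
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def estimate_total_bytes(model_bytes, n_layers, n_kv_heads, head_dim,
                         context_len, bytes_per_elem=2, reserve_percent=10):
    """Model weights + KV cache + a safety reserve to reduce the risk of OOM.

    reserve_percent is a hypothetical margin; the real app may use another rule.
    """
    kv = kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem)
    subtotal = model_bytes + kv
    reserve = subtotal * reserve_percent // 100  # integer math, no float rounding
    return subtotal + reserve


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Hypothetical model: 32 layers, 8 KV heads, head_dim 128, 4 GiB of weights.
    total = estimate_total_bytes(4 * GIB, n_layers=32, n_kv_heads=8,
                                 head_dim=128, context_len=8192)
    print(f"estimated memory: {total / GIB:.2f} GiB")
```

The point is that the context window the user asks for feeds straight into the KV cache term, so the estimate grows linearly with context length rather than depending on the model file size alone.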