by w1nk
1179 days ago
Does anyone know how/why this change decreases memory consumption (and isn't a bug in the inference code)? As I understand the issue, mmap'ing the file shows that inference only accesses a fraction of the weight data. Doesn't a forward pass have to access all of the weights, not just a fraction of them?