by Aurornis
85 days ago
Using system memory and CPU compute for the layers that don’t fit into GPU memory is already supported by common tools. It’s workable for mixture-of-experts models, but performance falls off a cliff as soon as the model overflows out of GPU memory and into system RAM. There is a second performance cliff when parts of the model have to be fetched from disk on every pass.
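To put rough numbers on the two cliffs, here is a back-of-envelope sketch. It assumes token generation is purely bandwidth-bound (every active weight is read once per token) and that weights fill the GPU first, then system RAM, then disk. The function name, the `active_fraction` knob, and the default bandwidth figures are all illustrative assumptions of mine, not measurements from any particular tool or hardware.

```python
def tokens_per_second(model_gb, gpu_gb, ram_gb,
                      gpu_bw=1000.0, ram_bw=60.0, disk_bw=5.0,
                      active_fraction=1.0):
    """Rough decode speed for a model split across GPU, RAM, and disk.

    Sizes are in GB, bandwidths in GB/s. The default bandwidths are
    assumed ballpark values, not benchmarks. active_fraction is a
    hypothetical knob for mixture-of-experts models: the share of weights
    actually read per token, assumed to be spread evenly across tiers.
    """
    # Fill the fastest tier first, then spill the remainder downward.
    in_gpu = min(model_gb, gpu_gb)
    in_ram = min(model_gb - in_gpu, ram_gb)
    on_disk = model_gb - in_gpu - in_ram

    # Time per token is the sum of the time spent reading each tier.
    seconds = (in_gpu / gpu_bw + in_ram / ram_bw + on_disk / disk_bw)
    seconds *= active_fraction
    return 1.0 / seconds


if __name__ == "__main__":
    # Hypothetical machine: 24 GB of GPU memory, 64 GB of system RAM.
    for size in (20, 40, 120):
        print(f"{size} GB model: {tokens_per_second(size, 24, 64):.2f} tok/s")
```

Under these assumptions the slowest tier dominates almost immediately: a model that spills even a modest amount into RAM spends most of each token waiting on RAM, and anything that spills to disk spends most of it waiting on disk. Setting `active_fraction` below 1 shows why mixture-of-experts models stay usable longer, since fewer bytes are touched per token.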