by thulle
922 days ago
Better output than the smaller llamas in my limited testing, but it's surprisingly slow:

Output generated in 101.74 seconds (0.98 tokens/s, 100 tokens, context 82, seed 532878022)
Output generated in 515.46 seconds (0.99 tokens/s, 511 tokens, context 27, seed 660997525)

Checking nvidia-smi, it stalls at ~130 W (out of ~470 W max) power usage, ~25% GPU usage and ~10% memory bandwidth usage. There's a fair amount of traffic on the PCI bus though, and the Python process sits at a steady 100% usage of one core. Is the GPU possibly limited by something handled in Python?
Pausing the GPU-accelerated video decoding of a Twitch stream gives a surprisingly large boost:

Output generated in 380.42 seconds (1.34 tokens/s, 511 tokens, context 26, seed 648992918)
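For anyone wanting to compare runs like these, here is a rough Python sketch that recomputes throughput from the log lines and the relative gain from pausing the video decoding. The names `LOG_LINE` and `parse_run` are made up for illustration, and the regex only assumes the line format shown in the output above, not anything about the tool that produced it.

```python
import re

# Hypothetical pattern matching the "Output generated in ..." lines quoted above.
LOG_LINE = re.compile(
    r"Output generated in (?P<seconds>[\d.]+) seconds "
    r"\((?P<rate>[\d.]+) tokens/s, (?P<tokens>\d+) tokens"
)


def parse_run(line):
    """Extract wall time and token count, and recompute tokens per second."""
    match = LOG_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    seconds = float(match["seconds"])
    tokens = int(match["tokens"])
    return {"seconds": seconds, "tokens": tokens, "tokens_per_s": tokens / seconds}


# The two 511-token runs from the comment: with and without the video decoding.
before = parse_run(
    "Output generated in 515.46 seconds "
    "(0.99 tokens/s, 511 tokens, context 27, seed 660997525)"
)
after = parse_run(
    "Output generated in 380.42 seconds "
    "(1.34 tokens/s, 511 tokens, context 26, seed 648992918)"
)

# Ratio of throughputs; both runs generated the same number of tokens.
speedup = after["tokens_per_s"] / before["tokens_per_s"]
print(f"{before['tokens_per_s']:.2f} -> {after['tokens_per_s']:.2f} tokens/s ({speedup:.2f}x)")
```

The recomputed rates should agree with the ones printed in the logs, and the ratio works out to roughly a 35% gain. The seeds and context lengths differ between the two runs, so treat that as a ballpark figure and not a controlled benchmark.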