by Tostino
470 days ago
You are missing something. This is a single stream of inference. You can load up the Nvidia card with at least 16 inference streams and get much higher throughput in tokens/sec. This is just a single-user chat experience benchmark.