by brucethemoose2
1042 days ago
For prompt ingestion... I dunno. Unbatched token generation is basically RAM bandwidth limited, since the entire model's weights have to be read from memory for each token. I bet theoretical performance is similar to the GPU, albeit with much lower power consumption.
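
To put rough numbers on the bandwidth argument, here is a back-of-the-envelope sketch. It assumes every weight is read exactly once per generated token and ignores KV-cache traffic, compute time, and caching effects. The function names and the example figures (7B parameters, 4-bit weights, 100 GB/s) are made up for illustration, not measurements of any particular chip.

```python
def model_size_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on unbatched decode speed, assuming the whole model
    is streamed from memory once per token (an idealized assumption)."""
    if model_bytes <= 0:
        raise ValueError("model_bytes must be positive")
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    # Hypothetical example values, not benchmarks.
    size = model_size_bytes(7e9, 4)            # 7B params at 4 bits per weight
    bound = max_tokens_per_second(size, 100e9)  # 100 GB/s memory bandwidth
    print(f"{size / 1e9:.1f} GB model, at most ~{bound:.1f} tokens/s")
```

Under this estimate, two devices with the same memory bandwidth land on the same ceiling no matter how much compute they have, which is the point about unbatched generation. Prompt ingestion processes many tokens per weight read, so it would not follow the same bound.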