Hacker News
by nullc 615 days ago
They're memory-bandwidth limited: you can basically estimate the performance from the time it takes to read the entire model from RAM for each token.
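A rough sketch of that estimate, assuming a dense model where every weight is streamed from RAM once per generated token at batch size 1. It ignores KV-cache reads, compute time, and the gap between peak and achievable bandwidth, so treat the result as an upper bound. The function name and the example figures (7B parameters, fp16 weights, 100 GB/s) are hypothetical and chosen only for illustration:

```python
def estimate_tokens_per_second(n_params: float, bytes_per_param: float,
                               bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed, assuming all weights are read from RAM once per token.

    Hypothetical helper: ignores KV-cache traffic, compute, and real-world
    bandwidth efficiency, which all push actual throughput lower.
    """
    model_bytes = n_params * bytes_per_param            # total bytes streamed per token
    seconds_per_token = model_bytes / (bandwidth_gb_s * 1e9)  # GB/s -> bytes/s
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical: 7B parameters in fp16 (2 bytes each) on a 100 GB/s memory bus.
    print(f"{estimate_tokens_per_second(7e9, 2.0, 100.0):.1f} tokens/s")  # prints 7.1 tokens/s
```

The same arithmetic shows why quantization helps decode speed on bandwidth-bound hardware: at 0.5 bytes per parameter (4-bit), the same hypothetical setup bounds out at roughly 28.6 tokens/s, because four times fewer bytes are read per token.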