by jfoutz
5345 days ago
Wow. So, less than 300k or so and you stay in L1, which is crazy fast. Contiguous reads must have some trick for streaming data into L1 in anticipation of the request. The only explanation I have for the large-stride/large-read speedup is that maybe the data gets laid out across separate memory modules, so you get some parallel reads. I guess that curve from 8b to 4kb comes from increasing collisions? Is this even vaguely right? That's a cool graph.
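To make the stride guess a bit more concrete, here is a rough sketch of how many distinct cache lines a strided read pattern touches. The 64-byte line size and the function names are my own assumptions, not anything read off the graph, and this only models line reuse, not associativity collisions or memory-module parallelism.

```python
def cache_lines_touched(num_reads, stride_bytes, line_bytes=64):
    """Count distinct cache lines hit by num_reads reads spaced stride_bytes apart.

    line_bytes=64 is an assumed cache line size, not a measured one.
    """
    return len({(i * stride_bytes) // line_bytes for i in range(num_reads)})


def reads_per_line(num_reads, stride_bytes, line_bytes=64):
    """Average number of reads served by each cache line that gets fetched."""
    return num_reads / cache_lines_touched(num_reads, stride_bytes, line_bytes)


if __name__ == "__main__":
    for stride in (8, 16, 32, 64, 4096):
        print(stride, reads_per_line(1024, stride))
```

Under that assumption, each fetched line serves fewer reads as the stride grows toward the line size, and past that point every read needs its own line, which might account for part of the curve.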