by paulmd
4154 days ago
Actually, I should correct this: if you access the slow segment at all, performance will be degraded, since you cannot also access the fast memory during the same cycle. Look at it over two cycles. Because the fast segment has 7x the bandwidth of the slow one, you can access either (7+7) = 14 chunks of memory (fast segment both cycles) or (7+1) = 8 chunks (fast one cycle, slow the other). Since 8/14 is about 57% of peak, that's a 43% performance drop if even 1 of the 32 threads in a warp consistently needs to touch the slow segment. If that data is used for control flow, the problem is amplified, of course, since latency will double.
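A toy model of the two-cycle arithmetic above, as a sketch only. It assumes, as the comment does, that each cycle serves either 7 chunks from the fast segment or 1 chunk from the slow segment, never both; the constants and the `chunks_over_cycles` helper are hypothetical illustrations, not hardware specs.

```python
# Toy model (assumption, not a hardware spec): in any one cycle the card
# reads either FAST_CHUNKS from the fast segment or SLOW_CHUNKS from the
# slow segment, but never both in the same cycle.
FAST_CHUNKS = 7
SLOW_CHUNKS = 1


def chunks_over_cycles(slow_cycles: int, total_cycles: int) -> int:
    """Chunks moved when `slow_cycles` of `total_cycles` go to the slow segment."""
    fast_cycles = total_cycles - slow_cycles
    return fast_cycles * FAST_CHUNKS + slow_cycles * SLOW_CHUNKS


best = chunks_over_cycles(0, 2)   # fast both cycles: 7 + 7
mixed = chunks_over_cycles(1, 2)  # one fast cycle, one slow cycle: 7 + 1
drop = 1 - mixed / best           # fraction of peak throughput lost
print(f"{best} vs {mixed} chunks -> {drop:.0%} drop")  # 14 vs 8 chunks -> 43% drop
```

Changing the ratio of slow cycles to total cycles shows how quickly throughput falls as more of the warp's accesses land in the slow segment.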