by paulmd 4154 days ago
Actually I should correct this - if you access the slow segment at all, performance will be degraded, since you cannot also access the fast segment during the same cycle.

Looking at it on a 2-cycle basis: since the fast segment is 7x as fast as the slow one, in two cycles you can access either (7+7) = 14 or (7+1) = 8 chunks of memory. That's a 43% drop in throughput if even 1 of the 32 threads in a warp consistently needs to touch the slow segment.
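
To spell out that arithmetic, here is a quick Python sketch. The constant and function names are made up for illustration, and the 7:1 ratio is just the relative figure used in this comment, not a measured number.

```python
# Relative chunks per cycle, taken from the 7x figure in the comment.
FAST_CHUNKS_PER_CYCLE = 7  # fast segment
SLOW_CHUNKS_PER_CYCLE = 1  # slow segment


def chunks_over_two_cycles(touch_slow: bool) -> int:
    """Chunks moved in two cycles, assuming a cycle serves one segment only."""
    second = SLOW_CHUNKS_PER_CYCLE if touch_slow else FAST_CHUNKS_PER_CYCLE
    return FAST_CHUNKS_PER_CYCLE + second


best = chunks_over_two_cycles(touch_slow=False)   # 7 + 7
mixed = chunks_over_two_cycles(touch_slow=True)   # 7 + 1
drop = 1 - mixed / best

print(f"{best} vs {mixed} chunks: {drop:.0%} drop")  # 14 vs 8 chunks: 43% drop
```

This only models the simple case where every second cycle goes to the slow segment; a real access pattern would land somewhere between the two extremes.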

If that data is used for control flow, the problem is amplified, of course, since latency will double.