by spockz 711 days ago
If this is possible for an application, doesn't it make more sense to use SIMD instructions at that point?

You want to use SIMD and multiple accumulators. In fact, not only do you want as many accumulators as there are SIMD ALUs; because SIMD operations usually have longer latency, you also typically unroll SIMD loops for software pipelining, using even more accumulators to break loop-carried dependencies.
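A minimal sketch of the idea, assuming GCC or Clang (it uses their `vector_size` extension rather than architecture-specific intrinsics so it stays portable). The function name `sum_simd_4acc` and the choice of four 4-lane accumulators are illustrative assumptions, not tuned values; a common rule of thumb is that you need roughly (add latency × adds issued per cycle) independent accumulators to keep the units busy, which varies by CPU.

```c
#include <stddef.h>
#include <string.h>

/* 4 float lanes (128 bits) via the GCC/Clang vector extension (assumption:
   compiled with GCC or Clang). */
typedef float v4f __attribute__((vector_size(16)));

/* Load 4 floats from a possibly unaligned address. */
static v4f load4(const float *p) {
    v4f v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sum n floats using 4 independent vector accumulators (hypothetical helper).
   The 4 vector adds per iteration do not depend on each other, so the CPU
   can overlap their latencies instead of serializing on one accumulator
   (the loop-carried dependency). */
float sum_simd_4acc(const float *x, size_t n) {
    v4f acc0 = {0}, acc1 = {0}, acc2 = {0}, acc3 = {0};
    size_t i = 0;
    /* Main loop: unrolled by 4 vectors = 16 floats per iteration. */
    for (; i + 16 <= n; i += 16) {
        acc0 += load4(x + i);
        acc1 += load4(x + i + 4);
        acc2 += load4(x + i + 8);
        acc3 += load4(x + i + 12);
    }
    /* Combine the accumulators, then reduce the 4 lanes horizontally. */
    v4f acc = (acc0 + acc1) + (acc2 + acc3);
    float total = acc[0] + acc[1] + acc[2] + acc[3];
    /* Scalar tail for the remaining n % 16 elements. */
    for (; i < n; i++) total += x[i];
    return total;
}
```

Note that splitting the sum across accumulators reorders the floating-point additions, which can change rounding. This is why compilers generally won't apply this transformation to a plain scalar loop unless you allow reassociation (e.g. with `-ffast-math`).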