Hacker News
by beautifulpeople 3595 days ago
Looking at the article, I'm not sure 2048-bit lines are really needed (not that they wouldn't be interesting). From The Register (http://www.theregister.co.uk/2016/08/22/armv8_scalable_vecto...):

"And once a program has been built for SVE, it will run comfortably on any SVE-capable processor without recompilation, whether the CPU has support for 512, 1,024 or the full 2,048 bits. The SVE unit can automatically break a 2,048-bit vector into, say, four 512-bit vectors if its silicon implementation doesn't support the full length."

That paragraph implies the chunks can be smaller or larger depending on the silicon implementation. A 64-byte cache line (what most architectures have today; POWER and Itanium are notable exceptions) holds 512 bits, assuming you can use the whole line, i.e. the data is packed. So a 2048-bit vector means 4 cache lines' worth of data could potentially be operated on at once.
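To make the cache-line arithmetic concrete, here is a rough sketch. The helper name `cache_lines_per_vector` is made up for illustration, and the 64-byte default is just the typical line size assumed in this comment, not something taken from the SVE spec. It also assumes the data is fully packed and line-aligned.

```python
CACHE_LINE_BYTES = 64  # assumption: typical cache line size, as discussed above


def cache_lines_per_vector(vector_bits, line_bytes=CACHE_LINE_BYTES):
    """Return how many fully packed cache lines one vector of this width spans."""
    line_bits = line_bytes * 8  # 64 bytes -> 512 bits
    # Ceiling division: a vector narrower than a line still touches one line.
    return -(-vector_bits // line_bits)


if __name__ == "__main__":
    # The three vector widths mentioned in the quoted article.
    for bits in (512, 1024, 2048):
        print(bits, "bits ->", cache_lines_per_vector(bits), "cache line(s)")
```

With 64-byte lines this gives 1, 2 and 4 lines for 512, 1024 and 2048 bits. Passing `line_bytes=128` halves the count for the wider vectors, so a 2048-bit vector would span 2 lines on a machine with 128-byte lines.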