you can get a pretty good idea right now. the simulator is functional, and the unit tests include explanations in english:

https://git.libre-soc.org/?p=openpower-isa.git;a=tree;f=src/...

i'm currently in the middle of a rabbit-hole exploration of in-place RADIX-2 FFT, DCT and DFT butterflies. the target is a general-purpose function to cover each of those, in around 25 Vector instructions.

not 2,000 optimised loop-unrolled instructions specifically crafted for RADIX-8, another set for RADIX-16, another for RADIX-32, and so on up to RADIX-4096 (as is the case in ffmpeg): 25 instructions FOR ANY 2^N FFT.
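to make "one routine for any 2^N" concrete, here is a minimal python sketch of a textbook in-place radix-2 decimation-in-time FFT. it is not SVP64 assembler and not taken from the libre-soc repository; the function names (`bit_reverse_permute`, `fft_inplace`) are made up for illustration. the point is only that the same three nested loops handle every power-of-two length, with no per-size unrolled variant.

```python
import cmath


def bit_reverse_permute(x):
    """Reorder x in place so that index i holds the element at bit-reversed i."""
    n = len(x)
    j = 0
    for i in range(1, n):
        # Increment j as a bit-reversed counter.
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]


def fft_inplace(x):
    """In-place radix-2 FFT of a list of complex values, for any length 2^N."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bit_reverse_permute(x)
    size = 2
    while size <= n:
        half = size // 2
        step = cmath.exp(-2j * cmath.pi / size)  # twiddle increment for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                # One butterfly: combine a pair of elements half a block apart.
                a = x[start + k]
                b = x[start + k + half] * w
                x[start + k] = a + b
                x[start + k + half] = a - b
                w *= step
        size *= 2
    return x
```

calling `fft_inplace(data)` on a list of 8 values or 4096 values runs the same code; only the loop bounds change, which is the property the vectorised version is aiming to keep.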

btw if you're interested in "real-world" SVP64 Vector Assembler we have the beginnings of an ffmpeg MP3 CODEC inner loop:

https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=medi...

that's under 100 instructions: less than a quarter of the assembler needed for the same job in PPC64, and 6.5 times less assembler than ffmpeg's optimised x86 apply_window_float.S

you will no doubt be aware of the huge power savings that brings due to reduced L1 cache usage.