|
The vector register file is a block of registers 64 bytes across by 64 rows down. It can be accessed horizontally (H) or vertically (V), in units of 16 bytes (H8), 16 shorts (H16 or HX), or 16 32-bit words (H32). Each processor has an accumulator that can be cleared (CLRA) or accumulated into (ACC). This code is basically equivalent to the following C code:
uint8 in[2][16];
int16 temp[2][16];
int32 r1,r3,r5;
uint8 *r0;
do {
    // ... code omitted
    // vmul -,H(1,0),3 CLRA ACC
    // vadd HX(3,0),H(0,0),2 ACC
    // HX(3,0) is the second of the two rows read by the vasr below, i.e. temp[1]
    for(x=0;x<16;x++) {
        temp[1][x] = in[1][x]*3 + in[0][x] + 2;
    }
    // vasr H(0++,0),HX(2++,0),2 REP 2
    // vst H(0++,0),(r0+=r3) REP 2
    for(y=0;y<2;y++) {
        for(x=0;x<16;x++) {
            in[y][x] = temp[y][x]>>2;
            r0[y*r3+x] = in[y][x];
        }
    }
    // add r0,16
    // addcmpblt r5,1,r1,loopacross
    r0 += 16;
    r5 += 1;
} while (r5<r1);
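To make the CLRA/ACC behaviour a bit more concrete, here is a rough standalone C sketch of the vmul/vadd/vasr sequence for a single row. The helper names (vmul_clra_acc, vadd_acc, vasr, blend_row) and the 32-bit per-lane accumulator are my own assumptions for illustration, not the real hardware definition, and the narrowing to bytes is a plain cast rather than whatever the hardware actually does.

```c
#include <stdint.h>

#define LANES 16

/* Assumed model: one accumulator per lane, wide enough for the sums used here. */
static int32_t acc[LANES];

/* Models "vmul -,src,k CLRA ACC": clear the accumulator, add src*k to it,
   and discard the direct result (the "-" destination). */
static void vmul_clra_acc(const uint8_t src[LANES], int k) {
    for (int x = 0; x < LANES; x++) {
        acc[x] = 0;            /* CLRA */
        acc[x] += src[x] * k;  /* ACC  */
    }
}

/* Models "vadd dst,src,k ACC": add src+k to the accumulator and
   write the accumulated total to a 16-bit destination row. */
static void vadd_acc(int16_t dst[LANES], const uint8_t src[LANES], int k) {
    for (int x = 0; x < LANES; x++) {
        acc[x] += src[x] + k;
        dst[x] = (int16_t)acc[x];
    }
}

/* Models "vasr dst,src,shift": shift right and narrow to bytes
   (plain cast here; the inputs in this example never go negative). */
static void vasr(uint8_t dst[LANES], const int16_t src[LANES], int shift) {
    for (int x = 0; x < LANES; x++) {
        dst[x] = (uint8_t)(src[x] >> shift);
    }
}

/* One output row: out[x] = (3*row1[x] + row0[x] + 2) >> 2 */
void blend_row(const uint8_t row0[LANES], const uint8_t row1[LANES],
               uint8_t out[LANES]) {
    int16_t temp[LANES];
    vmul_clra_acc(row1, 3);
    vadd_acc(temp, row0, 2);
    vasr(out, temp, 2);
}
```

Calling blend_row(row0, row1, out) gives the same per-element result as the temp[1][x] line followed by the >>2 in the loop above. The largest intermediate value is 255*3 + 255 + 2 = 1022, so it fits in a 16-bit row and shifts down to at most 255.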
The instructions shown in the comments are code for the vector processor. This is not the same as the GPU cores, which use a different architecture and instruction set (based around floating-point calculations). |
This project of yours looks really cool: https://github.com/peterderivaz/pymeta3
What do you think about http://www.myhdl.org/doku.php ?