by andrekandre
123 days ago
The sgai_rsp_matmul_q4() stub is planned for RSP microcode:
- DMA Q4 weight tiles into DMEM (4 KB at a time)
- VMULF/VMADH vector multiply-accumulate for 8-lane dot products
- Estimated 4-8× speedup over scalar VR4300 inference
----
RSP is the gift that keeps on giving; such a forward-looking architecture (shame about the Rambus latency tho)
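
For anyone wondering what the quoted plan amounts to, here is a rough scalar C model of it, not the project's code and not RSP microcode. It copies packed weights into a 4 KB buffer (a stand-in for the DMA into DMEM) and accumulates into 8 independent lanes the way a vector multiply-accumulate would. The Q4 layout (two weights per byte, low nibble first, offset by 8), the int16 activations, and the name q4_dot_tiled are my assumptions; the quote doesn't specify any of them, and per-block scales are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_BYTES 4096 /* one 4 KB tile, as in the quoted plan */
#define LANES 8         /* RSP vector registers hold eight 16-bit lanes */

/* Assumed (hypothetical) Q4 layout: two weights per byte, low nibble first,
   each nibble stored unsigned with an implicit offset of 8. */
static int q4_unpack(uint8_t byte, int high) {
    int nib = high ? (byte >> 4) : (byte & 0x0F);
    return nib - 8;
}

/* Dot product of n packed Q4 weights with n int16 activations. */
int32_t q4_dot_tiled(const uint8_t *w_packed, const int16_t *x, size_t n) {
    assert(w_packed != NULL && x != NULL);

    uint8_t tile[TILE_BYTES];  /* stand-in for DMEM */
    int32_t acc[LANES] = {0};  /* one accumulator per vector lane */
    size_t done = 0;           /* weights consumed so far */

    while (done < n) {
        size_t bytes = (n - done + 1) / 2;
        if (bytes > TILE_BYTES) bytes = TILE_BYTES;
        memcpy(tile, w_packed + done / 2, bytes); /* stand-in for the DMA */

        size_t count = bytes * 2;
        if (count > n - done) count = n - done;

        for (size_t i = 0; i < count; i++) {
            int w = q4_unpack(tile[i / 2], (int)(i & 1));
            acc[i % LANES] += w * x[done + i]; /* lane-wise multiply-accumulate */
        }
        done += count;
    }

    /* Horizontal reduction of the 8 lanes into a single sum. */
    int32_t sum = 0;
    for (int l = 0; l < LANES; l++) sum += acc[l];
    return sum;
}
```

On the real hardware the inner loop would be the VMULF/VMADH sequence and the memcpy would be an asynchronous DMA, so this only shows the tiling and lane structure, nothing about the claimed speedup.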