by andrekandre 123 days ago

   The sgai_rsp_matmul_q4() stub is planned for RSP microcode:

     DMA Q4 weight tiles into DMEM (4KB at a time)
     VMULF/VMADH vector multiply-accumulate for 8-lane dot products
     Estimated 4-8× speedup over scalar VR4300 inference
----
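
For anyone wondering what "Q4 tiles, 4KB at a time, 8 lanes" might look like, here is a rough scalar C sketch of the tiling idea. It is not the project's code and not RSP microcode: the function name `matvec_q4_tiled`, the nibble layout (two weights per byte, low nibble first, stored with a +8 bias) and the missing scale factors are all my own assumptions, and `memcpy` just stands in for the DMA into DMEM. Real microcode would also have to share the 4 KB with activations and scratch space.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TILE_BYTES 4096 /* one "4KB at a time" weight tile, as in the quote */
#define LANES 8         /* RSP vector registers hold eight 16-bit lanes */

/* Assumed Q4 layout (hypothetical): two weights per byte, low nibble first,
 * each stored with a +8 bias so the decoded range is -8..7. */
static int q4_unpack(const uint8_t *packed, size_t i) {
    uint8_t b = packed[i >> 1];
    int nib = (i & 1) ? (b >> 4) : (b & 0x0F);
    return nib - 8;
}

/* y[r] = sum over c of W[r][c] * x[c], with W packed as Q4.
 * cols must be a multiple of 8 so every tile splits into whole 8-lane groups. */
void matvec_q4_tiled(const uint8_t *w_packed, const int16_t *x, int32_t *y,
                     size_t rows, size_t cols) {
    uint8_t tile[TILE_BYTES]; /* stand-in for DMEM */
    size_t row_bytes = cols / 2;

    for (size_t r = 0; r < rows; r++) {
        int32_t acc[LANES] = {0}; /* one accumulator per lane */

        for (size_t off = 0; off < row_bytes; off += TILE_BYTES) {
            size_t n = row_bytes - off;
            if (n > TILE_BYTES) n = TILE_BYTES;

            /* Stand-in for the DMA of one weight tile into DMEM. */
            memcpy(tile, w_packed + r * row_bytes + off, n);

            size_t base = off * 2; /* index of the first weight in this tile */
            for (size_t i = 0; i < n * 2; i += LANES) {
                /* On the RSP this inner loop would be a vector multiply-accumulate. */
                for (size_t l = 0; l < LANES; l++) {
                    acc[l] += q4_unpack(tile, i + l) * x[base + i + l];
                }
            }
        }

        /* Horizontal sum of the lane accumulators. */
        int32_t sum = 0;
        for (size_t l = 0; l < LANES; l++) sum += acc[l];
        y[r] = sum;
    }
}
```

The point of the shape is that the inner 8-wide loop only ever touches the small tile buffer, so the slow main-memory traffic is batched into one copy per tile.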

The RSP is the gift that keeps on giving; such a forward-looking architecture (shame about the Rambus latency tho)


We are going to use the GPU's 128-bit SIMD soon, but it only has 4 KB of addressable RAM, so the matmul offload will go in small chunks!
that's really cool work; i wish i could get paid to do stuff like this, more power to you all ^^
I am doing this for 0 dollars. I am a self-funded AI research lab. So when people diss me I get a little jaded, but then I remember I am doing cool stuff, even if others don't see it. That's enough for me!