| HN Mirror

The Versal stuff isn't really an FPGA anymore. Some of the chips have PL (programmable logic) on them, but many don't. The consumer NPUs from AMD use the same Versal AIE cores with no PL at all. The AIE cores just aren't configurable blocks in the fabric anymore, and they don't share the FPGA programming model. So I'm not contradicting myself here.

That being said, Versal AIE for ML has been a terrible failure. The reasons why are complicated. One reason is that the on-chip SRAM is not a unified pool: it's partitioned into tiles, and not every core can access every tile. Second, that SRAM is accessed only through DMA engines, not directly by the cores. Third, the datapaths that feed the VLIW cores are statically configured, and changing them at runtime requires a software reconfiguration, which is slow. Programming this thing makes the Cell processor look like a cakewalk. You have to program the DMA engines, program hundreds of VLIW cores, and explicitly set up the on-chip network fabric. I could go on.
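To make those constraints a bit more concrete, here is a toy Python model. It is a minimal sketch that follows the description in this comment, not AMD's actual AIE toolchain or API, and every name in it (`ToyArray`, `configure_route`, `dma`, `core_read`) is hypothetical. SRAM is split into per-tile pools, a core only sees data that a DMA transfer delivered to it, and routes stay fixed until you explicitly reconfigure them.

```python
class ToyArray:
    """Hypothetical toy model of the constraints described above (not a vendor API)."""

    def __init__(self, n_mem_tiles, n_cores):
        # Partitioned SRAM: one independent pool per tile, no unified address space.
        self.sram = [{} for _ in range(n_mem_tiles)]
        # Each core can only consume data that a DMA transfer delivered to it.
        self.inbox = [[] for _ in range(n_cores)]
        # Static datapaths: memory tile -> core, fixed until explicitly reconfigured.
        self.routes = {}
        self.reconfigs = 0

    def configure_route(self, mem_tile, core):
        # Per the comment, changing a datapath at runtime needs a slow software
        # reconfiguration; this toy version just records it and counts it.
        self.routes[mem_tile] = core
        self.reconfigs += 1

    def dma(self, mem_tile, key):
        # Data leaves a memory tile only via DMA, and only along a configured route.
        if mem_tile not in self.routes:
            raise RuntimeError(f"no route configured from memory tile {mem_tile}")
        core = self.routes[mem_tile]
        self.inbox[core].append(self.sram[mem_tile][key])
        return core

    def core_read(self, core):
        # Cores have no load path into the SRAM tiles; they pop whatever DMA delivered.
        if not self.inbox[core]:
            raise RuntimeError(f"core {core} has no data; a DMA transfer must run first")
        return self.inbox[core].pop(0)


if __name__ == "__main__":
    arr = ToyArray(n_mem_tiles=2, n_cores=2)
    arr.sram[0]["weights"] = [1, 2, 3]
    arr.configure_route(0, 0)   # route memory tile 0 to core 0
    arr.dma(0, "weights")
    print(arr.core_read(0))     # [1, 2, 3]
    # Core 1 now needs the same weights, so the route must be reprogrammed first.
    arr.configure_route(0, 1)
    arr.dma(0, "weights")
    print(arr.core_read(1))     # [1, 2, 3]
    print(arr.reconfigs)        # 2
```

The point of the model is the bookkeeping: every piece of data a core touches needs a configured route plus an explicit DMA transfer, and multiply that by hundreds of cores.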

Anyway, my point is that FPGAs aren't getting ML slices. Some FPGAs do include a completely separate block that can do ML, but what has shipped so far is terrible. Hopefully that makes sense.