The ubiquitous delay slots in MIPS are one instruction-set feature that has aged really badly. RISC-V deliberately left them out of its design because they end up being such a hindrance to, e.g., out-of-order implementations.
It's also a hindrance to in-order implementations that have a different number of branch delay cycles (e.g. different number of pipeline stages or instructions taking a variable number of cycles) than the original implementation.
Branch delay slots were a somewhat clever solution to reduce the complexity of the original implementation, but they baked implementation details into the ISA and became problematic when the implementation details changed.
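To make that concrete, here is a rough sketch of a toy interpreter where the number of delay slots is a parameter. Everything in it (the `run` function, the tuple encoding, the `set`/`jump` ops) is made up for illustration and is not MIPS. The point is only that the same program gives different results depending on how many instructions after a jump still execute, which is why the count has to be frozen in the ISA once software depends on it.

```python
# Toy machine, hypothetical encoding: ("set", reg, value) or ("jump", target_index).
# delay_slots = how many instructions after a jump still execute before it takes effect.
def run(program, delay_slots):
    regs = {}
    pc = 0
    pending = None  # [target, instructions left to execute before jumping]
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "set":
            name, value = args
            regs[name] = value
        elif op == "jump":
            (target,) = args
            if delay_slots == 0:
                pc = target          # no delay slot: jump immediately
            else:
                pending = [target, delay_slots]
            continue                 # the jump itself does not fill a slot
        if pending is not None:
            pending[1] -= 1          # one delay-slot instruction has executed
            if pending[1] == 0:
                pc = pending[0]      # delayed jump finally takes effect
                pending = None
    return regs

PROGRAM = [
    ("set", "a", 1),
    ("jump", 4),
    ("set", "b", 2),  # runs only if there is at least one delay slot
    ("set", "c", 3),  # runs only if there are at least two
    ("set", "d", 4),
]

for n in (0, 1, 2):
    print(n, run(PROGRAM, n))
```

With one delay slot, `b` gets set but `c` does not; with zero or two, you get different register contents from the exact same program. A later implementation with a different pipeline depth still has to emulate whichever count the ISA originally promised.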
> they baked implementation details into the ISA and became problematic when the implementation details changed.
Same reason why stuff like VLIW has failed to catch on. These things are so dependent on specific hardware implementation details that one can hardly call them general-purpose ISAs anymore.
No modern GPUs use VLIW. ATI/AMD switched from VLIW to RISC-SIMD 8-9 years ago, NVIDIA a few years before that. Mobile phone GPUs gave up VLIW for RISC too in the last 5 years or so.