I’d much prefer compute colocated with the NVMe storage. Imagine, for example, a columnar storage engine that can filter and aggregate data during scans on the drive itself, so only the results cross PCIe rather than the raw columns.
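To make that concrete, here is a toy sketch of the difference in data movement. It is purely a simulation: `SimulatedDrive`, `read_column`, and `filtered_sum` are hypothetical names invented for illustration, not any real NVMe or vendor API, and the "bus" is just a byte counter that assumes 8-byte values.

```python
VALUE_SIZE = 8  # assumed width of one column value in bytes (int64)


class SimulatedDrive:
    """Toy stand-in for a drive holding one integer column; not a real NVMe API."""

    def __init__(self, values):
        self._column = list(values)
        self.bytes_over_bus = 0  # bytes that crossed the simulated PCIe link

    def read_column(self):
        """Conventional path: ship every value to the host."""
        self.bytes_over_bus += len(self._column) * VALUE_SIZE
        return list(self._column)

    def filtered_sum(self, low, high):
        """Hypothetical pushdown: filter and sum on the device, return one value."""
        total = sum(v for v in self._column if low <= v < high)
        self.bytes_over_bus += VALUE_SIZE  # only the aggregate crosses the link
        return total


def host_side_sum(drive, low, high):
    """Read the whole column to the host, then filter and aggregate there."""
    return sum(v for v in drive.read_column() if low <= v < high)


if __name__ == "__main__":
    host_drive = SimulatedDrive(range(1000))
    device_drive = SimulatedDrive(range(1000))

    host_result = host_side_sum(host_drive, 10, 20)
    device_result = device_drive.filtered_sum(10, 20)

    print("host-side:  ", host_result, host_drive.bytes_over_bus, "bytes moved")
    print("device-side:", device_result, device_drive.bytes_over_bus, "bytes moved")
```

Both paths compute the same answer; the only thing that changes is how many bytes have to cross the link, which is the whole appeal for selective scans.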
ScaleFlux (https://scaleflux.com) computational storage might offer some of what you're imagining. Their NVMe drives have onboard ARM cores and handle hardware compression and advanced flash management with no drivers beyond the standard NVMe one. I believe you can tap into the computational capabilities with additional code.