Many moons ago I wanted to do something similar for AI datasets and models over IPFS. I don't know what the future holds for IPFS, but I do hope the essence of a p2p data-sharing infrastructure becomes more accessible, so that individuals can tackle some of the issues with large datasets with less hardware on hand.