My question is: why would someone send their data to some API in the cloud rather than process it locally themselves, using the numerous open-source packages available for this sort of thing?
For a lot of things, using a self-hosted open-source package is a good idea, absolutely.
However, for some applications a vendor has a neural network (or whatever) trained on a billion inputs. You can see it perform better than the open-source equivalent trained on a million inputs, and you don't have a billion inputs of your own to improve the open-source version with.