Could you use multiple perceptual hash functions with different salts, so that collisions would be less likely while still allowing derivatives to be detected?
That reduces to just inventing a “fancier” single hash function. This adds no value or security in cryptography; it just makes things slower. I expect the same is true of perceptual hashes.
They aren't just matching exact hash hits; they use a metric such as the Hamming distance between hashes to decide whether one image is the same as, or a derivative of, another. The data structures that allow efficient lookups rely on that metric, or another one, for matching.
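To make the metric-based lookup concrete, here is a minimal sketch in Python. It assumes hashes are plain integers and uses a BK-tree, one well-known index for discrete metrics; nothing in this thread says which structure any real system uses, so the class name, the sample hashes, and the threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


class BKTree:
    """Hypothetical minimal BK-tree keyed on Hamming distance."""

    def __init__(self) -> None:
        # Each node is (hash_value, {edge_distance: child_node}).
        self.root: Optional[Tuple[int, Dict[int, tuple]]] = None

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            value, children = node
            d = hamming(h, value)
            if d == 0:
                return  # identical hash already stored
            if d not in children:
                children[d] = (h, {})
                return
            node = children[d]

    def query(self, h: int, max_dist: int) -> List[Tuple[int, int]]:
        """Return (distance, stored_hash) pairs within max_dist of h."""
        results = []
        stack = [self.root] if self.root else []
        while stack:
            value, children = stack.pop()
            d = hamming(h, value)
            if d <= max_dist:
                results.append((d, value))
            # Triangle inequality: a subtree can only hold a match if its
            # edge label lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


# Made-up 8-bit "hashes"; real perceptual hashes are typically longer.
tree = BKTree()
for stored in (0b10110010, 0b10110011, 0b01001101):
    tree.add(stored)

matches = tree.query(0b10110000, max_dist=2)
```

The pruning step depends on the distance obeying the triangle inequality, which is why the index is tied to the metric. Concatenating several salted hashes would only change the bit string being compared; the lookup would still need a distance of this kind.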