I suspect transform invariance is what is meant, although we find some transforms much harder than others, which may hint that human visual systems use a more discrete process than a single transform matrix.
I'd say transformations are more important than rotations, as in a 3D world we'll almost never see an object from a perpendicular view point, but most of the time we'll see objects that are the right way up.
> in a 3D world we'll almost never see an object from a perpendicular view point
True; however, "transforms" would be more useful as an umbrella term in this context for the subset of transforms that covers perspective and orientation of a fixed geometry. In almost all cases, visual systems only need to care about this subset...
In that case, it's conceivable that we infer geometry through a set of discrete transforms, somewhat like rotations, translations and scaling, or perhaps some component did happen to converge on something more unified, resembling an arbitrary transform matrix. If only we could simply identify these pieces in biological systems.
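To make the "discrete transforms vs. one unified matrix" distinction concrete, here is a minimal 2D sketch in plain Python using homogeneous coordinates. The helper names (`rotation`, `translation`, `scaling`, `matmul`, `apply`) and the example numbers are made up for illustration; this only shows the math and is not a claim about how any visual system works.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(theta):
    """Counter-clockwise rotation by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def translation(tx, ty):
    """Shift by (tx, ty)."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def scaling(sx, sy):
    """Scale by sx along x and sy along y."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def apply(m, point):
    """Apply a 3x3 homogeneous matrix to a 2D point."""
    v = (point[0], point[1], 1.0)
    xh, yh, w = (sum(m[i][k] * v[k] for k in range(3)) for i in range(3))
    return (xh / w, yh / w)  # dividing by w also covers projective matrices

p = (1.0, 0.0)

# "Discrete" route: scale, then rotate, then translate, one step at a time.
stepwise = apply(translation(2.0, 1.0),
                 apply(rotation(math.pi / 2),
                       apply(scaling(2.0, 2.0), p)))

# "Unified" route: fold the same three steps into a single matrix first.
combined = matmul(translation(2.0, 1.0),
                  matmul(rotation(math.pi / 2), scaling(2.0, 2.0)))
unified = apply(combined, p)
```

Both routes land on the same point (up to floating-point error), so from the outside the two hypotheses look identical; they differ only in how the computation is organised internally, which is part of why identifying the pieces in a biological system would be hard.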