Shameless plug: I previously wrote a blog post on using an unsupervised feature-selection analog of PCA to avoid many of the issues you point out here, along with an associated Python package that carries it out ("linselect", which you can pip install):
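For anyone curious about the general idea, here is a minimal sketch. It assumes that "feature-selection analog of PCA" means greedily picking original columns whose linear span best reconstructs *all* the columns. Each step scores a candidate by the fraction of total (centered) variance explained when every column is regressed onto the selected set. This is a generic NumPy illustration, not linselect's implementation or API; `greedy_unsupervised_selection` is a hypothetical name.

```python
import numpy as np

def greedy_unsupervised_selection(X, k):
    """Greedily pick k columns of X whose linear span best reconstructs all columns.

    Hypothetical helper: a generic forward-selection sketch, not linselect's API.
    Returns the selected column indices and the explained-variance fraction
    after each step.
    """
    Xc = X - X.mean(axis=0)            # center so the scores are variance fractions
    total_var = (Xc ** 2).sum()
    selected, scores = [], []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_score = None, -np.inf
        for j in remaining:
            S = Xc[:, selected + [j]]
            # Least-squares fit of every column onto the candidate subset
            coef, *_ = np.linalg.lstsq(S, Xc, rcond=None)
            resid = Xc - S @ coef
            score = 1.0 - (resid ** 2).sum() / total_var
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
        remaining.remove(best_j)
        scores.append(best_score)
    return selected, scores

rng = np.random.default_rng(0)
a = rng.normal(size=(500, 2))
X = np.column_stack([a[:, 0], a[:, 1], a[:, 0] + a[:, 1]])
print(greedy_unsupervised_selection(X, 2))
```

Unlike PCA, the output is a subset of the original, interpretable columns rather than linear combinations of them. In the toy data above, the sum column is picked first because it explains the most variance on its own, and any second column then reconstructs everything.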