I think a common problem is that these techniques get repurposed to solve problems they weren't meant for. I have seen multiple people fall into the trap of using these visualizations to guess whether a dataset can be classified with high accuracy. I'm talking about cases where labels already exist, but the visualization is used as a cheap preliminary step to decide whether to bother with classification at all, whether to pick a weak or a strong classifier, etc.

The problem, of course, is that the insights from a visualization are "one-sided": if instances from different classes look separated, you know a decent classifier will do the job well. But if they don't appear separated, you can't conclude that they can't be classified accurately: for all you know, you just don't have the right hyperparameters for the visualization. Also account for the fact that you're projecting d-dimensional data down to 2D/3D, which is heavily lossy; even with the right hyperparameters, there is a chance you won't see clear separation. If you want to classify, just classify.
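
To make the "one-sided" point concrete, here is a minimal sketch on synthetic data that I made up for illustration. The assumptions are mine: the class signal sits in a single low-variance feature, PCA stands in for whatever 2D visualization you would use (t-SNE, UMAP, etc. behave differently), and a plain least-squares linear classifier stands in for "a decent classifier". The function names (`make_data`, `pca_2d`, `linear_accuracy`) are hypothetical.

```python
import numpy as np


def make_data(n_per_class=1000, d=10, seed=0):
    """Synthetic two-class data (an assumption for this demo, not real data)."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_per_class)
    X = rng.normal(size=(2 * n_per_class, d))
    # Two high-variance features that carry no class information.
    X[:, :2] *= 10.0
    # The last feature carries the class signal, but with low variance.
    X[:, -1] = 0.3 * X[:, -1] + np.where(y == 1, 1.0, -1.0)
    return X, y


def pca_2d(X):
    """Project onto the top two principal components (the 2D 'plot' coordinates)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:2].T


def linear_accuracy(X, y, seed=0):
    """Held-out accuracy of a least-squares linear classifier (75/25 split)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(0.75 * len(y))
    train, test = idx[:cut], idx[cut:]
    A = np.hstack([X, np.ones((len(y), 1))])  # append a bias column
    targets = np.where(y == 1, 1.0, -1.0)
    w, *_ = np.linalg.lstsq(A[train], targets[train], rcond=None)
    pred = (A[test] @ w > 0).astype(int)
    return float((pred == y[test]).mean())


X, y = make_data()
acc_full = linear_accuracy(X, y)        # classifier sees all d features
acc_2d = linear_accuracy(pca_2d(X), y)  # classifier sees only the 2D projection
print(f"accuracy on full {X.shape[1]}-D data: {acc_full:.3f}")
print(f"accuracy on 2-D PCA projection: {acc_2d:.3f}")
```

In this setup the 2D projection keeps the two high-variance directions and throws away the one that separates the classes, so a scatter plot would look like a single blob even though a linear model on the original features separates them easily. It is a contrived case, but it shows why "doesn't look separated" tells you little about classifiability.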