
The goal isn't to present PCA directly to stakeholders. It's something you do at the start to understand your data. The premise is that there are very likely significant batch effects or other statistical correlations in the data that you are unaware of at the outset, and you want to discover them early. To do that you need an unsupervised method, because the whole point is that you don't know what they are.
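
As a rough illustration, here's a minimal NumPy sketch on made-up data. There are two hypothetical "batches" that are identical except for an offset nobody told you about. PCA is computed via an SVD of the centered data, and the batch labels are never passed to it, yet PC1 ends up separating the batches. The data, the offset size, and the `pca_scores` helper are all assumptions for illustration, not a recipe.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the top principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]  # equivalent to Xc @ Vt[:n_components].T
    explained = S**2 / np.sum(S**2)                  # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
n_per_batch, n_features = 100, 20

# Hypothetical data: two batches that differ only by a hidden offset
batch = np.repeat([0, 1], n_per_batch)
X = rng.normal(size=(2 * n_per_batch, n_features))
X[batch == 1] += 1.5  # the "unknown" batch effect, added to every feature

scores, loadings, explained = pca_scores(X)

# PCA never saw `batch`; we only use it afterwards to check what PC1 picked up.
# abs() because the sign of a principal component is arbitrary.
gap = abs(scores[batch == 0, 0].mean() - scores[batch == 1, 0].mean())
print(f"PC1 explains {explained[0]:.0%} of variance; batch gap on PC1 = {gap:.2f}")
```

On real data you don't get to check against a known `batch` column, of course; that's where the next step, interpreting what the components line up with, comes in.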

> you are tasked to find the main drivers of sales on a given city. You run PCA on the data and find two main components on the dataset

Obviously it depends on what comes out. But in all likelihood you will see some significant clusterings or divisions along PC1 and PC2, so you will try to interpret which properties of the points are driving them. You can do this in a data-driven way (which coefficients in the principal component vectors are large) or, often, in an exploratory way: are the groupings related to geography, to age demographics, to season? You color the data points by different candidate explanatory variables to see what groups things together. Very likely something will jump out (e.g., you might find that the main reason a particular month had low sales was a technical problem with the web site, which you'll want to set aside because it has no predictive value).
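
Both routes can be sketched in a few lines of NumPy, again on synthetic data. The feature names, the "month 3 outage", and the helper functions are hypothetical. The data-driven route ranks features by the absolute size of their PC1 coefficients. For the exploratory route, instead of eyeballing colored scatter plots, this swaps in a simple numeric proxy: eta squared, i.e. the share of PC1's variance explained by grouping the points on a candidate variable. That proxy is my substitution, not the only way to do it.

```python
import numpy as np

def pca(X, k=2):
    """PCA via SVD; returns scores (one column per PC) and loadings (one row per PC)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k], Vt[:k]

def eta_squared(score, groups):
    """Share of the variance in one PC's scores explained by a grouping variable."""
    total = np.sum((score - score.mean()) ** 2)
    between = sum(
        np.sum(groups == g) * (score[groups == g].mean() - score.mean()) ** 2
        for g in np.unique(groups)
    )
    return between / total

rng = np.random.default_rng(1)
n = 600
# Hypothetical sales features: three online categories, three in-store categories
features = ["web_a", "web_b", "web_c", "store_a", "store_b", "store_c"]
month = rng.integers(1, 13, size=n)
region = rng.choice(["north", "south", "east", "west"], size=n)
X = rng.normal(size=(n, len(features)))
X[month == 3, :3] -= 4.0  # hypothetical web-site outage in month 3 hits online sales only

scores, loadings = pca(X)

# Data-driven view: which features carry the largest PC1 coefficients?
top = np.argsort(-np.abs(loadings[0]))[:3]
print("PC1 driven by:", [features[i] for i in top])

# Exploratory view: how much of PC1's variation each candidate variable accounts for,
# a numeric stand-in for coloring the PC1/PC2 scatter plot by that variable
for name, groups in {"month": month, "region": region}.items():
    print(f"{name}: eta^2 on PC1 = {eta_squared(scores[:, 0], groups):.2f}")
```

Here month should account for most of PC1 (pointing straight at the outage) while region explains almost nothing. In practice you'd still make the colored plots; numbers like these just suggest which variables to color by first.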