HN Mirror


by bertil 2542 days ago

I’ve noticed two questions on Twitter:

- Do you use a causal graph? Would it make sense?

- Spark seems like overkill for what you yourself describe as regression: is there something more computationally intensive here that we could be missing?


otterk10 2542 days ago

Our analysis runs over our users’ customer data (usually collected through either a tag manager or a CDP such as Segment), which amounts to a few petabytes for some of our larger customers. We use Spark to quickly transform this massive amount of raw data into an ML-ready format. You’re correct that the regression itself does not need to be done inside Spark.
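To make "ML-ready format" concrete, here is a minimal plain-Python sketch of the kind of reshaping involved: pivoting a raw event log into one row per user, with one count column per event and a binary outcome. The event names, the `(user_id, event_name)` tuple layout, and the choice of `signup` as the outcome are all hypothetical; a real CDP export carries many more fields. At petabyte scale this step is what you would hand to Spark (in PySpark it roughly corresponds to `df.groupBy("user_id").pivot("event").count()`); once the table is one row per user, it is small enough to fit with an ordinary regression library outside Spark.

```python
from collections import defaultdict

# Hypothetical raw event log: (user_id, event_name) pairs, loosely shaped like
# what a tag manager or CDP export might contain. Names are assumptions.
RAW_EVENTS = [
    ("u1", "page:/pricing"),
    ("u1", "signup"),
    ("u2", "page:/home"),
    ("u2", "page:/pricing"),
    ("u3", "page:/home"),
    ("u3", "page:/home"),
]


def to_feature_rows(events, outcome_event):
    """Pivot an event log into one feature row per user.

    Each row maps event name -> number of times the user fired it, plus a
    0/1 'outcome' column recording whether the user ever fired outcome_event.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user_id, event in events:
        counts[user_id][event] += 1

    # Shared column set (outcome excluded) so every row has identical keys.
    columns = sorted({e for _, e in events if e != outcome_event})
    rows = {}
    for user_id, per_user in counts.items():
        row = {col: per_user.get(col, 0) for col in columns}
        row["outcome"] = 1 if per_user.get(outcome_event, 0) > 0 else 0
        rows[user_id] = row
    return rows


rows = to_feature_rows(RAW_EVENTS, outcome_event="signup")
```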


otterk10 2542 days ago

We didn’t explore causal graphs because doing so would require manually creating a causal graph for each relationship that you wish to explore. Our goal was to create an automated approach that could provide an estimate of the treatment effect for any page/event within your app.
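One way to get an automated, per-event estimate without drawing a graph is plain regression adjustment: for every event, regress the outcome on a 0/1 "fired this event" indicator plus a fixed set of covariates, and read off the indicator's coefficient. The sketch below does that with a dependency-free OLS (normal equations solved by Gaussian elimination); it is an illustration of that general technique, not necessarily the method described in the comment. The covariate name `prior_sessions` and the row layout are hypothetical. The coefficient is only a causal effect if the covariates are pre-treatment and capture all confounding, which is exactly the assumption a causal graph would make explicit.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def estimate_effects(rows, event_cols, covariate_cols, outcome="outcome"):
    """Run one regression per event: outcome ~ 1 + fired(event) + covariates.

    Returns {event: coefficient on the fired(event) indicator}.
    """
    y = [float(row[outcome]) for row in rows]
    effects = {}
    for event in event_cols:
        X = [
            [1.0, 1.0 if row[event] > 0 else 0.0]
            + [float(row[c]) for c in covariate_cols]
            for row in rows
        ]
        effects[event] = ols(X, y)[1]  # index 1 is the treatment column
    return effects
```

The loop is what makes this "automated": adding a new page or event to the analysis means adding a column, not hand-building another graph. In practice you would use `numpy.linalg.lstsq` or a statistics package rather than hand-rolled elimination, and you would want standard errors alongside the point estimates.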
