Hacker News new | ask | show | jobs
by opportune 1065 days ago
I read the paper. It’s fundamentally I think an exponential problem but they apply some constraints to reduce it: only considering tuples of size n <= 3, pruning.

Personally I think the case where you’re doing this on one column is not that interesting: for each column, get all values with >= support frequency, group by it on old and new, include in results is risk_ratio over threshold. List results in order of risk_ratio. Going from that to just two columns is much much more computationally demanding and where pruning and such really matters.

1 comments

The article was giving me a market basket analysis vibe(maybe just the shared terminology), so possibly a similar underlying algorithm to do the pruning.