Hacker News
by xxbondsxx 3703 days ago
Doesn't this become less valuable as the data grows? You'll essentially always have at least one "true" value, and at that point you're basically running this query:

SELECT category FROM orders WHERE express_delivery=TRUE GROUP BY category

Also, are these aggregate functions as efficient as that query?
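A minimal sketch of the point above, using Python's built-in sqlite3 module. The table, columns, and rows are made up for illustration. SQLite has no bool_or() aggregate (PostgreSQL does), so MAX() over 0/1 values stands in for it; which aggregate the original article used is an assumption here. Once a category contains even one express order, the aggregate reports it exactly as the plain WHERE query does:

```python
import sqlite3

# Hypothetical schema: orders(category, express_delivery), with 0/1 standing in
# for FALSE/TRUE. MAX() over 0/1 behaves like a boolean "any" aggregate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, express_delivery INTEGER)")

rows = [("books", 0)] * 1000 + [("books", 1)]   # a single odd express order
rows += [("garden", 0)] * 1000                   # no express orders at all
rows += [("toys", 1)] * 400 + [("toys", 0)] * 600
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# "Any" aggregate: one TRUE row is enough to flip the whole category to TRUE.
any_express = {
    category
    for category, flag in conn.execute(
        "SELECT category, MAX(express_delivery) FROM orders GROUP BY category"
    )
    if flag == 1
}

# The filter-then-group query from the comment.
where_express = {
    category
    for (category,) in conn.execute(
        "SELECT category FROM orders WHERE express_delivery = 1 GROUP BY category"
    )
}

print(sorted(any_express))            # ['books', 'toys']
print(any_express == where_express)   # True
```

The 1-in-1001 "books" order and the 40% "toys" share come out identical, which is the loss of information the comment describes.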


It still has to read every row, unless you happen to have an index that covers both columns in just the right order.
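A rough illustration of the index point, again with sqlite3 and a hypothetical orders table. It compares SQLite's EXPLAIN QUERY PLAN output before and after adding a composite index with express_delivery first and category second (the index name is made up). With that column order, SQLite can look up only the TRUE entries and read category from the index itself, never touching the table. Other databases plan this differently, so treat the details as SQLite-specific:

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, category TEXT, express_delivery INTEGER)"
)
conn.executemany(
    "INSERT INTO orders (category, express_delivery) VALUES (?, ?)",
    [(f"cat{i % 10}", int(i % 50 == 0)) for i in range(1000)],
)

QUERY = "SELECT category FROM orders WHERE express_delivery = 1 GROUP BY category"

# Without a suitable index, the plan is a full scan of the orders table.
before = query_plan(conn, QUERY)

# Hypothetical composite index, filter column first: an equality lookup on
# express_delivery, with category read straight from the index ("covering").
conn.execute(
    "CREATE INDEX idx_orders_express_category ON orders (express_delivery, category)"
)
after = query_plan(conn, QUERY)

print(before)
print(after)
```

Putting category first would not allow the same direct lookup on express_delivery, which is why the column order matters.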

This is where asking a better question would be more useful.

For example: 'In the last quarter, how many sales did we have for each category and shipping type?' You can then take the results and calculate more useful values, such as the percentage of express shipments.
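One way the "better question" could look, as a sketch with sqlite3. The schema, the order_date column, the quarter boundaries, and the sample rows are all assumptions for illustration. The query counts sales per category and shipping type for one quarter, and the express percentage is then derived from those counts in Python:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (category TEXT, express_delivery INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [  # made-up sample rows; dates are ISO strings so they compare correctly
        ("books", 1, "2024-04-03"),
        ("books", 0, "2024-05-10"),
        ("books", 0, "2024-06-21"),
        ("books", 0, "2024-06-30"),
        ("toys", 1, "2024-04-15"),
        ("toys", 1, "2024-05-02"),
        ("toys", 0, "2024-03-30"),  # previous quarter, filtered out below
    ],
)

# Count sales per (category, shipping type) within one quarter.
counts = conn.execute(
    """
    SELECT category, express_delivery, COUNT(*)
    FROM orders
    WHERE order_date >= ? AND order_date < ?
    GROUP BY category, express_delivery
    ORDER BY category, express_delivery
    """,
    ("2024-04-01", "2024-07-01"),
).fetchall()

# Derive the share of express shipments per category from those counts.
totals = defaultdict(int)
express = defaultdict(int)
for category, is_express, n in counts:
    totals[category] += n
    if is_express:
        express[category] += n

pct_express = {c: 100.0 * express[c] / totals[c] for c in totals}
print(pct_express)  # {'books': 25.0, 'toys': 100.0}
```

Unlike the single TRUE/FALSE per category, the counts keep the difference between "one odd order" and "most orders".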

I don't think the author's examples are particularly useful because, as you point out, on any decent-sized dataset at least a COUPLE of customers will have chosen to behave oddly.

But they do seem quite handy for finding bad data in the database, as opposed to analyzing customer behavior.
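The same "one row is enough" behavior is what makes these aggregates useful for data checks. A sketch under the assumption that express_delivery should only ever hold 0 or 1 (hypothetical table and rows again): MAX() over a per-row "is this value invalid?" test flags every category containing at least one bad row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (category TEXT, express_delivery INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("books", 0), ("books", 1),        # clean
        ("garden", 0), ("garden", None),   # missing flag
        ("toys", 1), ("toys", 2),          # out-of-range flag
    ],
)

# The predicate is 1 for NULL or any value other than 0/1, and 0 otherwise;
# MAX() turns it into "does this category have at least one bad row?".
suspect = [
    category
    for (category,) in conn.execute(
        """
        SELECT category
        FROM orders
        GROUP BY category
        HAVING MAX(express_delivery IS NULL OR express_delivery NOT IN (0, 1)) = 1
        ORDER BY category
        """
    )
]
print(suspect)  # ['garden', 'toys']
```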