by stevepike 2303 days ago
A very reductionist description of our company, one that Allison hates when I use it, is "csvstat but on the internet" :-). I think auto-summarizing datasets has hit something of a local maximum in what pandas DataFrame summaries (csvstat is a similar Python tool) can do on one machine. We will be able to add much fancier things, like sophisticated type classification (e.g., is this field a stock ticker?), without burning your CPU.
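
For anyone unfamiliar with what a csvstat-style summary looks like, here is a rough sketch using only the Python standard library. It is not how csvstat, pandas, or the product discussed here work internally. The names `classify`, `summarize`, and `TICKER_RE` are made up for illustration, and the "stock ticker" check (one to five uppercase letters) is a deliberately naive assumption that would misfire on plenty of real columns.

```python
import csv
import io
import re
import statistics

# Hypothetical heuristic: treat 1-5 uppercase letters as a possible ticker.
# This is an assumption for illustration, not a real classification rule.
TICKER_RE = re.compile(r"^[A-Z]{1,5}$")


def classify(values):
    """Guess a coarse type for a column from its string values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)
        return "number"
    except ValueError:
        pass
    if all(TICKER_RE.match(v) for v in non_empty):
        return "ticker?"
    return "text"


def summarize(csv_text):
    """Return a per-column summary dict for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    if not rows:
        return summary
    for name in rows[0].keys():
        # Missing cells come back as None from DictReader; treat as empty.
        values = [row[name] or "" for row in rows]
        kind = classify(values)
        info = {"type": kind, "count": len(values), "unique": len(set(values))}
        if kind == "number":
            nums = [float(v) for v in values if v != ""]
            info["mean"] = statistics.mean(nums)
            info["min"] = min(nums)
            info["max"] = max(nums)
        summary[name] = info
    return summary


SAMPLE = (
    "symbol,price,note\n"
    "AAPL,189.5,buy\n"
    "MSFT,410.0,hold\n"
    "IBM,170.5,watch list\n"
)

if __name__ == "__main__":
    for column, info in summarize(SAMPLE).items():
        print(column, info)
```

Everything here reads the whole file into memory and scans every column on one machine, which is exactly the cost that grows with dataset size and with each extra type check.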
1 comment

Hah! But this is a very interesting area. You're right that auto-summarizing is becoming a problem now that larger datasets are in common use. Data versioning is also starting to become a bigger problem, and I saw that you have already addressed it in your enterprise product. Hoping to see some sort of API for comparing data troves from different points in time in the future.