by aarondia
1492 days ago
This is a great point and something that we're actively working on improving in Mito. If you have millions of rows of data, it's not enough to just scroll through it; you need tools to build your understanding. Some of the tools that you mentioned exist in Mito today. For example, Mito generates summary information about each column (all of the .describe() info along with a histogram of the data). And we're creating features for gaining a global understanding of the data too. In practice, one of the main ways that we see people use Mito is for that initial exploration of the data. Often the first things that users do when they import data into Mito are to correct the column dtypes, delete columns that are irrelevant to their analysis, and filter out or replace missing values.
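For anyone who wants to see what that first pass looks like in code, here is a rough sketch in plain pandas and NumPy. It is not Mito's API or the code Mito generates; the DataFrame and its column names (`price`, `region`, `internal_id`) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are invented for this example.
df = pd.DataFrame({
    "price": ["10.5", "20.0", None, "7.25"],
    "region": ["east", None, "west", "east"],
    "internal_id": [101, 102, 103, 104],
})

# 1. Correct the column dtype: strings -> floats (unparseable values become NaN).
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# 2. Delete a column that is irrelevant to the analysis.
df = df.drop(columns=["internal_id"])

# 3. Filter out rows with a missing price, and replace missing regions.
df = df.dropna(subset=["price"])
df["region"] = df["region"].fillna("unknown")

# Per-column summary: the .describe() statistics plus histogram bin counts.
summary = df["price"].describe()
counts, bin_edges = np.histogram(df["price"], bins=3)

print(summary)
print(counts, bin_edges)
```

On millions of rows the same calls apply; the summary and histogram are what stand in for scrolling through the data.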
|
You could develop some IP around efficient and effective ways to do this. It would probably require an ensemble of unsupervised methods.