by westurner
2415 days ago
This looks excellent. The ability to generate the Python code for the pandas DataFrame transformations looks more useful than OpenRefine, TBH. How much work would it be to use Dask (and Dask-ML) as a backend? I see the OneHotEncoder button.
Have you considered integrating with Yellowbrick? They have probably already implemented a few of the near-future and someday items on your roadmap involving hyperparameter selection, model selection, and visualization: https://www.scikit-yb.org/en/latest/

This video shows more of the advanced bamboolib features: https://youtu.be/I0a58h1OCcg

The live histogram rebinning looks useful. Recently I read about a 'shadowgram' (roughly a KDE-like approach) in which histograms with very many possible bin widths are overlaid translucently in one chart: https://stats.stackexchange.com/questions/68999/how-to-smear...

Yellowbrick also has a bin width optimization visualization in yellowbrick.target.binning.BalancedBinningReference: https://www.scikit-yb.org/en/latest/api/target/binning.html

Great work.
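For anyone curious what the shadowgram idea amounts to, here is a minimal sketch, assuming NumPy (and matplotlib only for the optional plot). The function names `shadowgram` and `plot_shadowgram` and the choice of evenly spaced bin widths between `min_width` and `max_width` are my own hypothetical choices, not an API of bamboolib or Yellowbrick. It averages the densities of equal-width histograms over many bin widths, which is roughly what the eye sees when those histograms are drawn translucently on top of each other.

```python
import numpy as np

# Hypothetical helpers for illustration; not part of bamboolib or Yellowbrick.


def _bin_edges(data, width):
    """Equal-width bin edges starting at min(data) and covering max(data)."""
    lo, hi = float(np.min(data)), float(np.max(data))
    n_bins = max(1, int(np.ceil((hi - lo) / width)))
    edges = lo + width * np.arange(n_bins + 1)
    edges[-1] = max(edges[-1], hi)  # guard against float round-off at the top edge
    return edges


def shadowgram(data, grid, min_width, max_width, n_widths=50):
    """Average the histogram densities obtained with many bin widths.

    Overlaying all of these histograms translucently in one chart shows
    approximately this average; evaluating it directly on `grid` gives a
    smooth, KDE-like curve.
    """
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    total = np.zeros_like(grid)
    for width in np.linspace(min_width, max_width, n_widths):
        edges = _bin_edges(data, width)
        density, _ = np.histogram(data, bins=edges, density=True)
        # Bin index for each grid point; -1 or len(density) means outside all bins
        idx = np.searchsorted(edges, grid, side="right") - 1
        inside = (idx >= 0) & (idx < len(density))
        values = np.zeros_like(grid)
        values[inside] = density[idx[inside]]
        total += values
    return total / n_widths


def plot_shadowgram(data, min_width, max_width, n_widths=50, ax=None):
    """Draw every histogram on one axis with low opacity (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily: only this function needs it

    ax = ax if ax is not None else plt.gca()
    for width in np.linspace(min_width, max_width, n_widths):
        ax.hist(data, bins=_bin_edges(data, width), density=True,
                histtype="stepfilled", alpha=1.0 / n_widths, color="C0")
    return ax
```

For example, `shadowgram(sample, np.linspace(-5, 5, 501), min_width=0.1, max_width=1.0)` on a standard-normal `sample` yields a curve that can be plotted next to a KDE. Averaging over widths trades the arbitrariness of one bin width for some extra smoothing, which is the same trade-off a KDE bandwidth makes.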
We are currently thinking about supporting other dataframe libraries such as Dask or PySpark. However, we are a little unsure how to confirm that there is user demand before we implement it. It would not be a complete rewrite, but it would require some additional abstractions in parts of the library, and we would need to check whether some features would no longer be available. Would Dask support be a reason for you to buy?
Great hint about Yellowbrick, and yes, we are considering some of those features as well, if there is a useful place for them in the library.
More generally, we are also thinking about ways for you to extend the library yourself, so that you can add your own analyses and charts of choice and have them come up again at the right point in time, in case that is useful.