Hacker News new | ask | show | jobs
by kite_and_code 2415 days ago
Thank you for your feedback and support :) Are you currently using OpenRefine?

We are currently thinking about providing other dataframe libraries like dask or pyspark and similar. However, we are a little bit unsure on how to make sure that there is user demand before we implement it. It is not a complete rewrite but it would require some additional abstractions at some points in the library. And we need to check if some features might not be available any more. Would dask support be a reason to buy for you?

Great hint with yellowbrick and yes, we are considering some of those features as well if there is a useful place in the library.

In general, we are also thinking about ways how you can extend the library for yourself so that you can add your own analyses/charts of choice and then they will come up again the right point in time. In case that this is useful.

1 comments

In the past, I've looked at OpenRefine and Jupyter integration. Once I've learned to do data transformation with pandas and sklearn with code, I'll report back to you.

Pandas-profiling has a number of cool descriptive statistics features as well. https://github.com/pandas-profiling/pandas-profiling

There's a new IterativeImputer in Scikit-learn 0.22 that it'd be cool to see visualizations of. https://twitter.com/TedPetrou/status/1197150813707108352 https://scikit-learn.org/stable/modules/impute.html

A plugin model would be cool; though configuring the container every time wouldn't be fun. Some ideas about how we could create a desktop version of binderhub in order to launch REES-compatible environments on our own resources: https://github.com/westurner/nbhandler/issues/1