I’ll say upfront that I’m a Product Manager at Treasure Data, and we market to Data Scientists, specifically to enable you to perform analysis on large datasets directly from your local machine. More generally, Treasure Data enables the collection, storage & analytics of large-scale event data.

For performing preliminary analytics, I agree with what the previous respondents have said: iPython Notebook is a GREAT tool. It’s certainly my go-to. These are the libraries I think of using when working in this context:

The go-to packages:
> http://ggplot2.org/ (R)
> http://matplotlib.org/ (Python)

Graph visualizations:
> http://gephi.github.io/
> http://neo4j.com/

Online dashboards:
> https://github.com/stitchfix/pyxley (<- I’m particularly excited to try this out)
> http://bokeh.pydata.org/

Of course, the challenge is that you don’t have a static dataset! New data is continuously coming in, so your dataset is growing larger all the time, and it may become too large to fit on your local machine. That’s why Treasure Data was founded: to make it easy to collect, and run analytics on, this type of data stream. Treasure Data removes the engineering & DevOps work for these collection & storage steps entirely. For example:
> Want a continuously updated dashboard of your incoming data? = Treasure Data + Jupyter Notebooks + Pyxley
> Want to perform graph visualizations on event data? = Treasure Data + Jupyter Notebooks + Neo4j
> Want to create visualizations in R? = Treasure Data + R + ggplot2

The above is enabled through Treasure Data’s integration with Pandas & R (http://docs.treasuredata.com/articles/jupyter-pandas). Good luck in your work!
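For the "continuously updated dashboard" case, the core idea is to keep a small aggregate that is updated as each new batch of events arrives, rather than re-reading the whole, ever-growing raw dataset on every refresh. Below is a minimal, stdlib-only Python sketch of that idea. It is not Treasure Data’s API: the `RollingEventCounts` class and the assumption that each event carries a Unix timestamp under a `"time"` key are hypothetical, chosen for illustration only.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Mapping, Tuple


class RollingEventCounts:
    """Hypothetical helper: per-minute event counts, updated batch by batch."""

    def __init__(self) -> None:
        # Maps a minute bucket (UTC datetime) to the number of events seen in it.
        self.counts: Counter = Counter()

    def update(self, events: Iterable[Mapping]) -> None:
        # Assumption: every event has a Unix timestamp (seconds) under "time".
        for event in events:
            ts = datetime.fromtimestamp(event["time"], tz=timezone.utc)
            minute = ts.replace(second=0, microsecond=0)
            self.counts[minute] += 1

    def series(self) -> List[Tuple[datetime, int]]:
        # Sorted (minute, count) pairs, ready to hand to a plotting library.
        return sorted(self.counts.items())


# Usage: two batches arrive; only the small summary is kept in memory.
rollup = RollingEventCounts()
rollup.update([{"time": 0}, {"time": 30}, {"time": 61}])
rollup.update([{"time": 65}])
for minute, count in rollup.series():
    print(minute.isoformat(), count)
# 1970-01-01T00:00:00+00:00 2
# 1970-01-01T00:01:00+00:00 2
```

In a notebook, the output of `series()` is what you would redraw with matplotlib, Bokeh or Pyxley on each refresh; where the batches come from (a query against your event store, a message queue, etc.) is left open here.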
We have an open-source graph visualization JavaScript toolkit called linkurious.js: https://github.com/Linkurious/linkurious.js
We also offer a commercial product to visualize Neo4j stored graphs: https://linkurio.us/product/