by hhuuggoo 4596 days ago
Part of the 0.4 release is to incorporate the concept of abstract rendering - which means you render on the server, and then send the necessary information over to the client on demand. For example, if someone tries to scatter a billion points, instead of just drawing a useless point cloud, you would figure out where all the points fit inside your 512x512 canvas (or whatever size you have), figure out how all the points stack up, compute an alpha that is meaningful for that number of points, and then send the heatmap to the client.
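To make the binning idea concrete, here is a rough sketch in plain Python. It is not Bokeh's or Abstract Rendering's actual code: the function names `bin_points` and `counts_to_alpha` are made up for illustration, and the log scaling used for alpha is just one assumed way to get "an alpha that is meaningful for that number of points".

```python
import math
import random


def bin_points(points, width, height, x_range, y_range):
    """Count how many points land in each pixel of a width x height grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # point is outside the visible range
        # Clamp so that points exactly on the upper edge fall in the last pixel.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[row][col] += 1
    return counts


def counts_to_alpha(counts):
    """Map per-pixel counts to alpha in [0, 1] with log scaling (an assumed choice)."""
    peak = max(max(row) for row in counts)
    if peak == 0:
        return [[0.0] * len(row) for row in counts]
    scale = math.log1p(peak)
    return [[math.log1p(c) / scale for c in row] for row in counts]


# Hypothetical data: 100,000 Gaussian points reduced to a 64x64 grid.
random.seed(0)
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100_000)]
counts = bin_points(points, 64, 64, (-4, 4), (-4, 4))
alpha = counts_to_alpha(counts)
```

Only the `alpha` grid (64x64 numbers here, 512x512 in the example above) would need to go to the client, no matter how many points were binned on the server.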

You can easily imagine a similar approach for line plots, which selectively downsamples the datapoints in order to preserve interesting features in the plot.
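One simple way to do that kind of feature-preserving downsampling is min/max decimation: split the series into buckets and keep only the lowest and highest sample in each, so spikes survive. The sketch below is an illustration under that assumption, not the algorithm planned for the release, and `minmax_downsample` is a hypothetical name.

```python
import math


def minmax_downsample(xs, ys, n_buckets):
    """Keep the min and max sample of each bucket so that spikes survive."""
    n = len(xs)
    if n <= 2 * n_buckets:
        return list(xs), list(ys)  # already small enough, nothing to do
    out_x, out_y = [], []
    for b in range(n_buckets):
        # Integer arithmetic gives contiguous, non-empty buckets covering 0..n-1.
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        idx = range(start, end)
        lo = min(idx, key=lambda i: ys[i])
        hi = max(idx, key=lambda i: ys[i])
        # Emit in original order; the set drops the duplicate when lo == hi.
        for i in sorted({lo, hi}):
            out_x.append(xs[i])
            out_y.append(ys[i])
    return out_x, out_y


# Hypothetical data: a sine wave with one outlier that plain striding could miss.
xs = list(range(10_000))
ys = [math.sin(x / 200) for x in xs]
ys[4321] = 25.0
small_x, small_y = minmax_downsample(xs, ys, 256)
```

With 256 buckets the client receives at most 512 points instead of 10,000, and the spike at index 4321 is still among them.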

And then we'll build interactors on top of that, so you can actually treat it like a scatter plot, even though it's a heatmap that's being sent to your browser.

So the answer is: "large datasets" means as large as our abstract rendering algorithm can handle on your hardware, so those data sets should be pretty big.

3 comments

Interesting, this is for our second phase then (we're launching soon, you'll know about it). We'll definitely look into whether we can provide an interface for bokeh as well. Currently we're transforming user-provided sheets (csv etc.) into json and tying them into viz on the client side. Thanks for the answer.
I have a project involving multi-gigabyte datasets of line plot data. With your 0.4 release, will it be possible to show down-sampled subsets of these plots, with the ability to pan/zoom around and get more data on demand without having it all held in memory?
Well, we'll be able to do that without sending the data to the client. I'm not sure our implementation right now will work without loading the data into memory, though long term that is definitely the plan (we will leverage http://blaze.pydata.org/).

If you want to discuss further, please email bokeh@continuum.io

The Python version of Abstract Rendering currently would load it all into memory. The Java version, which is based on the same algorithms, would not. It routinely handles multi-gigabyte files, which tells us that the core algorithm can scale. We're working on getting the Python implementation to scale as well.
When is the 0.4 release planned? I would be really interested in this, since I have to visualize terabytes of data in the browser.
January, but probably with abstract rendering support only for scatter plots and line plots. We'll have to roll it out incrementally, but the good thing is 90% of plots are scatters and lines =)
Perfect, thanks.