It allows you to use Altair in Python for visualising data, but does the computation in the backend using Arrow DataFusion. Not for 15GB perhaps, but cool nonetheless.
I have an Excel template for handling a relatively large amount of data. Nowhere near 15GB on one sheet. I use it for preprocessing experimental data from a single experiment. There are about 10 chart tabs built in so I can visually inspect the data for errors (and go back and inspect the raw instrument data when something looks off).
The aggregate data is around 1.5 million experimental results. Minitab is too unwieldy and requires too much manual reformatting of the data sheets.
Is this something I should be looking at in R or Project Jupyter? Does one make better visualizations than the other?
ggplot is extremely powerful if you can grok its grammar, which takes some getting used to. But I'd assume that if you see a graph in a scientific paper, there's a good chance it was made with ggplot.
With that many data points to explore, you are always going to be at the edge of what your hardware and software can handle.
The last really big datasets I worked with were for my thesis, and I had to subsample to below 10% to get results within 10 minutes or so. That was basically plotting MIDI recordings of piano performances, so nothing gigantic.
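For anyone who hasn't done this before, here is a rough sketch of that kind of subsampling in plain Python. It's not the code from my thesis: the `subsample` helper, the 5% fraction, and the fake `(time, value)` rows are all made up for illustration.

```python
import random

def subsample(rows, fraction=0.1, seed=0):
    """Return a random subset of rows, keeping the original order."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # Keep at least one row so the plot is never empty.
    k = max(1, int(len(rows) * fraction))
    # A fixed seed makes the same subset come back on every run.
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in keep]

# Hypothetical data: 100,000 (time, value) pairs standing in for real measurements.
data = [(i, i % 128) for i in range(100_000)]
sample = subsample(data, fraction=0.05)
print(len(sample), "of", len(data), "rows kept")
```

Sorting the kept indices preserves the time order, which matters if you plot lines rather than points. Uniform random sampling can hide rare outliers, so if you are hunting for errors it's worth checking the full data for min/max values separately.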