Hacker News new | ask | show | jobs
by MooMooMilkParty 2541 days ago
You're blowing my mind here a little bit. I don't use MySQL but do use Excel and the R/Python data science stacks quite a bit. I regularly open data with tens or even hundreds of thousands of columns in the latter by using lazy computations. Is this not the case with database approaches? What kind of hardware are we talking? Presumably Excel has to load everything into memory at once since it's immediately viewable, right?
2 comments

In MySQL, maximum row size for non-text columns is 65,535 bytes. MySQL is limited to 4096 columns per table, and 1017 columns per table if the InnoDB engine is being used.

Excel's limits are 16384 columns and 1048576 rows.

I'm talking wimpy hardware here, I admit. Basically, VirtualBox VMs on a quad-core i5 box with SSD and 8GB RAM. With the VM having three cores and 6GB RAM. But it was the same wimpy hardware for Windows 7 and Ubuntu.

Oh, the day Excel got rid of the 64k line limit! Was a very sweet day!
Notwithstanding the row length and column count limits, did you also give MySQL more memory? Setting InnoDB buffer pool size to 3 or 4 GB might help; its default is about 134 MB.
I think so, but don't remember for sure.
> Presumably Excel has to load everything into memory at once since it's immediately viewable, right?

Depends on the file format. An XLS or XLSB file can contain special markers for where each logical row starts, so it can randomly access rows; Both also can persist "calculation chains, which are a simplified dependency graph. The binary formats also store formulae in a parsed representation allowing easy scans to see what cells have to be inspected if a file needs to be recalculated.

Aha! As far as persisting calculations, this makes sense. I guess in my head I was thinking "pure" data where everything opened was precomputed. Forgive my ignorance about database solutions, but do they not implement something similar?

But now you've got me thinking, it would be nice if libraries like Dask could allow for flagging of symbolic operations like this to be written to disk for quickly saving metadata where intermediate steps don't explicitly need to be saved.