by nnmg 1762 days ago
Excel is used as a database/storage/interchange format, especially after the initial analysis by someone who uses python or R. Bioinformatician does the analysis, then the PI wants to see it so they can Ctrl-F for genes they are interested in, so out comes an excel document.

And really, even if you know python or R, are you really going to fire up a jupyter notebook, load the data, and run pandas queries every time someone in lab meeting or after a talk asks you about this gene or that gene in your data?
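
For a sense of scale, the lookup in question is only a few lines once the data is loaded. Here is a minimal sketch using just the standard library's csv module; the table contents, the column names, and the `lookup_genes` helper are all made up for illustration and are not anyone's actual pipeline.

```python
import csv
import io

# Made-up results table; the column names and values are purely illustrative.
RESULTS_CSV = """gene,log2_fold_change,p_value
SEPT2,1.8,0.003
MARCH1,-0.7,0.21
TP53,2.4,0.0004
"""


def lookup_genes(csv_text, genes):
    """Return the rows whose 'gene' column matches any requested name (case-insensitive)."""
    wanted = {g.upper() for g in genes}
    reader = csv.DictReader(io.StringIO(csv_text))
    # csv keeps every field as a plain string, so names like SEPT2 are never reinterpreted.
    return [row for row in reader if row["gene"].upper() in wanted]


hits = lookup_genes(RESULTS_CSV, ["tp53", "sept2"])
for row in hits:
    print(row["gene"], row["log2_fold_change"], row["p_value"])
```

With pandas the same filter would be roughly `df[df["gene"].isin(names)]`. Either way, it is still more ceremony than Ctrl-F in an open spreadsheet, which is the point being made here.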

I think the important question is why is date conversion a default? Would it really break backwards compatibility for MS Excel users if date conversions were explicit instead of automatic? Turning that off by default would fix a lot of this.


> Excel is used as a database/storage/interchange format, especially after the initial analysis by someone who uses python or R. Bioinformatician does the analysis

Sometimes, but the situation is in reality worse than that. Excel is also used as the gold standard database/storage/interchange format of record for random shit that clinical researchers have typed in by hand whether directly or transcribed from other notes, often when that data isn't actually fundamentally tabular in nature because people really like working in grids. Even when grids hurt more than help.

A big secret in genetic research is that the MDs, grad students, project managers, and coordinators running the research programs are often not super focused on what well-structured data looks like and don't know what things like "key-value store" or "nested tree-like structure" mean, and even if they did there aren't good GUI tools for entering them anyway, and it leads to countless errors that maybe (here I speculate) they just assume will wash out as noise.

> I think the important question is why is date conversion a default?

Yes, why any kind of conversion is ever the default is a real money question.

For the finance and business office worker, it seems to have traction. Just like auto-creating an emoji when you type a : character. Excel is for offices, not specializations of scientists. Bummer.

So maybe we need better software for scientists? Sounds like a hole in the market.

The market for scientific software is a bit iffy. Scientific software also needs to be super super flexible since the users are, somewhat by definition, not doing something that's been done before. Hard market.

A good spreadsheet for scientists. That’s a lot of work for not much money. I don’t know that adapting LibreOffice Calc would do the trick.

I don't work in bioinformatics, but what you are describing is a completely accurate description of what I experienced working in manufacturing quality control. Raw data came in from suppliers in the form of spreadsheets, and management wanted to see results in spreadsheets, meaning all our quality data was subjected to these issues. The date formatting issue was a particularly annoying "gotcha", especially when features were defined with an XX-XX numeric code. The number of times I had to deal with someone in a meeting saying "hey, why is this feature called October-13?!" Super frustrating.

If I could choose the tools used by the whole process involving multiple different companies and departments, hey I would! It would be python all the way down. But I was but a cog in a massive organization.
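
One way to catch the "October-13" surprise before the meeting is to scan identifiers for values that look date-like before they go anywhere near a spreadsheet. The sketch below is a heuristic only: the patterns are assumptions about which strings tend to get auto-converted (month name plus a number, or a pair of small numbers joined by "-" or "/"), not Excel's actual parsing rules, and `looks_like_date` and `flag_risky` are hypothetical helper names.

```python
import re

# Heuristic only: these patterns approximate the kinds of values that spreadsheet
# auto-formatting tends to turn into dates. They are NOT Excel's actual rules.
MONTHS = (
    "jan|january|feb|february|mar|march|apr|april|may|jun|june|"
    "jul|july|aug|august|sep|sept|september|oct|october|nov|november|dec|december"
)
MONTH_NAME_THEN_DAY = re.compile(rf"^(?:{MONTHS})[-/ ]?\d{{1,2}}$", re.IGNORECASE)
NUMERIC_PAIR = re.compile(r"^(\d{1,2})[-/](\d{1,2})$")


def looks_like_date(value: str) -> bool:
    """Return True if the value resembles something a spreadsheet may read as a date."""
    text = value.strip()
    if MONTH_NAME_THEN_DAY.match(text):
        return True
    pair = NUMERIC_PAIR.match(text)
    if pair:
        first, second = int(pair.group(1)), int(pair.group(2))
        # Accept either month-day or day-month order, since locales differ.
        return (1 <= first <= 12 and 1 <= second <= 31) or (
            1 <= second <= 12 and 1 <= first <= 31
        )
    return False


def flag_risky(values):
    """Return the subset of values that the heuristic considers date-like."""
    return [v for v in values if looks_like_date(v)]


print(flag_risky(["10-13", "45-99", "SEPT2", "MARCH1", "TP53"]))
```

Anything flagged can then be imported explicitly as text or renamed upstream. The check will have false positives and false negatives, so it is a tripwire rather than a guarantee.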

> A lot of "safety culture" is composed of things like checklists and hazard warnings which are more geared towards shifting the blame for accidents onto somebody else than actually preventing those accidents,

If you stay in spreadsheets these problems mostly don’t occur (that is, once data entry is squared away so that the initial spreadsheet has what you want, it doesn't tend to get lost); it's when you move in and out of spreadsheets via text, and take the path of least resistance [0] to make the transition, that the problem occurs.

[0] and to be fair, there is a lot of resistance off that path.

The process I had to deal with was filling out spreadsheets with data from a python-driven 3D inspection program that exported data files in CSV format. Needless to say, these errors were inevitable for exactly the reasons you've stated. Why didn't we bypass the large, poorly formatted, cumbersome spreadsheets and just export the data directly via pandas? All the inspection was done by Python anyways. You tell me! Also, it did not help that the spreadsheets were not created by me or by any colleagues in my department.

God I hated working in old-school engineering/manufacturing. "That's not how we do things" is the answer to everything.

Sorry about the misplaced quote. Meant to be a quote from the immediate upthread comment. Looking back, it probably wasn't even needed, the response works fine against the comment as a whole.

> And really, even if you know python or R, are you really going to fire up a jupyter notebook, load the data, and run pandas queries every time someone in lab meeting or after a talk asks you about this gene or that gene in your data?

I don't do any scientific research, but I have been using jupyter as a replacement for excel since it was called the ipython notebook. I don't really use pandas all that often, I just find it easier to read and edit data in python. Though I first learned ipython added the notebook from a talk Wes McKinney gave about Pandas.