by adev_ 2019 days ago
You got half of your answer in your sentence.

It's not only bioinformatics: the entire HPC world tends to avoid databases.

They usually prefer HDF5 or similar formats, and there are reasons for that. It is much easier to scale a million nodes accessing a flat file over a DFS (distributed file system) than over a database.
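A minimal sketch of why a flat file scales this way: every reader computes its own byte offset and reads its slice independently, so there is no server to coordinate or saturate. This is only an illustration under stated assumptions: a plain headerless array of float64 values stands in for an HDF5 dataset, threads stand in for cluster nodes, and the helpers `write_flat`, `read_slice` and `parallel_sum` are hypothetical. Real HPC codes would use HDF5 (or MPI-IO) over the parallel DFS.

```python
import os
import struct
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: one little-endian float64 per record, no header.
ITEM = struct.Struct("<d")


def write_flat(path, values):
    """Write values once as a flat, headerless array of float64."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(v))


def read_slice(path, start, count):
    """Read `count` records starting at record `start`.

    Each worker opens its own handle and computes its byte offset
    directly, so no coordinating server is involved.
    """
    with open(path, "rb") as f:
        f.seek(start * ITEM.size)
        buf = f.read(count * ITEM.size)
    return [v for (v,) in ITEM.iter_unpack(buf)]


def parallel_sum(path, n_records, n_workers):
    """Split the file into contiguous slices and sum them in parallel."""
    chunk = -(-n_records // n_workers)  # ceiling division
    starts = range(0, n_records, chunk)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(
            lambda s: sum(read_slice(path, s, min(chunk, n_records - s))),
            starts,
        )
        return sum(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        write_flat(path, [float(i) for i in range(1000)])
        print(parallel_sum(path, 1000, 8))  # 499500.0
```

Adding readers only adds independent `seek`/`read` calls against the file system; nothing in the sketch serializes them the way a single database server would.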


Also, in these fields with HDF5, you tend to write once read often. Researchers in bioinformatics and other HPC-heavy fields have totally different resource-consumption patterns from web services. 'Data' really means something completely different.
> Also, in these fields with HDF5, you tend to write once read often

Server-oriented DBMSs specialized in write-once-read-many workflows do exist.

However, you are right: research has a completely different data-consumption model than web services.

And in HPC, it is:

- Much more efficient to do sub-millisecond, massively parallel data access over a parallel DFS one network switch away than to do it over a DBMS.

- Often much more convenient to move a flat file around so you can analyze scientific results, or modify models, on your laptop.

> It is much easier to scale a million nodes accessing a flat file over a DFS than over a database.

Flat files are also much easier to distribute. I can just upload my arbitrarily large HDF5 file to your FTP server, and you can open it in MATLAB/Jupyter and start playing around with it. Doing the same with a database (other than SQLite) is really hard: our database versions have to match, and you'll probably need help from someone in your IT department to get the right version installed, and so on.
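SQLite is the exception here because, like an HDF5 file, an SQLite database is a single ordinary file: it can be copied, uploaded and opened elsewhere with no server and no admin. A minimal sketch with Python's built-in `sqlite3` module; the table `runs`, its columns and the helper names are hypothetical, and `shutil.copyfile` stands in for the FTP upload/download.

```python
import os
import shutil
import sqlite3
import tempfile
from pathlib import Path


def build_results_db(path):
    """Create a small SQLite database; all of it lives in one file."""
    con = sqlite3.connect(path)
    with con:  # commits the transaction on success
        con.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, score REAL)")
        con.executemany(
            "INSERT INTO runs (score) VALUES (?)", [(0.5,), (0.75,), (0.25,)]
        )
    con.close()


def open_shipped_copy(path):
    """Open a copied database file read-only and query it; no server needed."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        return con.execute("SELECT COUNT(*), MAX(score) FROM runs").fetchone()
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "results.db")
        build_results_db(src)
        dst = os.path.join(d, "downloaded.db")
        shutil.copyfile(src, dst)  # stands in for the FTP transfer
        print(open_shipped_copy(dst))  # (3, 0.75)
```

This assumes SQLite's default rollback-journal mode; a database in WAL mode should be checkpointed or cleanly closed before its file is copied.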