Hacker News
by kannanvijayan 3397 days ago
I did sequence-based bioinformatics back around 2006 or so.

Very few of the operations used GPU. Things may have changed since I was working there, but the work at the time wasn't suited for a GPU architecture.

The initial step was sequence cleanup, which is a hidden Markov model executed over a collection of sequences of varying length, so it is hard to parallelize. Sequence annotation is embarrassingly parallel on a per-library basis (each sequence can be annotated independently of the others), but the computational work is fuzzy string matching, which is once again hard to GPU-ize. Another major computational job was contig assembly, which is somewhat parallelizable (pairwise sequence comparisons), but it also involves fuzzy string matching, so it is not GPU-izable either.
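To make the "embarrassingly parallel, but fuzzy string matching" point concrete, here is a rough Python sketch. It is not the pipeline described above: plain edit distance stands in for whatever fuzzy matcher was actually used, and the names `edit_distance`, `best_match`, and `annotate_library` are made up for illustration. Each sequence is scored independently, so the outer loop can be handed to a process pool, while the inner loop depends on the cell just computed and runs over inputs of varying length.

```python
from functools import partial


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # curr[j] depends on curr[j - 1]: a serial dependency inside each row.
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def best_match(seq: str, references: dict) -> tuple:
    """Return (reference name, distance) for the closest reference (hypothetical helper)."""
    dist, name = min((edit_distance(seq, ref), name) for name, ref in references.items())
    return name, dist


def annotate_library(seqs, references, map_fn=map):
    """Annotate every sequence independently.

    map_fn defaults to the serial built-in map; passing the .map method of a
    concurrent.futures.ProcessPoolExecutor spreads sequences across cores.
    """
    return list(map_fn(partial(best_match, references=references), seqs))


if __name__ == "__main__":
    refs = {"geneA": "ACGTACGT", "geneB": "TTTTGGGG"}  # toy data, not real genes
    print(annotate_library(["ACGTACGA", "TTTTGGGC"], refs))
```

Scaling this is mostly a matter of adding cores and memory per worker, which fits the "lots of cores, lots of threads" experience.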

So that's just sequence genetics. Don't know if GPUs are used in other areas.

Lots of cores, lots of threads, and lots of main memory. That was the key.

1 comments

"Lots of cores, lots of threads, and lots of main memory. That was the key."

Very much this. Which is why I ended up theorycrafting that the AMD many-core CPUs would be so useful.

And still is ;) Partly because some key workloads just did not run well on GPUs due to lack of addressable memory. Lots of Amdahl's law getting in the way. Some of the key use cases required stupendously large-memory machines (genome assembly using only short reads).
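Assuming "amdahls" here means Amdahl's law, a quick sketch of why the serial part hurts: if a fraction p of the work parallelizes across n workers, the speedup is 1 / ((1 - p) + p / n), which can never exceed 1 / (1 - p). The function name and the numbers are illustrative only, not measurements from any real pipeline.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Ideal speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # With 90% parallel work, even thousands of workers stay below 10x.
    for n in (10, 100, 1000, 10000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

So a job that is 10% serial tops out just under 10x no matter how many GPU threads are thrown at it.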

Then a lot of code is very branchy but massively parallel, which makes pure-CPU clusters more flexible (important in research settings) and gives them higher utilization than mixed CPU/GPU clusters.

GPU code takes longer to get to market and requires more specialized skills than standard CPU-oriented programming. Late to market means you miss a whole wave of experimental methods from the lab, e.g. GPU short-read aligners arrived just as long reads started to come out of the sequencing lab, leading people to stop doing short reads, or at least to stop doing pure short reads.

Secondly, quite a few of the key staff at the large research institutes had been burned by previous hardware acceleration attempts and were not going to throw money at it until it was market-proven.

Bioinformatics tends to be cutting edge (the hemorrhaging kind) on the bio/lab tech side, yet production IT tends to balance that by sticking to what we know, since we already have enough risks, i.e. focus on the algorithms and robustness, not on pure power.

Hmmm, isn't deep learning starting to pick up for genetics? No idea if it actually is, but everyone in DL seems to be talking about it, so I thought I'd ask someone actually in bioinformatics :)
I wish I could say. It's been a good two+ years since I left the genetics company, so I've been in a different industry for a while. I would say there's probably plenty of room if people start taking more novel approaches that use more data, e.g. full microbiome analysis. Also, I was just a sysadmin, so I don't really know anything other than keeping systems running, so take what I say with a grain of salt.
I suspect DL will have a limited to modest role in the actual search / alignment part, and a lot more to do with the analysis part. This includes medical diagnosis, identifying regulatory patterns based on high-throughput expression data, and the like.

Not necessarily in computational genetics, sequencing, or the DNA stuff.

Xeon Phi tried to crack this nut and seems to have mostly failed so far.