Hacker News
by epistasis 2592 days ago
That video describes the process used before NGS was around. These days, using anything with plasmids would be pretty unusual.

There are several next generation sequencing technologies:

1) Short read (Illumina), which dominates most next-generation sequencing
2) Long read (nanopore or PacBio)

These differ in their analysis methods, in the kinds of measurement errors they make, and even in their file formats.

Short read is far more common, so you're probably interested in the "Data Analysis" part of this video:

https://www.youtube.com/watch?v=fCd6B5HRaZ8

You'll also need to know about adapters and indices (indices are how multiple samples can be sequenced at the same time).
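A rough illustration of what adapters and indices mean for the data, as a minimal Python sketch. It assumes Casava 1.8+ style Illumina headers, where the index is the last colon-separated field after the first space (e.g. `@r1 1:N:0:ATCACG`), a hypothetical `samples` dict standing in for a sample sheet, and exact matching against `AGATCGGAAGAGC`, the common prefix of Illumina TruSeq adapters. The function names are made up for illustration; real tools such as cutadapt handle sequencing errors inside the adapter and many other cases.

```python
from collections import defaultdict

# Common prefix of Illumina TruSeq adapters; other library kits use other sequences.
ADAPTER = "AGATCGGAAGAGC"


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def index_from_header(header: str) -> str:
    """Return the index from a header such as '@r1 1:N:0:ATCACG'.

    Assumes the Casava 1.8+ layout: the index is the last
    colon-separated field of the part after the first space.
    """
    comment = header.split(" ", 1)[1]
    return comment.rsplit(":", 1)[1]


def demultiplex(headers, samples, max_mismatches=1):
    """Group read headers by sample using their index sequence.

    `samples` is a hypothetical sample sheet: {sample_name: index}.
    Reads matching no sample, or more than one, go to 'undetermined'.
    """
    bins = defaultdict(list)
    for header in headers:
        idx = index_from_header(header)
        hits = [name for name, barcode in samples.items()
                if len(barcode) == len(idx)
                and hamming(barcode, idx) <= max_mismatches]
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(header)
    return dict(bins)


def trim_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Cut a read where adapter sequence begins (exact matches only).

    A full adapter anywhere in the read is cut at its start; otherwise the
    longest adapter prefix (at least min_overlap bases) at the 3' end is removed.
    """
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq
```

For example, with `samples = {"A": "ATCACG", "B": "CGATGT"}`, a read whose index is `ATCACT` is assigned to sample A because it is one mismatch away from `ATCACG`. Tolerating a mismatch absorbs errors in the index read itself, which only works when the sample indices differ from each other at several positions.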

But as another commenter mentions, knowing some particulars about the project would really help determine what sort of tutorial is appropriate. You'll also need to know about the biology of the application, in addition to understanding the sequencing technology.


The program will work on FASTQ files, and the sequencing technology produces long reads.

As another commenter said, I don't need super-deep sequencing knowledge, because my work will mostly be on the programming side (improving performance, not adding new functionality). Still, it could be useful to have a clear picture of the process.

Thanks for your help!
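Since the program works on FASTQ files, here is a minimal Python sketch of reading them and decoding the quality line. It assumes the common layout of exactly four lines per record (header, sequence, `+` line, qualities) and Phred+33 encoding, where each quality character's ASCII code minus 33 gives a quality Q and the error probability is 10^(-Q/10). The names `read_fastq` and `phred33_error_probs` are illustrative, not from any particular library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # header line without the leading '@'
    seq: str   # base calls (A, C, G, T, N)
    qual: str  # one Phred+33 character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def phred33_error_probs(qual: str) -> List[float]:
    """Convert Phred+33 quality characters to per-base error probabilities."""
    return [10 ** (-(ord(c) - 33) / 10) for c in qual]


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nI?+!\n")
    for rec in read_fastq(demo):
        print(rec.name, rec.seq, phred33_error_probs(rec.qual))
```

Reading a fixed four lines per record, instead of scanning for lines that start with `@`, matters because `@` is also a valid quality character. This sketch favors clarity over speed; pure-Python line parsing like this is slow on large files.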

Unfortunately I don't have many long-read resources to share, but here's a short video about the process for the MinION nanopore sequencer for long reads:

https://www.youtube.com/watch?v=Wq35ZXyayuU

At about 1:30 there's a cartoon of the data signals that get processed into sequencing data.

It's been a while since I looked at long-read data, but last time I did, the individual base calls (A, C, G, T) in FASTQ files had a fairly high error rate, and the errors had systematic biases, which makes them harder to correct. Most of the processing of these data is aimed at correcting these errors, either by comparing against a known reference sequence or by sequencing the same region many times.
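The "sequencing many times" idea can be illustrated with a column-wise majority vote. This Python sketch assumes the reads covering one region have already been aligned to each other into equal-length rows, with `-` marking gaps; producing that alignment is the hard part and is not shown. `majority_consensus` is a hypothetical name, and real consensus and polishing tools use far more sophisticated models.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned_reads: List[str]) -> str:
    """Column-wise majority vote over reads that are already aligned.

    Assumes every row has the same length, with '-' marking a gap.
    Columns where a gap wins the vote are dropped from the consensus.
    Ties go to the symbol seen first in the column (Counter.most_common order).
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(r) != width for r in aligned_reads):
        raise ValueError("all aligned reads must have the same length")
    out = []
    for column in zip(*aligned_reads):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            out.append(symbol)
    return "".join(out)


reads = [
    "ACGT-ACGT",
    "ACCT-ACGT",  # substitution at position 2
    "ACGTTACGT",  # extra base (insertion)
    "ACGT-AC-T",  # missing base (deletion)
]
print(majority_consensus(reads))
```

Each read above carries a different error, yet the vote recovers `ACGTACGT`. A plain vote only helps with errors scattered randomly across reads: if the technology makes the same mistake at the same position in most reads, that mistake wins the vote, which is why systematic biases are harder to correct.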