Rosalind: A genomics toolkit in Rust running whole-genome pipelines on a laptop

	Rosalind: A genomics toolkit in Rust running whole-genome pipelines on a laptop (github.com)
	185 points by samuell 28 days ago

18 comments

logannyeMD 23 days ago

Hey guys, this is my GitHub repo. Glad it's received some interest. I figured HN might be the culprit when it suddenly jumped ~100 stars even though I haven't worked on the code base since last year. I prototyped this out of personal curiosity last year and moved on abruptly, so there are a lot of gaps I still need to close and knobs that need to be tuned. But if people genuinely find the "deterministic genomics workloads on edge devices" proposal useful, I'll begin refining the code tonight and try to make it as useful as possible. If you have any particular bioinformatics tasks or use cases that you want to be feasible on edge devices, lmk and I'll work on integrating new capabilities. Always happy to be helpful

croemer 23 days ago

Your website bio and LinkedIn don't match at all. Is the LinkedIn link on your website wrong? Update: yes it is. This is the correct one: https://www.linkedin.com/in/logan-nye

You're doing too much vibe coding and not enough checking/testing.

LinkedIn link on your website points to: https://linkedin.com/in/logannye

Website bio: https://www.logannye.io/about

woodrowbarlow 22 days ago

they weren't expecting to receive attention out of the blue today. it seems rude to attack someone's engineering skills because an online profile is out of date.

Tuna-Fish 22 days ago

It's not out of date, it's pointing at the wrong person.

whateveracct 22 days ago

this isn't an attack. this is a data point that they don't review the slop coming out of the LLM

whateveracct 23 days ago

oh wow lol never seen that one before

a_bonobo 23 days ago

There has been a bit of a 'trend' to rewrite common bioinformatics/comp-bio tools in faster languages (Rust) via LLMs; OP's repo seems to be an early example.

Seqera Labs has a bit of a manifesto: https://rewrites.bio/

Heng Li has an overview here too: https://lh3.github.io/2026/04/17/the-ai-rewrite-dilemma

IMHO it's... OK? Bioinformatics code quality is generally poor, untrained biologists writing functioning code that is poor in scoping, but works. (Unguided) LLMs write on that level, too, so not much harm done.

ahartman00 22 days ago

How well tested would you say these libraries are? It doesn't sound promising, sadly. If there are comprehensive test suites, that would go a long way to ensuring new, faster tools aren't producing subtly wrong answers. That's a pretty big deal: just because the code compiles or no exception is thrown doesn't mean the analysis was correct.
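To make the difference between "runs without error" and "gives the right answer" concrete, here is a minimal sketch. The function `align_exact` is a hypothetical stand-in (exact substring matching), not anything from the Rosalind repo or a real aligner. The point is only the shape of the two checks in `main`.

```rust
/// Hypothetical stand-in for an aligner (an assumption for illustration):
/// returns the 0-based offset of the first exact occurrence of `read`
/// in `reference`, or None if there is no exact match.
fn align_exact(reference: &[u8], read: &[u8]) -> Option<usize> {
    // `windows(0)` would panic, so reject empty or oversized reads first.
    if read.is_empty() || read.len() > reference.len() {
        return None;
    }
    reference.windows(read.len()).position(|w| w == read)
}

fn main() {
    let reference = b"ACGTTGCAACGT";
    let read = b"TGCA";
    let hit = align_exact(reference, read);

    // Weak check: only proves the call produced *some* result.
    assert!(hit.is_some());

    // Strong check: compares against a known-correct position,
    // so a subtly wrong offset would fail the test.
    assert_eq!(hit, Some(4));

    println!("read placed at offset {:?}", hit);
}
```

For a real tool, the known-correct values would presumably come from simulated reads with known origins or from the output of an established tool on the same input.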

Gethsemane 22 days ago

It's very context-dependent - the seqera rewrites so far seem to be pretty reliable, most of the work was spent merging the functions of multiple data QC tools into a single program (previously, there was a lot of redundancy that wasted compute). The success of other rewrites that I've seen tends to depend on the author's care/experience and usefulness. In my experience, bioinformaticians are fairly slow on the uptake of new software which might actually be an advantage here :-)

In defense of a lot of these bioinformatics-specific rewrites, there are some really dodgy coding practices and bugs that exist in well used tools, so there is scope for genuine improvement. The most recent release of minimap2 fixed some bugs identified in a rewrite, for example: https://github.com/lh3/minimap2/releases/tag/v2.31

mriet 23 days ago

Realistically, without data from a large testset that compares this thoroughly to Samtools (and others?), I wouldn't touch this.

Note to the OP: specify a focus please? short, long, mega-long read and bacterial, human, small plant or large plant genome? Alignment heuristics and performance differ significantly across those axes.

devlovstad 22 days ago

I work with genomics pipelines in my day job. This repo does not seem quite ready for serious usage until a comparison is made with existing tools such as Bowtie 2/samtools/Strelka or similar. For cancer genomes, it's also a bit limiting that it calls only SNVs/indels and not structural variants.

croemer 23 days ago

Those are all the tests for alignment. They don't even check the alignment is correct. Just that there are no errors. This is a joke: https://github.com/logannye/rosalind/blob/main/tests/alignme...

Looks like total slop to me. All code in one commit, then a bunch of commits polishing the Readme.

No release, no updates in half a year.

vatsachak 23 days ago

Looking at the commenting pattern, it seems like AI unfortunately

jghn 23 days ago

The OP? They're not AI, they've been active on X and bsky for years.

vatsachak 23 days ago

Sorry, I meant the code in the repo

samuell 22 days ago

I shared this since it seems to address a somewhat similar niche that I have had hopes to one day develop, based on FlowBase [1]; A library of streaming processing components based on basic operations, that can be easily stitched together into larger pipelines in a compiled language that can run on smaller hardware too.

Neither FlowBase nor I had many ideas about how to keep data structures compact, as the linked library does; I was mostly aiming to make it really easy to build streaming pipelines.

I haven't yet got my head around how the composability story is in rosalind though, so would be interested in any pointers or examples on how this would be done using it.

[1] https://github.com/flowbase/flowbase

boron1006 23 days ago

Lots of bad smells in this repo.

the__alchemist 23 days ago

Do you have some examples to look at? I am curious.

boron1006 23 days ago

Well the √t stuff looks like nonsense or way overblown, existing tools already do similar things, there’s pretty much a single commit with no follow up commits etc etc.

aeve890 22 days ago

O(√t) looks weird but it's real. the "naive trial division" primality test for example.
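For reference, a minimal sketch of the trial-division example. The function name is my own, and this only illustrates where a square-root bound on running time comes from; it says nothing about what the repo itself does.

```rust
/// Naive trial-division primality test. Any composite n has a divisor
/// no larger than sqrt(n), so at most about sqrt(n) divisions are tried.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2u64;
    // `d <= n / d` is equivalent to `d * d <= n` but cannot overflow.
    while d <= n / d {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

fn main() {
    for n in [1u64, 2, 17, 91, 97] {
        println!("{}: {}", n, is_prime(n));
    }
}
```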

boron1006 22 days ago

It doesn’t apply to what this repo is doing. Also the 70-odd single-author preprints seem to suggest the author is in some deep AI psychosis: https://www.researchgate.net/profile/Logan-Nye-2

aeve890 22 days ago

Oh wow you weren't joking.

semiinfinitely 23 days ago

bioinformaticians have been making these useless bioinformatic-toolkit-in-my-favorite-programming-language repos for years

maxall4 23 days ago

Well, what else are we going to do while waiting for the bench scientists to finish collecting data?

asdff 23 days ago

Dissertationware is common in a lot of fields, honestly.

gilleain 23 days ago

Hate to agree, but it is true. For a while, I think, the main sequencing framework was in perl (Bioperl). Not sure what was best for structures - possibly Biojava?

It is very tempting, though - 'just' make a nice, clean API in your favourite language (eg Haskell, Ruby, ...) and everyone will flock to use it! Maybe.

alice-fishr 22 days ago

Why don't you mention Biopython? Bioperl is quite old by now and not kept up to date with the newest data.

flobosg 22 days ago

He’s talking about the past (“For a while, …”). Up to early 2010s, I would say.

peterfirefly 23 days ago

Should have called it Raymond.

flobosg 23 days ago

Or rather Margaret: https://en.wikipedia.org/wiki/Margaret_Oakley_Dayhoff

cmpb 23 days ago

I'm not familiar with Margaret Oakley Dayhoff, but I am aware that Rosalind Franklin [1] was extremely important for our understanding of DNA, comparable to Watson/Crick, with whom she co-discovered the structure of DNA. So it seems "Rosalind" is at least very appropriate as a name for a genomics tool such as this.

Not to say the other names mentioned aren't also deserving of similar honors

[1] https://en.wikipedia.org/wiki/Rosalind_Franklin

philipallstar 23 days ago

Rosalind Franklin was the team lead of the research team that photographed DNA.

The actual team member that took the key photo[0] was Raymond Gosling.

That team didn't interpret the double helix structure of DNA that the photograph had captured - that was Watson and Crick working it out from the photograph.

[0] https://en.wikipedia.org/wiki/Photo_51

groby_b 23 days ago

It's not quite that clear-cut. Franklin was pretty clear on the helical structure in both research notes and papers, but she didn't quite nail the overall structure (2 strands with opposing winding, complementing bases).

Fundamentally, she suffered the curse of the experimental scientist - waiting for actual data before being willing to build a model. Watson & Crick postulated ahead based on partial data.

dnautics 22 days ago

> Franklin was pretty clear on the helical structure

the type of diffraction her lab was doing only makes sense on helical structures. it being helical was already kind of? established -- linus pauling was contemporaneously working on some sort of alpha-helix inspired single helix model.

watson and crick immediately recognized the position of the diffraction spots fit the distances suggested by their chemical modeling of a, t, c, g, which franklin was not able to do since she hadn't made a structural prediction.

> postulated ahead based on partial data

not quite. if you know that a t c and g are the raw chemicals made, you can make a (possibly even literal) model and say, "this ball and stick model predicts diffractions here".

this is arguably better science than waiting for data and fitting a model to the data, falsifiability and all that.

samuell 22 days ago

> So it seems "Rosalind" is at least very appropriate as a name for a genomics tool such as this.

Indeed. The only argument against it might be that Rosalind is already a pretty well-known website for doing bioinformatics exercises and having them automatically graded:

https://rosalind.info

flobosg 23 days ago

> I'm not familiar with Margaret Oakley Dayhoff

Then you’re one of today’s lucky 10,000. Any time!

danborn26 22 days ago

Rust is a great fit for genomics. Processing whole genomes locally on a laptop is a huge step up from typical Python pipelines.

Rijanhastwoears 23 days ago

> A deterministic genomics engine with a compact memory footprint.

Uhh... are there stochastic genomics pipelines?

flobosg 23 days ago

A quick search gave me for example this one: https://genome.cshlp.org/content/26/1/36

pazimzadeh 22 days ago

i think these are more relevant examples

https://www.cs.cmu.edu/~ckingsf/software/sailfish/

https://www.nature.com/articles/nmeth.4197

mfld 22 days ago

I guess the author refers to the fact that many well-known tools have some randomness built-in. The most obvious one is differences due to the order of parallel processing. But these differences are often small and have no significant downstream effects. They are mostly inconvenient for regression testing.
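One common way the order of parallel processing leaks into results is floating-point addition, which is not associative. A minimal sketch, with made-up values and a helper name of my own, not taken from any particular tool:

```rust
/// Sums values strictly left to right, the way a reduction would if
/// partial results happened to arrive in this order.
fn sum_in_order(values: &[f64]) -> f64 {
    values.iter().fold(0.0, |acc, v| acc + v)
}

fn main() {
    // Pretend these are partial results from three worker threads.
    let partials = [0.1_f64, 0.2, 0.3];

    // Same partials, but the workers finished in the opposite order.
    let mut reversed = partials;
    reversed.reverse();

    let a = sum_in_order(&partials);
    let b = sum_in_order(&reversed);

    // Mathematically equal, yet the rounded results differ in the last bit.
    println!("forward:  {:?}", a);
    println!("backward: {:?}", b);
    println!("bitwise equal: {}", a == b);
}
```

The difference is tiny, which fits the point above: usually harmless downstream, but enough to break a byte-for-byte regression test. Reducing in a fixed order is one way a tool can sidestep it.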

vfalbor 22 days ago

Have you compared it against similar software such as BLAST, which is the most common?

shauniel 23 days ago

I would love to hear about what the sacrifices are, but this project really looks amazing.

bonsai_spool 23 days ago

Didn't see a publication or preprint for this - is there one?

Jerry2 22 days ago

Awesome piece of software! Quick side question... does anyone have a recommendation for a DNA genotyping service that prioritizes privacy? I'm looking for a company that provides private results and doesn't add them to any sort of database (dystopian or otherwise). I'd love to get my DNA profile, but I'm concerned about privacy issues. :\

Gethsemane 22 days ago

Ultimately you're not going to find a service that can guarantee privacy, but your best bet might be to extract DNA at home (though tricky without a centrifuge etc...) and submit it to a standard sequencing provider (Novogene, Plasmidsaurus, etc.). Realistically, they'll hold onto the data for a couple of months as part of the order, then delete it to clear up space. A bunch of discordant sets of DNA sequence without metadata isn't exactly useful for nefarious purposes! I wouldn't recommend sequencing at home unless you are very enthusiastic...

Jerry2 21 days ago

Thank you so much!

samuell 22 days ago

You might try sequencing your DNA at home :)

https://iwantosequencemygenomeathome.com/

(Well, the guy offers to do it for you too, delivering the data on a USB stick).

Jerry2 21 days ago

Thanks!

penciltwirler 22 days ago

blatant copyright infringement of https://rosalind.info/problems/locations/

p4ul 23 days ago

This is interesting; thanks for sharing! I have been curious about the adoption of Rust in computational biology. I know that the folks at St. Jude [1] are also using Rust for their 'omics research.

[1] https://github.com/stjude-rust-labs

croemer 23 days ago

We rewrote Nextclade in Rust and are very happy. Works nicely both for CLI and client side browser with wasm.

https://github.com/nextstrain/nextclade

samuell 22 days ago

Yeah, there is actually a pretty big shift towards Rust in the comp bio / bioinformatics community.

Nature even wrote a feature article about it a couple years ago:

Why scientists are turning to Rust

https://www.nature.com/articles/d41586-020-03382-2

They mention the Rust-Bio [1] project by well known Snakemake author Johannes Köster & co, and there are some other widely used libraries like needletail [2] and noodles [3].

A cool smaller tool developed by performance wiz Ragnar Groot Koerkamp which was just published is Sassy [4] [5]. He has also been involved in developing some high performance SIMD based stuff (minimizers) [6].

[1] https://github.com/rust-bio/rust-bio

[2] https://github.com/onecodex/needletail

[3] https://github.com/zaeleus/noodles

[4] https://github.com/RagnarGrootKoerkamp/sassy

[5] https://academic.oup.com/bioinformatics/article/42/5/btag244...

[6] https://github.com/rust-seq/simd-minimizers

shpongled 23 days ago

There is a relatively widely adopted tool (100+ citations, >500k invocations collected via telemetry) for mass spectrometry-based proteomics written in Rust, and quite a few others in the works.

[1] https://github.com/lazear/sage

the__alchemist 23 days ago

I'm building a structural bio crate system in Rust (na_seq, bio_files, bio_apis, dynamics and some more specialized). No one is using it AFAIK other than myself. I am using it to build a multi-purpose structural bio GUI program called Molchanica.

Note that this doesn't have much overlap with the traditional bioinformatics workflows that the OP (Rosalind) or the one you linked to seem to be focused on.

clmcleod 23 days ago

Thanks for the shout out!

p4ul 23 days ago

Oh, thank you, @clmcleod! We've been following all your work closely in my team!

I'm very bullish on the long-term prospects of Rust in computational biology—as well as research computing more generally.