by dvdt 2263 days ago
I'm the co-founder and CTO at BillionToOne. I'm happy to answer any questions here. I've also posted a slightly more technical explanation of how the test works and why it can scale here: https://twitter.com/dtsao/status/1247642005510873088?s=21

Edit: Since our site seems to be overwhelmed at the moment, here's a recap:

We’ve been working hard at BillionToOne on a new COVID-19 test that scales testing to everyone in the US. Our test (1) re-purposes existing infrastructure, (2) eliminates time-consuming RNA extraction, and (3) enables a distributed system for COVID-19 testing.

We need 1 million tests per day to end the stay-at-home orders. Schools are still open in Iceland because they test 15x more than the US does, per capita (https://www.washingtonpost.com/world/2020/04/02/free-coronav...).

The first thing we figured out is how to run COVID-19 tests on existing automated Sanger sequencers. One sequencer can process up to 3840 samples per day. There are hundreds of sequencers of excess capacity because they were built for the Human Genome Project over 20 years ago.

It would take only 2 sequencers to surpass the current test capacity for all of California. There are far more than 2 sequencers in California (some individual labs have 10 or more).

We tweaked the protocol so COVID-19 could be detected from sequencing data using linear regression. Basically, we add ~100 copies of a known DNA sequence to help us calculate how much virus nucleic acid is in the specimen. It works just as well as gold-standard RT-qPCR.

Lab workflow for COVID-19 testing is traditionally (1) specimen accessioning, (2) RNA extraction, (3) RT-qPCR, and (4) reporting. RNA extraction, in particular, has been a huge bottleneck in terms of reagent shortages and labor-intensiveness.

We showed that we can skip RNA extraction entirely without affecting test sensitivity and limit of detection.

By skipping RNA extraction and using automated Sanger sequencers, we think we can get to an additional 200,000 samples per day test capacity in existing clinical labs.
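For a rough sense of scale, here is a back-of-the-envelope sketch of how many sequencers those targets imply. It assumes every sequencer runs at the quoted maximum of 3840 samples per day, which is an idealization, and the function name is made up for illustration.

```python
import math

# Maximum daily throughput of one automated Sanger sequencer (figure quoted above).
SAMPLES_PER_SEQUENCER_PER_DAY = 3840


def sequencers_needed(tests_per_day: int,
                      per_sequencer: int = SAMPLES_PER_SEQUENCER_PER_DAY) -> int:
    """Smallest whole number of sequencers that covers the daily target.

    Assumes every instrument runs at full quoted capacity (an idealization).
    """
    return math.ceil(tests_per_day / per_sequencer)


print(sequencers_needed(200_000))    # 53
print(sequencers_needed(1_000_000))  # 261
```

Real-world numbers would be higher once downtime, controls, and re-runs are accounted for.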

A distributed system is often the only way to operate at massive scale. A fully distributed system could have different sites and labs responsible for each process and dynamically re-allocate resources based on availability and capacity.

The Broad Institute COVID-19 lab has already started doing this. They are asking for specimens to be submitted in a standardized tube format and pre-barcoded. They have essentially distributed the specimen accessioning work.

Because there is a highly developed service industry for Sanger sequencing with <24 hour turnaround, there is an opportunity to further scale up testing by distributing the work to their (currently) idle sequencers.

Distributed testing could scale from 200k to >1 million tests per day, but would require a change in regulations that currently prohibit it.

Thanks to the BillionToOne team for pulling this work together! Next step is to start manufacturing test kits and obtain Emergency Use Authorization from the FDA. We’re eager to work with clinical Lab Directors and contract kit manufacturers.

Edit 2: Link to scientific manuscript: https://www.dropbox.com/s/07esyehsvfpmllc/A%20Highly%20Scala...

22 comments

Hey, lowly Bio undergrad here, but how are you able to skip the RNA extraction step? I read the paper and you use the viral transport medium, but wouldn't you have to also purify RNA from that (or is it just much easier to extract RNA from that medium)? I also dived into the paper behind the "skip the RNA extraction step" methodology and it basically seems to swap out one RNA extraction kit for another (Qiagen RNeasy Mini kit and the Qiagen RNeasy Micro kit). Couldn't shifting kits from one provider to another introduce supply chain strain? (or am I just oversimplifying it?)
Thanks for the question! The goal of skipping RNA extraction is to decrease the amount of labor necessary for processing samples and also to eliminate a dependency on RNA extraction reagents that have recently become difficult to find. The FDA is very strict about the specific brand and model of kit you use, so showing that you can swap out one RNA kit for another is actually very useful because you will have alleviated some of the supply chain strain (although I agree at high enough load both supply chains will then become limiting).

The way currently available COVID-19 testing works is by detection of viral RNA. Since the amount of viral RNA in a patient sample is too low to detect directly, we first need to amplify it by PCR. However, this viral RNA is packaged within all sorts of proteins and lipids that could make it inaccessible to amplification unless they are first purified away. Furthermore, the sample is shipped in "viral transport medium", which is essentially a cocktail of chemicals designed to preserve the virus. Unfortunately, these preservatives often have the side effect of interfering with PCR amplification, so these too need to be purified from the sample.

However, since RNA extraction is usually the most laborious part of the assay, there has been a lot of interest in optimizing the amplification so that it is resilient to all of these impurities. The preprint referenced in our manuscript (https://www.biorxiv.org/content/10.1101/2020.03.20.001008v1) gave us the initial idea that this could be possible, and much of it comes down to the amplification method you choose (e.g. the enzymes and buffers).

However, even when you choose a "good" enzyme and buffer, you will still suffer an amplification penalty, and this will cause you to return a false negative on some affected samples because there was so little virus in the sample to begin with. Our innovation is to spike a correspondingly low level of DNA into the reaction mixture. That way, if you see the low level of DNA without seeing any viral signal, you can be assured that the amplification still worked and that there truly is no virus in the sample.
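The interpretation logic described above can be sketched roughly as follows. The threshold, the signal units, and all names are hypothetical placeholders rather than values from the actual assay. Treating "viral signal but no spike-in" as positive is also an assumption of this sketch.

```python
from enum import Enum


class Call(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    INVALID = "invalid, re-run"


def call_sample(viral_signal: float,
                spike_in_signal: float,
                threshold: float = 1.0) -> Call:
    """Classify one reaction from its viral and spike-in signals.

    `threshold` is a made-up detection cutoff in arbitrary units.
    """
    viral_seen = viral_signal >= threshold
    spike_seen = spike_in_signal >= threshold
    if viral_seen:
        # Viral signal present: report positive (assumed regardless of spike-in).
        return Call.POSITIVE
    if spike_seen:
        # Spike-in amplified but no virus: amplification worked, so a true negative.
        return Call.NEGATIVE
    # Neither amplified: the reaction failed, so the result is uninformative.
    return Call.INVALID


print(call_sample(viral_signal=0.0, spike_in_signal=5.0))  # Call.NEGATIVE
print(call_sample(viral_signal=0.0, spike_in_signal=0.0))  # Call.INVALID
```

The point of the third branch is that a blank trace is reported as a failed reaction, not as a negative.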

In the UK they're saying there's a shortage of swabs and pipettes even, do you not need these too?

Also, in the UK our independent and uni labs have been saying for almost a week they could extract the RNA differently but the NHS have a fixed approved way that they won't change.

- Are the chemicals you're using more common or would there just be a new shortage of different chemicals?

- Is there a risk you'd be creating a test that didn't work very well, and the US would end up with a bunch of useless tests (e.g. Italy had to abandon a bunch of Chinese-sourced tests, UK's antibody tests are ineffective)?

Our technique would still be affected by shortages in specimen collection (like swabs).

Purely speculative, but I think if swabs remain an issue for too long, alternatives could start coming online, such as even using Q-tips + saline (no idea if it works, it's just an example). The current swab + Universal / Viral Transport Medium combo is optimized for flexibility; it is designed to work across a very broad range of viruses and bacteria that have different viral loads and shedding characteristics. The current pandemic is pretty much COVID-19 only, so I think it's a priori feasible that a specimen collection procedure can be found that uses common materials. We did try early on to see if saline or other buffers affected the performance of the assay, and it worked fine in those conditions.

We use fairly standard chemicals. I haven't heard from our suppliers about shortages for the chemicals we use. Chemicals and enzymes tend to be relatively fast to scale up for bulk manufacturing.

There's always manufacturing risk that a product will not work as expected. In fact, the first COVID-19 test developed by the CDC did not work as expected, and this delayed testing by several weeks. We de-risk this as much as possible by performing experiments as early as possible, akin to the fail fast mentality of checking for the highest risk failure modes first. Since we don't have a national healthcare system in the US, the manufacturer takes on the vast majority of the risk of a defective product.

There are companies out there working on swabs, e.g. Formlabs designed an autoclavable, 3D-printed nasopharyngeal swab using biocompatible dental resin, in concert with local hospitals. They received FDA Class I Exempt status and are printing some 150K per day.

https://formlabs.com/covid-19-response/covid-test-swabs/

https://www.plasticstoday.com/medical/formlabs-3d-printing-m...

https://formlabs.com/covid-19-response/

I'm not associated with the company, I just own some of their printers. They've also got some 2000+ volunteers who own printers or have CAD expertise signed up and looking for ways to contribute. Apparently we can't make medically-approved swabs (most of us aren't ISO 13485-certified or FDA-registered), but there's other stuff (e.g. hands-free doorknobs). I'm even contemplating shipping one of my printers back to them to help in the effort.

if the virus is known to live on cardboard or plastic for 48-72 hours, is the viral transport medium even necessary, assuming rapid shipping and processing?
Can live up to X hours != Will live up to X hours

Let’s say it’s 50/50 whether it lives 24h without help. That would be a pretty bad false negative rate for your test, but a 50/50 of potentially getting infected by your mail is pretty high.

The spike-in DNA injection is clever. How did you develop that technology? It seems like the kind of thing that just hits you one day.
It's standard practise in lots of kinds of sequencing experiments to use a spike-in. Makes perfect sense to use it here - in fact all the other sequencing based SARS-CoV-2 tests I've seen also use spike-ins.
>the amount of viral RNA in a patient sample is too low to detect directly,

How many tests have you run on patient samples?

Thank you so much for that explanation!
https://en.wikipedia.org/wiki/DNA_spiking -- this seems like an old idea, no?
We have the data both with extraction and without and show that it does not make a difference with qSanger. In Figure 4, we add VTM directly to PCR reactions. Seracare VTM samples have SARS-CoV-2 viral RNA in a different capsid to prevent infections in a research setting, but otherwise they reflect real-world VTM samples (and are much more realistic than even what EUA requires).

By the way, this robustness is completely expected, as any impurities in VTM would impact spike-in and endogenous viral amplification equally for end-point PCR (so their ratio stays the same). This is not necessarily true for qPCR, where an impurity (caused by lack of RNA extraction) can potentially cause a positive sample to look like a negative when the viral RNA does not amplify in RT-PCR.
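A toy model makes the ratio argument concrete. It assumes simple exponential amplification in which an inhibitor lowers the per-cycle efficiency of both templates by the same amount. The cycle count, efficiencies, and function names are illustrative assumptions, not parameters from the manuscript.

```python
def amplify(copies: float, efficiency: float, cycles: int) -> float:
    """Idealized exponential PCR: each cycle multiplies copies by (1 + efficiency)."""
    return copies * (1.0 + efficiency) ** cycles


def estimated_viral_copies(viral_start: float,
                           spike_start: float,
                           efficiency: float,
                           cycles: int = 35) -> float:
    """Estimate starting viral copies from the end-point viral/spike-in ratio."""
    viral_end = amplify(viral_start, efficiency, cycles)
    spike_end = amplify(spike_start, efficiency, cycles)
    # The shared amplification factor cancels in the ratio.
    return spike_start * viral_end / spike_end


for eff in (0.95, 0.60):  # clean sample vs. inhibited sample (made-up values)
    raw = amplify(500, eff, 35)
    est = estimated_viral_copies(500, 100, eff)
    print(f"efficiency={eff:.2f} raw viral signal={raw:.3g} estimate={est:.1f}")
```

In this model the raw viral signal drops by orders of magnitude when efficiency falls, while the ratio-based estimate stays at the true starting value.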

The RNA extraction can also be done without reagents by using a heat reaction on the sample similar to boiling an egg. It is a 5-minute process.

According to an older scientist, Anders Fomsgaard at the Danish Serum Institute, this is how they did it ”in the old days”. He is the father of one of the authors.

This eliminates supply chain problems for reagents and was shared quickly to help in Spain.

Preprint by Fomsgaard and Rosenstierne:

https://www.medrxiv.org/content/10.1101/2020.03.27.20044495v...

This is amazing.
Hi, I read through your paper. Interesting method.

In figure 1A, the workflow includes a standard PCR step before Sanger. Workflow-wise, wouldn't it still have the same bottleneck as qPCR test, i.e. limited by 96/384-well instrument runs?

Thanks; that is a good question. In our laboratory, we have a ratio of 10:1 for PCR to qPCR instruments. In the new laboratory that we are constructing, the ratio is 50:1. It was similar at Stanford academic laboratories during my PhD. Standard PCR instruments are inexpensive and very common. qPCR instruments are definitely not as common, as they are very specialized instruments for a few use-cases.

Most clinical laboratories would have 10 to 50 PCR instruments that they can use to run the initial amplification reaction in parallel before Sanger sequencing. Also, Sanger sequencing uses a plate feeder, so you can add new plates on top as the second round of PCR reactions finishes.

But, more importantly, the qSanger can by-pass RNA extraction, which seems to be an important bottleneck in the RT-qPCR workflow.

It's also possible to bypass RNA extraction and go directly into RT-qPCR.

"DIRECT RT-qPCR DETECTION OF SARS-CoV-2 RNA FROM PATIENT NASOPHARYNGEAL SWABS WITHOUT AN RNA EXTRACTION STEP" https://www.biorxiv.org/content/10.1101/2020.03.20.001008v2

Thank you for what seems to be a breakthrough with implications beyond even the current outbreak.

I had a few questions though:

How do you compare this test to the Abbott machines? Obviously that test is faster, but how does that impact what we can do with it?

For 1m/day to be sufficient, do we need contact tracing programs to be able to find everybody who needs to be tested? How hard will it be to scale these programs?

The Abbott machines are point-of-care devices that typically sit in doctor's offices. One really interesting use case I've heard of for the Abbott machines is to test all OB patients who are coming in to the clinic for routine care to make sure that they are COVID-19 negative. This allows the clinical staff to conserve PPE and use less burdensome precautions.

I think that where the Abbott machines might hit a wall is that they are one at a time, and they require Abbott's consumable test cartridge and device to run (think printer ink / printer). I don't have any firsthand knowledge, but I would anticipate that it is difficult to scale up manufacturing of the devices rapidly enough to keep pace with the pandemic growth.

We absolutely need contact tracing to find everybody who needs to be tested. We're not working on scaling up contact tracing, but I think several people in the tech community are working on making that easier to perform at scale.

Abbott's test is cartridge based, isothermal and modular. There should be no technical reason why they cannot build a high throughput, random access version.

Whether this is the direction the company wants to spend their resources is another story.

Your company has filed a US patent application (and it appears also a PCT) on the qSanger method, and this application is reference 8 in your document.

What are your plans for licensing the technology?

If it is specific to COVID-19 testing, we will not seek anything, as long as the end-user is not financially benefiting from it or, importantly, selling qSanger kits.

If they need our bioinformatics automation & help with set-up, we would license the method for COVID-19 testing for $3-$5/sample as part of each sample that is being put through our pipeline.

If they ask for 96-well plates with all reagents that are ready to use (so that they just need to add VTM), we would work with manufacturers to produce the reactions and plates, and the price of the kit (~$15 per test) would include a limited license to use our automated bioinformatics calling pipeline.

When you say 'as long as the end user is not financially benefitting' - is the end user the lab conducting the test?

You said in an earlier comment that the reimbursement for testing is too low to justify buying expensive equipment. You are also proposing to charge half the reimbursed rate for it to run on someone else's equipment.

Are the current equipment owners expected to donate this crucial equipment, because if they are the bottleneck, shouldn't they be the ones compensated to encourage more equipment to be made available?

$15 is half the price of even the bare minimum qPCR kits (e.g., TaqPath). We need to buy the reagents from NEB, IDT, and others and work with a contract manufacturer to mix it into a reaction. Reagent, manufacturing, quality control, and fulfillment cost already add up to ~$11/reaction. That does not take into account any costs associated with developing the assay, supporting the assay, getting it through EUA, customer service, bioinformatics help. And we have to pre-pay for all of the reagent costs in the anticipation of the volume. I anticipate that we will likely end up net negative with this work, and even if it ends up being slightly net positive, it will not impact our valuation in a positive way.

The current equipment owners are already the clinical laboratories. It is unused capacity for them. Other owners are sequencing service providers. The full cost of running an end-to-end Sanger reaction as a provided service is $2-$6, so at the $50 reimbursement price, the laboratories will still be incentivized.

You and your co founder(s) are good people. I'm glad you're in a position to do this. If this works, you'll save lives.
This equipment isn't something a hospital has unless they have a serious desire to do top notch genetic disorder testing, and that kind of hospital is going to use the equipment for this cause. The Abbott machine is something practices with much harder financial constraints have to seriously worry about paying for.
Your company is an inspiration and is doing a tremendous service to the world. I am a federal consultant and have a couple of companies in the medical space that I help find and write solicitations for. I'd love to help your company free of charge on anything and everything. I have a lot of infrastructure built, such as pricing tools, automated solicitation writing, a ton of outreach lists, etc. One such outreach list is of clinical laboratories nationwide, with emails to executives and a ton of other information. I also work with a company that sells reagents and contract kits. I've also worked at Tesla on the Supply Chain, Capital Equipment team in the past and can help you procure any reagents or automated Sanger sequencers, or find labs that have them. Additionally, I am fairly adept at SQL and Python (Pandas for data), which is why I am a regular on Hacker News and how I found the comment. Let me know if there is anything I can do, as I would be very proud to contribute anything to your mission.
Why not use pooling? You can test 64 samples at once and use a binary search if the pool is positive.

See: https://www.medrxiv.org/content/10.1101/2020.03.26.20039438v...
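The pooling idea can be sketched as a recursive halving search. This is a generic adaptive group-testing sketch, not the protocol from the linked preprint. It assumes pooling does not dilute a positive sample below the detection limit, and `CountingAssay` is a stand-in for running a real test on a pooled specimen.

```python
from typing import Callable, Iterable, List, Sequence


def find_positives(samples: Sequence[int],
                   pool_is_positive: Callable[[Sequence[int]], bool]) -> List[int]:
    """Return the positive sample IDs by testing pools and halving positive ones."""
    if not samples or not pool_is_positive(samples):
        return []  # one negative pooled test clears every sample in the pool
    if len(samples) == 1:
        return [samples[0]]
    mid = len(samples) // 2
    return (find_positives(samples[:mid], pool_is_positive)
            + find_positives(samples[mid:], pool_is_positive))


class CountingAssay:
    """Simulated pooled test that counts how many reactions were run."""

    def __init__(self, infected: Iterable[int]) -> None:
        self.infected = set(infected)
        self.tests_run = 0

    def __call__(self, pool: Sequence[int]) -> bool:
        self.tests_run += 1
        return any(sample in self.infected for sample in pool)


assay = CountingAssay(infected={37})
print(find_positives(list(range(64)), assay), assay.tests_run)  # [37] 13
```

With one positive among 64 samples this takes 13 reactions instead of 64. The savings shrink as prevalence rises, and each extra round adds turnaround time.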

Hi David, what about Roche cobas 6800/8800 high-throughput diagnostics machines? Are they using the traditional RT-qPCR assay? How are they able to test 400k samples per week?

https://diagnostics.roche.com/us/en/news-listing/2020/roche-...

Roche 6800/8800 instruments are state-of-the-art automated RT-qPCR machines. We need them as well as other COVID-19 testing instruments.

That said, it is easier to ship the test kits than scale the instrumentation. Both Roche and Abbott still need to build hundreds of their instruments before the kits that they are shipping out this week can be used on the daily rate that they are trying to get to. I am not sure with Roche, but Abbott estimates the end of June to have enough machines shipped to achieve 50K per day capacity on their instruments.

Another potential problem with new instrumentation is that reimbursement for COVID-19 tests is very low ($30-$50), so it becomes financially difficult for hospitals and laboratories to buy very expensive instruments and also pay for test kits that cost $30-$50 per test, on par with reimbursement.

We try to avoid both issues by utilizing a currently unused Sanger install base and low-cost reagents.

> We try to avoid both issues by utilizing a currently unused Sanger install base and low-cost reagents.

Good thought! Thank you.

Additional question: are Sanger instruments also very sophisticated and manufactured by only a few pharmaceutical companies/medical equipment manufacturers? Or would production of Sanger instruments need to be scaled up too? I am asking because lack of testing is a global issue, not only in the U.S. Africa and India together have 2 billion people, and the testing hurdles there are even more challenging.

Those systems process up to 384 or 1056 samples in ~8 hours. The proposed method should support a volume similar to (even a bit higher than) the Roche 8800, but with a different large installed instrument base. So this should add a lot of capacity on top of qPCR.
Thanks for the link to the scientific manuscript. Can you speak more about the machine learning aspect? Best of luck on getting FDA approval.
"When you’re fundraising, it’s AI When you’re hiring, it’s ML When you’re implementing, it’s linear regression"

The core of our machine learning is Ax=b :grins:

More seriously, the main reason why traditional Sanger sequencing can't be used for COVID-19 testing is that it would be unclear whether a lack of signal is truly due to lack of virus, or if it is just because the assay failed (happens all the time!).

What we've done is introduce a reference sequencing signal that is biochemically very similar to viral RNA, but produces a distinct vector of electrical signals that is different from the signals emitted by viral RNA. Since we know what both the reference and viral signals look like, we can perform linear regression analysis to fit the linear combination of viral and reference signals that best match our data.
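As a rough illustration of that fit, here is a minimal least-squares sketch in plain Python. The templates are made-up six-point vectors (a real chromatogram has four dye channels and many more positions), and the sketch assumes signal amplitude scales linearly with template abundance. None of the names or numbers come from the actual pipeline.

```python
from typing import Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_two_templates(trace: Sequence[float],
                      reference: Sequence[float],
                      viral: Sequence[float]) -> Tuple[float, float]:
    """Least-squares weights (ref, viral) with trace ~ ref*reference + viral*viral.

    Solves the 2x2 normal equations directly (the "Ax=b" step).
    """
    rr, vv, rv = dot(reference, reference), dot(viral, viral), dot(reference, viral)
    ry, vy = dot(reference, trace), dot(viral, trace)
    det = rr * vv - rv * rv
    if det == 0:
        raise ValueError("templates are collinear; the fit is not unique")
    ref_weight = (ry * vv - vy * rv) / det
    viral_weight = (vy * rr - ry * rv) / det
    return ref_weight, viral_weight


# Hypothetical template signals and a noiseless mixed trace.
reference = [1.0, 0.0, 2.0, 0.0, 1.0, 3.0]
viral = [0.0, 2.0, 0.0, 1.0, 3.0, 1.0]
trace = [100 * r + 250 * v for r, v in zip(reference, viral)]

ref_w, viral_w = fit_two_templates(trace, reference, viral)
SPIKE_IN_COPIES = 100  # roughly the spike-in amount mentioned in the post
print(round(SPIKE_IN_COPIES * viral_w / ref_w, 1))  # estimated viral copies
```

The ratio of the two fitted weights, scaled by the known spike-in amount, gives the viral estimate. A reference weight near zero would flag a failed reaction.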

> Since we know what both the reference and viral signals look like, we can perform linear regression analysis to fit the linear combination of viral and reference signals that best match our data.

Does this mean you assume a linear relationship between the quantity of viral RNA and the strength of the signal?

I know that back when I used to draw calibration curves from my positive controls, there was usually a sublinear relationship, across all sorts of different assays, at least at the upper end.

For HN it's better to drop the fundraising language and use the implementing language, so I've s/machine learning/linear regression/'d your text above.
~~Please change it back. I appreciate ~dvdt's humorous yet candid explanation that they're using the simplest of machine learning techniques, but unless they explicitly say that it is indeed simple linear regression, I think it risks inaccuracy to describe it so specifically.~~

ETA: never mind, ~dvdt signed off on the change. Appreciate the collaboration!

Wait, are you saying that you actually edited the text of someone's comment?
I do that occasionally when coaching YC startups with their HN launches (https://news.ycombinator.com/launches), which this is a variant of.
So, you have explicit permission to do this ahead of time, from the user?
Wow...
Seems reasonable, as dang explained the impact and the rationale.

Thanks Dang!

Thanks!
They get basically two superimposed chromatograms from Sanger sequencing, one from the control and one from the patient sample. They need to sort out which chromatogram peak belongs to which and calculate the relative abundance of the sequences in the sample from the relative peak intensities.

The machine learning angle isn't the exciting part here, it's all the rest. Great idea!

Does India have these machines? Can we easily replicate the tests here?

We are approaching our peak in 3-4 weeks, and our current testing capacity is totally fucked.

I know many people in Government and on the official Covid response team who could get the ball rolling if this is a real possibility.

I have no idea, but at least it should reduce pressure on reagents required for other techniques.
Do your tests distinguish between active (infectious) SARS-CoV-2 RNA and remnants that aren't infectious by themselves?

Do they detect antibodies?

What is the false negative rate for nasal swabs vs anal swabs?

Hello! Your website says:

“Most qPCR instruments have low throughput (they can run 1- 48 samples at a time) and they all compete for the same reagents.”

Is it more accurate to say that FDA-approved qPCR machines can run up to 48 samples at a time?

Most qPCR machines can handle 96 samples at a time, and there are versions that go as high as 384 at a time.

The FDA has only approved qPCR machines that go up to 48, correct? Do you know why they’re not running on the more parallel machines?

Thanks!

What are the regulatory issues involved in reaching 1M/day?
Do these tests have an advantage in specificity and sensitivity, especially if doing test pooling?
I read the paper and I'm still a bit unsure about how the process works. How specific is it to COVID-19?

Given that a Sanger sequencer does sequence all kinds of mRNA, would you also find other RNA-based viruses like the flu? How much modification would be necessary to diagnose all kinds of RNA viruses?

It sounds like your work has amazing potential. How far have you come already, and what are your biggest challenges moving forwards? How many tests did you manage to process today?
I would not have even considered trying to make use of the existing Sanger sequencing machines. This opens up a whole extra set of hardware to increase testing bandwidth. Thank you.
What's the plan for regulatory approval?
Pardon my ignorance, but does the test detect the actual virus or fragments that result from the infection?
Is there any way a random HN reader could help? I wish my country would do more tests.
Hi, Any thoughts on maybe contacting other countries such as India?
Is answering questions on HN really a good use of your time right now?

Thank you for doing something about the pandemic. It's quite heroic.

All the material on your page talks about throughput but not turnaround time which makes me quite suspicious that it's not very good. Obviously, however, if it takes 2 weeks to do the test then it has very limited utility as the person will have recovered (or had an adverse outcome) anyway. Since it cannot be administered at the point of care I am imagining that you are already up against a day or so of logistics just to get a sample to the testing location.

Can you please comment on the actual realistic turnaround time for the test?

A turnaround time of 24-36 hours should be easily achievable for the performing lab, depending on the time of day the specimen arrives (morning vs afternoon). It takes about 1 hour for the initial RT-PCR amplification, and sequencing takes about 12 hours.
Thanks - that's not too bad.

If people can sort out logistics well enough then it sounds like it could be extremely helpful for implementing broad scale testing.