ampdepolymerase's comments | Hacker News

A used laboratory grade NGS system can be had for less than 10K

https://www.ebay.com/itm/265148387179

Nanopore is still not quite ready yet for precise and high accuracy sequencing. Give it another five years.


That's not true. I just did a high-quality sequence and assembly of a new species of fungus from my home lab using nanopore. You can see all my code used for assembly and analysis that will be referenced in a paper I plan to publish in Jan here: https://github.com/EverymanBio/pestalotiopsis


Given that the decoder is machine-learned and depends on a training set to go from squiggle -> ATGC..., how do you ensure that sequences which haven't been seen before (not in the training set) are still accurately accounted for?


We used Guppy for basecalling, which is neural network based and used to turn raw signal data into predicted bases. There are no guarantees of accuracy, only tools to assess quality. One major way of assessing accuracy is to compare the subject genome with similar reference genomes and confirm a high degree of homology in highly conserved regions.
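A toy sketch of the kind of comparison involved (real pipelines use proper alignment tools; this gap-free version is illustrative only):

```python
def percent_identity(assembled: str, reference: str) -> float:
    """Percent of matching positions in a gap-free pairwise alignment."""
    if len(assembled) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and equal length")
    matches = sum(a == b for a, b in zip(assembled, reference))
    return 100.0 * matches / len(reference)

# A highly conserved region should show near-perfect identity.
print(percent_identity("ATGCGTAA", "ATGCGTAA"))  # 100.0
print(percent_identity("ATGCGTAA", "ATGCGTTA"))  # 87.5
```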


My question is whether, in the future, we will be able to fully rely on translations to predicted bases for sequencing, or whether there will always be a need to compare against a different sequencing methodology in the case of de novo genetic information that hasn't been seen before (no reference genomes being available).

Is there publicly available information on how accurate Guppy is, as well as how the amount of training data scales with improvements in accuracy?

It didn't seem like these things were mentioned explicitly in the Community Update, other than that it's expected to continue improving; a clearer roadmap would definitely be helpful.


How do you know the quality of the resulting sequence?


There are quality checks throughout the entire process, starting from the raw read quality scores returned directly from the sequencer all the way to fully assembled genome completeness. In our paper, one of the tools we used for this is called BUSCO[0] which scored our assembly at 97.9%, a relatively high score for de novo assemblies.
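For intuition, a BUSCO-style completeness percentage boils down to the fraction of expected near-universal single-copy marker genes recovered in the assembly. A toy sketch (marker names made up, numbers chosen to mirror the 97.9% figure):

```python
def completeness(expected: set, found: set) -> float:
    """Percent of expected single-copy marker genes recovered in an assembly."""
    return 100.0 * len(expected & found) / len(expected)

# 979 of 1000 hypothetical markers recovered -> 97.9% complete.
expected_markers = {f"marker_{i}" for i in range(1000)}
found_markers = {f"marker_{i}" for i in range(979)}
print(round(completeness(expected_markers, found_markers), 1))  # 97.9
```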

[0] https://pubmed.ncbi.nlm.nih.gov/31020564/


You don't, not without either resequencing it with another sequencing system or benchmarking the sequencer with a known sequence.


I thought I recognized your name from the side hustle story. :) This is super cool!!!


Thanks man!


Interested outsider here; I work with a lot of HCLS research customers but don't have a biology-related background. Can you explain the problems with the Nanopore sequencer accuracy in more detail? Basically, I was wondering if I could get one for myself and sequence my own genome, then use the data to learn about life-sciences computing techniques. If I were to buy one of the USB-attachable devices and run it, is the data simply not viable for use in a genomics pipeline, or is it just that the results would be questionable? Also, if accuracy is an issue, what about just running the same sample N times and doing some error correction?


I recommend reading this review

https://genomebiology.biomedcentral.com/articles/10.1186/s13...

I guess there are limits to ensemble methods if the underlying accuracy doesn't increase. I don't work on gene sequencing algorithms but from what I understand of ML ensemble techniques, there are certain assumptions regarding the underlying independence of the errors. The errors for nanopore should be uniform but I am not sure. Any molecular biologist here care to comment?
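The independence assumption can be made concrete: with independent per-read errors, majority-vote consensus error at a base is a binomial tail probability, and it shrinks quickly with read depth. A toy calculation (my sketch, not from any real basecaller):

```python
from math import comb

def consensus_error(p: float, n: int) -> float:
    """P(majority vote of n independent reads is wrong at one base), n odd."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

p = 0.1  # ~Q10 per-read error probability at a given base
for n in (1, 5, 11, 21):
    print(n, consensus_error(p, n))
```

If the errors are instead correlated (e.g. systematic at the same GC-rich or homopolymer sites on every read), the tail does not shrink like this, which is exactly the independence caveat.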


I know that the error rate of the oxford nanopore sequencer depends on GC content (guanine/cytosine nucleotides), and that the Pacific Biosciences sequencer uses a polymerase that gets worn down during reading. So there is some non-uniformity in the chemistry.


GC rich regions as in hairpin loops? How would the sequencer deal with those?


If I'm not mistaken the nanopore tech unwinds double-stranded DNA during the reading, so I don't think hairpins are the issue.


The instruments do exactly as you say (run the sample N times), but this obviously comes at a cost. Also, keep in mind that sequencing needs to be very, very accurate to be useful. We share most of our DNA, and the small variations make up all the difference.


what cost do you mean? time/electricity? reagents? or the cost of someone else charging more for more reads?


Yes, those are all relevant costs. There's also a tradeoff between accuracy and the number of reads (how many sequences you can observe), or how much data you can get out of the machine.


Tl;Dr: Nanopore data is historically lower quality than current gold-standard methods, but it is by no means "not viable" in a genomics pipeline. Their newer chemistry flowcells are competitive with current gold-standard (but I've not seen it with my own eyes in the lab yet due to limited release).

There are two components that drive sequencing error rate. 1) The chemistry behind the sequencing (for nanopore sequencing this is the "feeding DNA through a pore" bit) 2) the method to convert raw signal into DNA sequence (this is called "base calling").

The gold-standard in terms of error profile for sequencing is currently the Illumina short read platform. Illumina machines are really just microscopes (TIRF scopes for optics folks) that sequence DNA by visualizing incorporation of dye-labeled nucleotides into the sequenced molecule(s) (Imagine a really slow PCR [1]). Each base is labeled with a different color, then when a molecule has a match it makes a colored spot on the slide that the machine can read (see here for more info & details of newer chemistry that use fewer colors [2]). This whole process is mediated by DNA polymerase which itself has a very low error rate. Another important point is that DNA sequenced on the illumina platform (called a "library") tends to be from "amplified" template DNA, meaning the DNA will have been processed and potentially be missing chemical modifications on the bases that could be present in the organism. This works to Illumina's advantage, because when trying to answer the question of "what is the DNA sequence?" we want the ground-truth DNA, not the modification state.

In contrast, Nanopore sequencing works by feeding a long strand of DNA through a pore and measuring the change in electrical current through the pore (watch the cool video [3]). For the current set of nanopore flowcells, 8 bases of DNA sit in the pore at a time, meaning the current at each timestep is a product of 8 nucleotides in aggregate. This also means that the pore "sees" each base 8 times, but always in the context of an additional 7. In order to basecall from the raw signal, it's not as easy as saying "blue = A"; instead, you have to deconvolve each base from a complex signal. As you might imagine, the folks at Oxford Nanopore & the broader research community have turned to machine learning-based base callers to solve this problem, and they work quite well [4]. But they are not perfect. Deconvolving runs of the same base (e.g. "AAAAAAA") is difficult because without well-defined signal changes between bases, the caller has a hard time deciding how many bases it has seen, so a common error mode for nanopore sequencing is to create insertions/deletions at places in the genome with low nucleotide diversity. Another interesting source of error is that Nanopore library preps are often performed on unamplified DNA, and so in addition to normal A/T/G/C nucleotides, the template DNA can also contain bases with chemical modifications. For example, in bacteria, A's are often methylated, and in humans, C can have all kinds of different modifications (5-methyl-cytosine, 5-hydroxymethyl-cytosine, etc.) and each different modification affects the signal in the nanopore. Therefore, basecallers that weren't trained on modified bases will produce basecalling errors in the presence of base modifications.
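The homopolymer error mode can be sketched with a toy (this ignores the real 8-base pore context, purely illustrative): if the caller can tell which base is in the pore but not how long the signal dwelled there, different run lengths become indistinguishable.

```python
from itertools import groupby

def collapse_homopolymers(seq: str) -> str:
    """Collapse each run of identical bases to a single base."""
    return "".join(base for base, _ in groupby(seq))

# Once run length is lost, these two different sequences look identical,
# so miscounted runs surface as insertions/deletions in the basecalls.
print(collapse_homopolymers("GAAAAT"))  # GAT
print(collapse_homopolymers("GAAT"))    # GAT
```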

Both Illumina and Nanopore basecallers assign a quality score to each base that indicates the probability that the basecaller produced an incorrect value. This is called a Q-score, defined as Q = -10 × log10(P), where P is the probability the call is wrong (i.e. Q / 10 is the order of magnitude of the error probability) [5]. For example, a Q-score of 10 means an error rate of 1 in 10, while a Q-score of 50 means an error rate of 1 in 100,000. For Illumina sequencing, >95% of the reads have a Q-score > 30 (i.e. 1 in 1000 errors), while Nanopore reads tend to have lower average Q-scores (~Q20, i.e. 1 in 100 errors). For genetics, where 1 base difference can mean the difference between a severe disease allele and a normal variant, 1 in 100 won't cut it.
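The Q-score arithmetic above, as a quick sketch:

```python
from math import log10

def q_to_error(q: float) -> float:
    """Error probability implied by a Phred-style Q-score."""
    return 10 ** (-q / 10)

def error_to_q(p: float) -> float:
    """Q-score for a given error probability."""
    return -10 * log10(p)

print(q_to_error(20))    # 0.01   (1 in 100, typical nanopore average)
print(q_to_error(30))    # 0.001  (1 in 1000, the Illumina bar)
print(error_to_q(1e-5))  # ~50.0  (1 in 100,000)
```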

The current gen Nanopore flowcell chemistry (R9.4.1) is what most people are talking about when they talk about Nanopore error rates, but they've just released a new pore type & made some basecaller upgrades that improve the accuracy to what they call "Q20+" and some claims of Q>30, and from the data I've seen, it's impressive, I just haven't got my hands on one yet to see for myself [6]. I think the comment saying "wait 5 years" is an overestimate, but if you want to genotype yourself today, I'd just pay someone for Illumina sequencing and process the fastq files yourself if you really want to do it as a learning exercise.

I've unintentionally written an essay, so I'll stop here, but real quick to your other point RE: rerunning the sample N times & using the repeats for error correction. This won't work the way you're thinking because a "sample" is actually a collection of DNA molecules that are sampled randomly by the sequencer. You have no way of knowing that the same read between runs was actually from the same molecule, so you can't error correct this way. Incidentally, a totally different sequencing platform from Pacific Biosciences uses this strategy by doing some really cool chemistry, but I'll spare you the second essay (google "PacBio HiFi" or "circular consensus reads" if you're interested).

[1] https://en.wikipedia.org/wiki/Polymerase_chain_reaction

[2] https://www.ecseq.com/support/ngs/do-you-have-two-colors-or-...

[3] https://www.youtube.com/watch?v=RcP85JHLmnI

[4] This paper is a tad out of date, but Ryan Wick always writes extremely clear papers: https://genomebiology.biomedcentral.com/articles/10.1186/s13...

[5] https://www.illumina.com/documents/products/technotes/techno...

[6] https://nanoporetech.com/about-us/news/oxford-nanopore-tech-...

Edit: reformatted links for clarity.


I for one am glad you wrote the essay, this was incredibly informative and filled in a bunch of blanks I had after reading what I could scratch together on the MinION product. I think I'm in a partial state of shock at how accessible this is becoming. Thank you!


Thanks - fascinating stuff. I'm now even more convinced I want to give it a try, but I think I'll play around with public data and tutorials before leaping into home sequencing.


You totally should, it's a lot of fun. I'd suggest trying to find some bacterial genome sequencing (like E. coli) done on nanopore if you're interested in those data. I don't have a link to any handy right now, otherwise I'd post here, but assembling bacterial genomes is shockingly easy these days and doesn't need nearly as many resources as doing a human genome, so it's great for learning (I love the assembler Flye [1] for this).

And RE: home sequencing, honestly the hardest part for a beginner will likely be the sample prep, since that takes some combination of wet lab experience and expensive equipment. I really wish molecular biology was as simple to get hacking on as writing software. The lag time between doing an experiment and getting a result is so much longer than waiting for things to compile, it just makes improving your skills take longer.

[1] https://github.com/fenderglass/Flye


I work in a dry lab but I'm pretty sure you need a lot of expensive chemicals to actually make one of these work, yeah?


yup. that’s the business model for Illumina. it’s very much akin to video game consoles. Illumina might take a hit on selling the machine but makes it up in selling you proprietary reagents.


What sort of books/videos do you suggest so one can learn more? This stuff is interesting, and I've always seen inexpensive lab equipment on ebay.

If this can sequence flora, fungi and human DNA for about 10k, I'd buy it just to experiment and deep dive. That's such a low barrier to entry that it's interesting in itself.


Cost/benefit analysis may dictate that, as other posters suggested, you'd be better served to get raw fastq files from a sequencing lab. Even better if you can send the lab a sample and they'll process the extractions for extra $$.


wow i didn’t know they were that “cheap” now. i used to work for a major competitor to the sequencer you linked, the SOLiD.

and i feel like nanopore is the VR of dna sequencing. it’s always just another few years off.


> and i feel like nanopore is the VR of dna sequencing. it’s always just another few years off.

Is this also true for nanopores in protein sequencing? This HN comment from a few weeks back [1] pointed out recent progress but perhaps the tech is still not quite there.

[1]: https://news.ycombinator.com/item?id=29481075


What do you mean by it's always a few years off? Nanopore will allow you to do high-quality genomic sequencing _now_, in a home lab if you wanted, for less than $3K. If you amortize the 3K by the number of genomes you can sequence on the same flow cell, the price per base or per genome falls precipitously, depending on the size of the genome of course.
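The amortization is simple division; a sketch with assumed numbers (the $3K figure is from the comment above, the genome counts are illustrative):

```python
def cost_per_genome(setup_cost: float, genomes_sequenced: int) -> float:
    """Amortized cost per genome across the runs done on one setup."""
    return setup_cost / genomes_sequenced

# Hypothetical: a ~$3K starter setup reused across a dozen small genomes.
print(cost_per_genome(3000, 1))   # 3000.0
print(cost_per_genome(3000, 12))  # 250.0
```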


The one I linked to is a decade out of date and discontinued by the OEM.


ya my first thought was how hard are reagents to get, but probably not that hard. i wasn’t in the lab, i was in bioinformatics so i’m generally clueless on reagent acquisition.


Oculus sold 8.1m units this year, more than Xbox.


This is not the first attempt. Growing tiny brain organoids is fairly well understood, scaling it up is not. I have not read the paper but previous attempts at this have struggled with getting useful outputs out of the neural culture, most of the time the spikes just add up to noise.


Tertiary education at the elite level is rarely about the quality of education. The difference between Georgia Tech and Columbia or Harvard's CS program is not that stark. However, the story is completely different when it comes to alumni networks, access to funding, and future opportunities. Prestige and network effects compound, and an argument can be made about whether we should allow academia to be the gatekeeper of social mobility at the highest level.

Certain Ivies favour grade inflation, yet their students do not suffer any dilution of their degree's value; this is not a privilege extended to many other institutions.


The Ivy League factor plays a smaller role for CS than for other disciplines, due to less gatekeeping. Yale is about University of Minnesota level across multiple computer science ranking measures: https://drafty.cs.brown.edu/csopenrankings/


Same quality yes, but not access to the same opportunities.


You are not forced to pay attention to a Yale degree, though? If it affects government funding, OK, you can discuss it.

Building networks is also not a bad thing. It is good if people know and trust each other and know who to turn to for help. It is a stupid idea of modernism or whatever that there should be no networking.


GT’s CS program is held in higher regard within the industry.

The Ivy League has great STEM programs, but its most prestigious degrees come from its professional schools: law, business, and medicine.


> What differentiates SpaceX from NASA, or SpaceX from Blue Origin, is people and culture. We are not trying to build an institute or academic minded organization where papers are more important than products. Our goal is to build an ambitious, well run, for-profit company that will deliver revenue generating products on the way toward accomplishing its much bigger objective.

It is debatable whether they truly understand the financials of biotech. The grind of basic research will never go away. Many successful biotech companies essentially acquihire researchers and work that is already 80% complete; the role of the company is to bring it to production. That itself will make up the bulk of the company's workload. Similarly, SpaceX had the benefit of leveraging an existing pool of talent and resources; you cannot build a heavy launch company in Zimbabwe. If they want to do both active basic research and at the same time trial therapies, then they would need an enormous amount of funding (on the software unicorn level). You would need scale of the same magnitude as pharma giants like Pfizer, GlaxoSmithKline, and Johnson & Johnson to be able to acquire companies, run trials, and discard ideas that do not work. In the current bull market, this is the perfect company to build with Coinbase's founder as the chief fundraiser.


We usually use macaques for this (I haven't read the paper). Is there a review paper comparing both creatures? If they are really better, I should look into procuring some. Macaques are quite expensive in North America.


Imagine neutering your dog/cat so they would be a less aggressive pet.


They ought to rename themselves to biomedical hacking. Biohacking in contemporary usage usually refers to homebrew laboratory-grade biotechnology (involving PCR, gene modification, and a lot of biochemistry). The Dangerous Things forum is primarily about implants, which fall under biomedical and neural engineering, not biohacking.

On a side note, I strongly do not recommend trying out any of the biomedical implants without a good reason. You can approximate magnet implants to a reasonable extent with a bit of superglue and RFID chips can be placed in rings and jewellery. Risk of infection aside, accidental nerve damage is very difficult to fix with our current level of understanding.


And scrolling through the forum, I get the impression that the implants they discuss are basically just NFC chips? I really have never understood why someone would go through the risk of implanting something that is already very easy to make into a wearable item, when it doesn't actually need to interact with any biological functions. Is it just the novelty factor that motivates them, or am I missing something?


at a high, reductive level, yes. that is what motivated me to get mine.


I found it disappointing that the implants were largely security keys of one kind or another.

One could put that in an earring or wristband. Under the skin offers very little advantage.

I was hoping for implants that enhanced the body in some way, helped treat an illness, or that it would look cool at least.


Be careful of prions, wild game is not the safest source of protein.


While in theory you might be correct, do we have any evidence that a human has ever contracted a prion disease from eating a wild animal? I eat wild game, and I'm concerned about CWD, but I don't think there is much cause for alarm yet.


Molecular biology education needs more funding. Teaching it at scale (and more importantly, communicating new discoveries outside of textbooks) still remains a challenge. Keeping up with the field without reading papers is much harder compared to CS/ML where almost every ML engineer maintains a blog.


If tube fed diets are the trick, why not just go with Soylent? It's about as bacteria-free as food can reasonably get.


For many with Crohn’s, this is more or less the treatment. Many times it is less because it’s most effective, and more because surgeries removing portions of the large/small intestines have made them incapable of digesting other forms of food. I guess my question would be - would you want to live on Soylent the rest of your life, if it could be avoided? I think most would answer no.


Better off getting a proper elemental diet. Plenty of things in soylent that can set me off if I’m already flaring or predisposed to one.


IIRC studies have generally shown that elemental and polymeric formulas are similarly effective, but I agree that Soylent has some strange stuff that prolly wouldn't agree with me.

I had a good experience with Ensure Plus (chocolate) which was used in a clinical trial or two in Australia (or maybe New Zealand). But I used the one without fiber whereas the one with fiber bothered me.


Ensure has been a godsend for me too particularly during Ramadan. lol. A great way to get some wholesome calories when needed.

