For some extra context, Dolly was not made using the iPSC approach (reversing differentiation from adult cells). Dolly was cloned using a process called somatic cell nuclear transfer, where the nucleus of an adult cell is placed inside an enucleated egg cell (one that has had its own nucleus removed). The egg then develops into an embryo using the DNA of the individual that donated the nucleus.
The article alludes to this only in passing, but somatic cell nuclear transfer (developed by John Gurdon) was a huge milestone in the field that eventually led to Yamanaka's discovery of the pluripotency factors, for which they both won the Nobel Prize in 2012.
I recall when Dolly was in the news, but only understood the broad strokes of the process. I did scan through that page for 'stem', and found a reference to future research, and noted the replication from an 'adult cell' (which I'd assume analogous to differentiated cell).
I'd guess the transfer process is why she arrived with shorter telomeres at birth?
> 'adult cell' (which I'd assume analogous to differentiated cell)
That's correct.
> I'd guess the transfer process is why she arrived with shorter telomeres at birth?
Also spot on. The donor somatic cell likely had shortened telomeres, resulting in shorter length in the embryo. It's kind of an interesting research question, because telomere length gets "restored" for germ cells (otherwise every time an organism produced sperm/eggs the telomeres would shorten), and these types of experiments demonstrate that there is a limit to that restorative process.
There has been a significant amount of work in this area. One of the most notable breakthroughs was from Shinya Yamanaka (and his lab) in 2006 & onward, where they defined a core set of factors that can be expressed in adult cells to convert them back to embryonic stem cell-like states [1,2]. These cells are commonly referred to as induced pluripotent stem cells (iPSCs, if you want to google them). There are a bunch of folks working on potential applications of the technology, but as you might imagine, it will take a lot of work to demonstrate safety & efficacy.
TL;DR: Nanopore data is historically lower quality than the current gold-standard methods, but it is by no means "not viable" in a genomics pipeline. Their newer-chemistry flowcells are competitive with the current gold standard (but I've not yet seen it with my own eyes in the lab due to limited release).
There are two components that drive sequencing error rate. 1) The chemistry behind the sequencing (for nanopore sequencing this is the "feeding DNA through a pore" bit) 2) the method to convert raw signal into DNA sequence (this is called "base calling").
The gold standard in terms of error profile for sequencing is currently the Illumina short-read platform. Illumina machines are really just microscopes (TIRF scopes, for optics folks) that sequence DNA by visualizing incorporation of dye-labeled nucleotides into the sequenced molecule(s) (imagine a really slow PCR [1]). Each base is labeled with a different color, so when a molecule has a match it makes a colored spot on the slide that the machine can read (see here for more info & details of newer chemistry that uses fewer colors [2]). This whole process is mediated by DNA polymerase, which itself has a very low error rate. Another important point is that DNA sequenced on the Illumina platform (called a "library") tends to be from "amplified" template DNA, meaning the DNA will have been processed and will potentially be missing chemical modifications on the bases that could be present in the organism. This works to Illumina's advantage, because when trying to answer the question "what is the DNA sequence?" we want the ground-truth DNA, not the modification state.
In contrast, Nanopore sequencing works by feeding a long strand of DNA through a pore and measuring the change in electrical current through the pore (watch the cool video [3]). For the current set of nanopore flowcells, 8 bases of DNA sit in the pore at a time, meaning the current at each timestep is a product of 8 nucleotides in aggregate. This also means that the pore "sees" each base 8 times, but always in the context of an additional 7. To basecall from the raw signal, it's not as easy as saying "blue = A"; instead, you have to deconvolve each base from a complex signal. As you might imagine, the folks at Oxford Nanopore & the broader research community have turned to machine learning-based basecallers to solve this problem, and they work quite well [4]. But they are not perfect.
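To make the "8 bases at a time" point concrete, here's a toy sketch in Python. The current model is entirely made up (a checksum standing in for an empirically measured k-mer table), but it shows the structural problem: one signal level per overlapping 8-mer, so each base contributes to 8 different measurements and can't be read off directly.

```python
import zlib

K = 8  # number of bases in the pore at once, per the description above

def kmer_current(kmer):
    # Hypothetical current level in pA; zlib.crc32 is just a stand-in
    # for a real, empirically measured k-mer current model.
    return 60.0 + (zlib.crc32(kmer.encode()) % 1000) / 25.0

def simulate_signal(seq):
    # One current level per position of the 8-wide sliding window,
    # so N bases produce N - 7 signal steps.
    return [kmer_current(seq[i:i + K]) for i in range(len(seq) - K + 1)]

seq = "GATTACAGATTACAGATTAC"  # 20 bases
signal = simulate_signal(seq)
print(len(seq), len(signal))  # 20 bases -> 13 overlapping 8-mers
```

Basecalling is the inverse problem: recover `seq` given only `signal`, which is why ML models are used rather than a simple lookup.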
Deconvolving runs of the same base (e.g. "AAAAAAA") is difficult because, without well-defined signal changes between bases, the caller has a hard time deciding how many bases it has seen; a common error mode for nanopore sequencing is therefore insertions/deletions at places in the genome with low nucleotide diversity. Another interesting source of error is that Nanopore library preps are often performed on unamplified DNA, so in addition to the normal A/T/G/C nucleotides, the template DNA can also contain bases with chemical modifications. For example, in bacteria, A's are often methylated, and in humans, C can carry all kinds of different modifications (5-methyl-cytosine, 5-hydroxymethyl-cytosine, etc.), each of which affects the signal in the nanopore. Basecallers that weren't trained on modified bases will therefore produce basecalling errors in the presence of base modifications.
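The homopolymer problem can be demonstrated with the same kind of toy model (again, the "current" function is a made-up stand-in, not a real pore model). Once a run of identical bases is longer than the pore's window, every window inside it is the same 8-mer, the signal level stops changing, and the run length is simply not recoverable from the sequence of distinct levels:

```python
import zlib
from itertools import groupby

K = 8

def toy_current(kmer):
    # Stand-in for an empirical pore model: current depends only on
    # which 8-mer is in the pore.
    return zlib.crc32(kmer.encode())

def raw_signal(seq):
    return [toy_current(seq[i:i + K]) for i in range(len(seq) - K + 1)]

def events(signal):
    # Event segmentation keeps one value per run of constant current:
    # it detects level *changes*, not how long each level lasted.
    return [level for level, _ in groupby(signal)]

a10 = "GGGG" + "A" * 10 + "CCCC"
a12 = "GGGG" + "A" * 12 + "CCCC"
# Inside an A-run longer than 8 bases, every window is "AAAAAAAA",
# so both sequences produce identical event lists:
print(events(raw_signal(a10)) == events(raw_signal(a12)))
```

Real basecallers also use the dwell time at each level, which helps, but this is why indels in homopolymers remain the characteristic nanopore error.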
For both Illumina and Nanopore, the basecaller assigns a quality score to each base that indicates the probability that the basecaller produced an incorrect value. This is called a Q-score, defined as Q = -10 * log10(P), where P is the error probability (i.e. Q / 10 is the order of magnitude of the error rate) [5]. For example, a Q-score of 10 means an error rate of 1 in 10, while a Q-score of 50 means an error rate of 1 in 100,000. For Illumina sequencing, >95% of the reads have a Q-score > 30 (i.e. 1 error in 1000), while Nanopore reads tend to have lower average Q-scores (~Q20, i.e. 1 in 100). For genetics, where 1 base difference can mean the difference between a severe disease allele and a normal variant, 1 in 100 won't cut it.
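The Q-score arithmetic is easy to check directly. A small sketch of the conversion in both directions (the Phred+33 line reflects the standard FASTQ encoding, where each per-base quality is stored as the ASCII character for Q + 33):

```python
import math

def q_to_error(q):
    # Q = -10 * log10(P)  =>  P = 10 ** (-Q / 10)
    return 10 ** (-q / 10)

def error_to_q(p):
    return -10 * math.log10(p)

print(q_to_error(10))  # 0.1    -> 1 error in 10
print(q_to_error(20))  # 0.01   -> 1 in 100, a typical nanopore read average
print(q_to_error(30))  # 0.001  -> 1 in 1000, the Illumina figure above
print(q_to_error(50))  # 1e-05  -> 1 in 100,000
# FASTQ files store each per-base Q-score as ASCII(Q + 33), so Q30 is '?':
print(chr(30 + 33))
```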
The current-gen Nanopore flowcell chemistry (R9.4.1) is what most people mean when they talk about Nanopore error rates, but they've just released a new pore type and basecaller upgrades that improve accuracy to what they call "Q20+", with some claims of Q>30. From the data I've seen, it's impressive; I just haven't gotten my hands on one yet to see for myself [6]. I think the comment saying "wait 5 years" is an overestimate, but if you want to genotype yourself today, I'd just pay someone for Illumina sequencing and process the fastq files yourself if you really want to do it as a learning exercise.
I've unintentionally written an essay, so I'll stop here, but real quick to your other point RE: rerunning the sample N times & using the repeats for error correction. This won't work the way you're thinking, because a "sample" is actually a collection of DNA molecules that are sampled randomly by the sequencer. You have no way of knowing whether the same read between runs actually came from the same molecule, so you can't error correct this way. Incidentally, a totally different sequencing platform from Pacific Biosciences does use this strategy, by repeatedly reading the same molecule with some really cool chemistry, but I'll spare you the second essay (google "PacBio HiFi" or "circular consensus reads" if you're interested).
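The reason repeated passes over the *same* molecule work is just majority voting. A toy sketch of the circular-consensus idea (substitution-only errors for simplicity; real nanopore/PacBio errors include indels, which make consensus harder):

```python
import random
from collections import Counter

random.seed(1)

def noisy_read(template, err=0.1):
    # Substitution-only error model: each base is miscalled with
    # probability `err`, replaced by one of the three other bases.
    bases = "ACGT"
    return "".join(
        random.choice([x for x in bases if x != b]) if random.random() < err else b
        for b in template
    )

def consensus(reads):
    # Per-position majority vote. This is only valid because every
    # read is a pass over the SAME molecule; votes from different
    # molecules would mix real biological variation with errors.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

template = "ACGTACGTACGTACGTACGT"
passes = [noisy_read(template) for _ in range(15)]
print(consensus(passes) == template)
```

With a 10% per-base error rate, any single pass is badly wrong, but 15 passes over the same molecule recover the template; the same vote across randomly sampled molecules tells you nothing about any one of them.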
I for one am glad you wrote the essay, this was incredibly informative and filled in a bunch of blanks I had after reading what I could scratch together on the MinION product. I think I'm in a partial state of shock at how accessible this is becoming. Thank you!
Thanks - fascinating stuff. I'm now even more convinced I want to give it a try, but I think I'll play around with public data and tutorials before leaping into home sequencing.
You totally should, it's a lot of fun. I'd suggest trying to find some bacterial genome sequencing (like E. coli) done on nanopore if you're interested in those data. I don't have a link to any handy right now, otherwise I'd post here, but assembling bacterial genomes is shockingly easy these days and doesn't need nearly as many resources as doing a human genome, so it's great for learning (I love the assembler Flye [1] for this).
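If you do find a nanopore bacterial dataset, a minimal Flye run looks roughly like this (filenames are hypothetical; the flags shown are from Flye's documented CLI, with --nano-raw for standard nanopore reads):

```shell
# Hypothetical input file and output directory names.
READS=reads.fastq.gz
OUTDIR=flye_asm

if command -v flye >/dev/null 2>&1; then
  # Assemble raw nanopore reads; contigs end up in $OUTDIR/assembly.fasta
  flye --nano-raw "$READS" --out-dir "$OUTDIR" --threads 8
else
  echo "flye not installed; see https://github.com/fenderglass/Flye"
fi
```

A bacterial genome (~5 Mb) typically assembles in minutes on a laptop, which is what makes it such a good learning exercise compared to a human genome.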
And RE: home sequencing, honestly the hardest part for a beginner will likely be the sample prep, since that takes some combination of wet lab experience and expensive equipment. I really wish molecular biology was as simple to get hacking on as writing software. The lag time between doing an experiment and getting a result is so much longer than waiting for things to compile, it just makes improving your skills take longer.
I have literally no skin in the game here, but here's another interesting group that totally bailed on a similar project because I guess they hated Julia so much. Good to see something else crop up.
Ah, that's too bad, but I can't really blame them. Going from a fully-featured Python codebase they already had in https://github.com/brandondube/prysm and translating into Julia would need to be motivated by much more than speed - after all the limiting factor here is FFT and heavy computations like cis, so I'm pleasantly surprised they reported gains from their simple port compared to numpy. It's also much easier to go from a C/C++ computational code to Julia than it is from a Python one, because the mental performance models are more similar.
For reference, we (metalenz.com) have a large Julia codebase centered around optical design, simulation, and analysis. The motivation for that is more along the lines of composability and clarity of abstractions (aided by multiple-dispatch). We can differentiate through our physical optics solver code (forward or reverse) for optimization/ML, plug in our designs across a hierarchy of different E&M solvers, run on GPU, and write very efficient code when our profiling identifies a bottleneck. If we just had to perform one thing (physical optics simulations), then our investment wouldn't be as justified.
Interesting! This really speaks to me. In my opinion a language can get away with having the most foreign concepts or the most unusual ways of optimizing code,
but only if it is well documented. And that is really the problem with Julia: besides some video tutorials, which I don't find helpful, you only have the manual, which is not that good.
I've said this before here, but look at Rust, which has unique characteristics like the borrow checker and is often described as having a steep learning curve: I was never overwhelmed by it, or even felt it was difficult, because there is so much high-quality material to help you: the amazing book, Rust by Example, and a lot of great third-party resources like cheats.rs or the "rust in simple language".
The other day I was trying to look for julia books. Most of them were outdated and people advised against buying them (or the publisher pulled them already).
I think people often underestimate (or just plain don't know about) the degree to which a multiple-dispatch-based programming language like Julia effectively implies its own dispatch-oriented programming paradigm, with some amazing advantages (composability [1], and an IMO excellent balance of speed and interactivity when combined with JAOT compilation), but also some entirely new pitfalls to watch out for (particularly, type instability [2,3]). Meanwhile, some habits and code patterns that may be seen as "best practices" in Python or Matlab can be detrimental and lead to excess allocations in Julia [4], so it may almost be easier to switch to Julia (and get good performance from day 1) if you are coming from a language like C, where you are used to thinking about allocations, in-place methods, and loops being fast.
Things are definitely stabilizing a bit post-1.0, but it's still a young language, so it'll take a while for documentation to fully catch up; in the meanwhile, the best option in my experience has been to lurk the various chat forums (slack/zulip/etc. [5]) and pick up best-practices from the folks on the cutting edge by osmosis.
As an avid Rust and Julia user, I often use Rust even when it's harder because I know I'll never get stuck. And when it finally compiles I'll be rewarded by ludicrous speed.
Hell, even C has decent man pages for the standard library.
Although in Julia's defense (or is it?), they've changed the language a lot over the versions from 0.4 to 1.5, so maybe it's hard to keep docs up to date.
The language has been pretty stable (no breaking changes) since 1.0. The bigger issue is that, unlike many of the other newer programming languages (Go, Rust, Swift, etc.), Julia doesn't have a Fortune 500 company sponsoring it, so there aren't any people whose job is documentation. It's all open-source. Hopefully this is fixable through Google Summer of Docs and/or the community sponsoring people to work on this.
A strong argument against this is the potential to apply selective pressure for virus variants that escape antibody binding. After the first vaccine dose, antibody titer in the blood is lower than it would be after 2 doses. Some of the new variants are better at escaping antibody binding, but with high enough antibody titer in the blood, they can still be cleared. Using only 1 dose risks not having enough antibody to eliminate an antibody-escape mutant.
It will be a strong argument when there's more evidence. Do we have any example of significantly stronger viral adaptation in the presence of a vaccine that successfully stimulates the immune system? Even a full course of the less-effective vaccines would still be risking this. The AstraZeneca and Sinovac vaccines only have an efficacy of around 70% after all.
You're correct that there's never been an experiment on the scale of the current pandemic, but there is a long history of demonstrating that viruses rapidly evolve to escape antibodies, and that this effect is enhanced under direct selective pressure.
In late 2020 a paper was published demonstrating this effect for SARS-CoV-2 in human cells; the authors then looked for those variants in humans and found they were already infecting people, demonstrating that this selection happens in the wild [1]. I totally hear you about the 1-dose vaccine argument. I think the kicker is two-fold. First, we do not know whether vaccinated individuals can still be infected with and spread COVID. Second, the rollout is currently very slow. You could imagine a scenario where enough folks are vaccinated to add selective pressure, but not enough to reduce widespread transmission. If data come out showing that you can't transmit COVID after the first dose (and I believe Moderna has this data, but isn't ready to publish), I think the answer is clearly to delay the second dose. But I think it is the absence of that transmission data, not whether evolution will happen to the virus, that drives the current strategy of completing the vaccination course on the ~3 week timeline.