So these guys come along, casually expand a well-established code and test it under precisely one small condition. And when this one test gives them some nice data, they say: "Hey, our stuff is better than everything all of nature has ever done!"
A very intellectually stimulating endeavour no doubt, but I expect some more tests before I would call this good science. Claiming that "the new additions appear to improve the alphabet" is simply extrapolation to the nth degree. [1]
Oh and by the way, when the article claims that
> "the three-biopolymer system may have drawbacks, since information flows only one way, from DNA to RNA to proteins"
that is not correct either. For more information, read up on epigenetics.
[1] Note that this quote comes from the article, not the original paper. The original paper is not quite as cocky (at least not in the abstract, but I don't have full access).
I can't help but notice the irony in your comment. So this guy comes along and having read one article, says "I can't call this good science."
A very basic summary of molecular biology of the cell:
DNA - library of blueprints, basically instructions on how to build proteins
RNA - copies of blueprints you take out of the library to build proteins so you don't expose DNA to unnecessary hazards
protein - catalyzes reactions so the cell can do stuff, including making new DNA when replicating.
A fundamental conundrum exists when it comes to evolution of this mechanism. DNA is needed to build proteins, but proteins are needed to catalyze the reactions necessary to build DNA. It's a chicken and egg problem... what came first?
When it was discovered that RNA can catalyze certain reactions, presumably because of its slightly higher chemical reactivity, it suggested a way out of this conundrum. What if life originated with RNA only, with RNA acting as both the hereditary and the catalytic machinery?
The problem with this RNA World hypothesis is that the range of reactions that modern RNA can catalyze is very limited. But what if, at the origin of life, when nature could experiment, the genetic alphabet that RNA could play with was bigger, potentially leading to expanded capability? Scientists like Benner have worked for three decades to try to answer this question.
So, your characterization of "comes along, casually expand" a well-established code and claims it's better is grossly unfair.
I am well aware of the basic molecular biology of the cell, as well as the RNA world hypothesis.
All due respect to Benner for his work - my comment was rather too pointed, I'll concede that. Nonetheless, I am always wary of too much theory being induced from too little data.
Benner's experiment shows that an expanded genetic code can form molecules that show greater chemical functionality in a given situation than natural DNA molecules do. Now a quote from the abstract:
> This suggests that this system explored much of the sequence space available to this genetic system and that GACTZP libraries are richer reservoirs of functionality than standard libraries.
Already he is starting to extrapolate when he starts talking about the extended libraries in general. The Quanta Magazine article then goes on to say:
> In other words, the new additions appear to improve the alphabet, at least under these conditions.
That is true, but for a rather narrow definition of "improve", and a very narrow set of conditions. The result is that the superficial reader goes away thinking "they've made a better DNA".
>that is not correct either. For more information, read up on epigenetics.
Epigenetics does not violate the central dogma of molecular biology. Copy-paste from Wikipedia:
>These epigenetic changes may last through cell divisions for the duration of the cell's life, and may also last for multiple generations even though they do not involve changes in the underlying DNA sequence of the organism;[5] instead, non-genetic factors cause the organism's genes to behave (or "express themselves") differently.
Note that the original quote is a common misunderstanding of the central dogma, and is incorrect (for example reverse transcription RNA --> DNA is common in nature).
The central dogma was formulated by Crick, and states that once sequential nucleic acid information (either DNA or RNA) has passed into protein, it cannot be recovered. This has never been violated.
Amusingly, it was none other than Crick's pal Watson who popularized the incorrect version via his college textbook, and it is this incorrect version that is regularly announced to have been 'disproven'.
I wasn't referring to anyone's expertise in the field, but to humanity's understanding of a roughly four-billion-year-old intelligent system. What we've done so far is impressive, but we've really only scratched the surface. It's a bit early to claim superiority.
We actually do know something about why the genetic code might have only four letters, and other aspects of its structure. It gets into combinatorics and search.
There are actually 84 possible combinations with 4 bases if you accept sequences of length ≤ 3.
However, if you assume all sequences are length 3, you still get 64 combinations.
We only use 20 out of that space. And if you look at how bases encode to amino acids, for half of them only the first two bases even matter - the two-base prefix already determines the amino acid, so you can ignore the third.
Given how underutilized this space is, I'm not convinced that increasing the domain to 216 will lead to much more than the ability to express our current amino acid space with only two base pairs.
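For the curious, a quick back-of-the-envelope sketch of the combinatorics above (plain Python; the counts are just alphabet-size arithmetic, nothing biology-specific):

```python
# Codon combinatorics for the natural and expanded alphabets.
natural = 4   # A, C, G, T
expanded = 6  # A, C, G, T plus the new Z and P

# All sequences of length <= 3 over the natural alphabet:
print(natural + natural**2 + natural**3)  # 4 + 16 + 64 = 84

# Fixed-length-3 codons:
print(natural**3)   # 64 possible codons today
print(expanded**3)  # 216 possible codons with six letters

# Only ~20 amino acids are actually encoded, so the natural
# code uses well under a third of its 64-codon space.
```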
And actually, even the redundancy is not complete. A recent paper showed that seemingly "redundant" variations of triplets led to a slightly different 3D folding structure of the DNA, with effects on the physiology of the cell.
Sorry, made a mistake in the comment above. It's not the DNA's structure that is changed, it's the resulting protein's. (Different codons slightly alter the rate of translation, leading to a different folding of the protein.)
You're absolutely right. The likelihood that any particular amino-acid-generating codon will mutate to another amino-acid-generating codon is non-uniform, and the analysis of what impact these particular redundancies have on that is very complex - sometimes, certain amino acids can even stand in for each other.
Certain mutations from one amino acid to another are more desirable than others, so it's quite possible that the existing structure biases the amino acids towards certain least-harmful mutation tendencies.
It would have been nice if the author had at least acknowledged that in reality they are nucleobases and not tiny, tiny letters curled up in our cell nuclei. Sure, 6-amino-5-nitro-2(1H)-pyridone and 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)one don't say much to us laymen, but just saying letters and not once mentioning what they stand for is really poor reporting.
I suppose the heading and maybe the opening paragraph could be seen to give that impression, but the article notes the fact that these are new nucleotides numerous times throughout. I found that pretty sufficient for showing that there are not in fact actual letters in DNA.
Nitpick: it wouldn't be a potential 216. Some three-"letter" sequences code for the same amino acids, so instead of 4^3 (64) possible amino acids, only 20 are generated. Adding new letters doesn't change what these old words create, so I think there would only be a possible maximum of 172.
(I think I did my math right, but maybe not.)
(edit: thanks duaneb, had my basic bio facts wrong - codons code for amino acids, not proteins.)
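If I follow the arithmetic above, it works out like this (a quick sketch; treating all 20 amino acids as fixed assumes, per the comment, that the new letters leave the old codon assignments untouched):

```python
old_codons = 4 ** 3   # 64 codons in the natural four-letter alphabet
new_codons = 6 ** 3   # 216 codons once two letters are added

# Codons built only from the old four letters keep their old meanings,
# so only codons containing at least one new letter are free to encode
# something new:
free_codons = new_codons - old_codons   # 216 - 64 = 152
existing_aas = 20                       # amino acids already encoded

print(free_codons + existing_aas)       # 172, the claimed maximum
```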
I'm not convinced that this is necessarily a good idea biologically, especially after talking to a couple of my friends that are researchers in this space. However, this seems quite interesting for non-biological applications. Take cold storage, for example--with a third base pairing, we can obviously develop an even denser data storage format than with regular DNA.
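To put a rough number on "denser" (a sketch under the idealized assumption that every base is a free symbol choice, ignoring error-correction and synthesis overhead):

```python
import math

# Information per base = log2(alphabet size).
bits_per_base_natural = math.log2(4)   # 2.0 bits (A, C, G, T)
bits_per_base_expanded = math.log2(6)  # ~2.585 bits with Z and P added

gain = bits_per_base_expanded / bits_per_base_natural
print(f"{gain:.1%} of natural density")  # ~129%, i.e. ~29% denser
```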
Neat, but extending amino acids would be even cooler. DNA is mostly "just" a string encoding for information, like binary or hexadecimal. Proteins on the other hand are the actual machines whose blueprints are written in DNA, and they're built out of amino acids. Extending the set of amino acids could extend the set of basic building blocks available to create biomolecular machines.
Of course, teaching ribosomes to handle them etc. will take a lot of additional work, but identifying promising new amino acids would be a nice and major first step.
>> Of course, teaching ribosomes to handle them etc. will take a lot of additional work, but identifying promising new amino acids would be a nice and major first step.
There are a couple of other amino acids in the tree of life. The mapping from codons to amino acids is not completely static. And then there's selenocysteine (Sec), which is coded in a very unusual way.
I've often thought the redundancy in the encoding allows mutations to have no effect, so a protein that is well established and important could have a more stable encoding and new things still in flux could be more prone to evolving (less stable encoding). But I have no real data on this.
That's already been done many times over. Out of the hundreds that have been tried, there are a handful of interesting ones - benzoylphenylalanine, azidophenylalanine, phenylalanine methyl ketone, and bipyalanine - but the thing is that nature already has a good selection of amino acids to accomplish just about anything it could need to do. The reliable chemistries that are compatible with water as a solvent are few, which is why so much organic chemistry is done in nonaqueous solvents.
This sounds a lot like a story from 20 years ago, that was probably in Discover Magazine or Scientific American. The new nucleotides at that time were labeled kappa and chi.
And as a point of fact, three-base segments of DNA do not have a one-to-one mapping to amino acids. I also believe that a non-standard reading of one of the three stop codons can encode selenocysteine, with similar special cases for other proteins using rare amino acids.
Furthermore, 6^3 = 216, but that doesn't mean that adding a new base pair can code for that many amino acids. The original set of 4, with 64 possible codons, usually encodes 20 amino acids (excepting special cases, as with selenocysteine). mRNA also employs uracil, and tRNA adds hypoxanthine. These lead to "wobble pairs", which in turn allow a single tRNA to match several different-but-synonymous codons.
As it stands now, every codon without a matching tRNA would be a different variety of stop codon.
Now, what would be interesting to me is if the P-Z pairs could match some tRNA anticodons that translate stereoisomers of the standard 20 amino acids (or actually just the 19 that are chiral). That way, the D-(KLAKLAK)2 apoptosis promoter sequence could be synthesized directly by the ordinary transcription-translation mechanics of a cell.
>Why nature stuck with four letters is one of biology’s fundamental questions. Computers, after all, use a binary system with just two “letters” — 0s and 1s. Yet two letters probably aren’t enough to create the array of biological molecules that make up life. “If you have a two-letter code, you limit the number of combinations you get,” said Ramanarayanan Krishnamurthy, a chemist at the Scripps Research Institute in La Jolla, Calif.
This simply isn't true. Even with regular DNA, the word size is 3 nucleotides long... giving you 64 instructions. If I remember my high school biology, only about 20 distinct amino acids come out of those; the rest are synonymous duplicates or stop signals.
Binary would work too, assuming ribosomes and mRNA could expand the word size... you only need 6 bits to do the same as natural DNA.
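A minimal sketch of that 6-bit packing, just to make the arithmetic concrete (the 2-bit assignments here are arbitrary, chosen for illustration):

```python
# Two bits per nucleotide, three nucleotides per codon = 6 bits per codon.
BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def codon_to_bits(codon: str) -> int:
    """Pack a 3-letter codon into a 6-bit integer (0..63)."""
    value = 0
    for base in codon:
        value = (value << 2) | BITS[base]
    return value

print(codon_to_bits("ATG"))  # 14 == 0b001110, the start codon as 6 bits
```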
Is there something I don't know that fixes word size at 3 nucleotides?
Not sure I understand the benefit. It's denser, but from what I understand DNA generally doesn't have much in the way of size constraints. If I remember correctly, large swathes of DNA are inactive and there isn't selective pressure to clean up this wasted space. Couple that with the fact that it is apparently more error-prone, and you can see why evolution didn't go down this path.
Probably it will be very useful for synthetic purposes, where there isn't too much concern about fidelity after 10 million years of copying.
> Not sure I understand the benefit. It's denser, but from what I understand DNA generally doesn't have much in the way of size constraints.
I'm a layman, but they could use the new base pairs to code for unusual amino acids allowing for proteins with novel chemistry.
Also, I think DNA is pretty much only used to encode information, but RNA has important chemical roles (e.g. ribozymes), and the new base pairs open up similar possibilities with that.
Very interesting concept. One thing I noticed after developing several genetic algorithms of my own is that they tend to give a good creative hint at what the solution to the problem should be, which the human mind can then interpret to produce what the genetic algorithm was "trying" to approach. I wonder if the same could be true of biological evolution: there may be better ways of storing genetic information than DNA and all that, but DNA is a good guideline to what should be done.
Even with P-Z pairs, DNA would have a major and a minor groove rather than being the symmetrical double helix beloved by virtually all illustrators - sadly, including those of pop-sci articles...
Not sure. The behavior you posit sounds like a virus, but regular cells would lack the machinery to translate the artificial genes. Defenses against bacteria are mostly directed against surface proteins, and these proposed organisms would have regular proteins.
Organisms with the new base pairs probably wouldn't have so much of an advantage that they completely replace all life that currently exists.
However, it does raise an interesting question of whether some future species would be able to figure out evolution and the origins of life, since there would literally be intelligently-designed organisms running around.
Why hasn't life found a better way to fix nitrogen yet? Nitrogenase is a godawful enzyme... It wastes a whole hydrogen gas molecule for each turn of the crank.