So these guys come along, casually expand a well-established code and test it under precisely one small condition. And when this one test gives them some nice data, they say: "Hey, our stuff is better than everything all of nature has ever done!"
A very intellectually stimulating endeavour no doubt, but I expect some more tests before I would call this good science. Claiming that "the new additions appear to improve the alphabet" is simply extrapolation to the nth degree. [1]
Oh and by the way, when the article claims that
> "the three-biopolymer system may have drawbacks, since information flows only one way, from DNA to RNA to proteins"
that is not correct either. For more information, read up on epigenetics.
[1] Note that this quote comes from the article, not the original paper. The original paper is not quite as cocky (at least not in the abstract, but I don't have full access).
I can't help but notice the irony in your comment. So this guy comes along and having read one article, says "I can't call this good science."
A very basic summary of molecular biology of the cell:
DNA - library of blueprints, basically instructions on how to build proteins
RNA - copies of blueprints you take out of the library to build proteins so you don't expose DNA to unnecessary hazards
protein - catalyzes reactions so the cell can do stuff, including making new DNA when replicating.
A fundamental conundrum exists when it comes to evolution of this mechanism. DNA is needed to build proteins, but proteins are needed to catalyze the reactions necessary to build DNA. It's a chicken and egg problem... what came first?
When it was discovered that RNA can catalyze certain reactions, presumably because of its slightly higher chemical reactivity, it suggested a way out of this conundrum. What if life originated with RNA only, with RNA acting as both the hereditary and the catalytic machinery?
The problem with this RNA World hypothesis is that the range of reactions that modern RNA can catalyze is very limited. But what if, at the origin of life, when nature could experiment, the genetic alphabet that RNA could play with was bigger, potentially leading to expanded capability? Scientists like Benner have worked for three decades to try to answer this question.
So, your characterization of "comes along, casually expand" a well-established code and claims it's better is grossly unfair.
I am well aware of the basic molecular biology of the cell, as well as the RNA world hypothesis.
All due respect to Benner for his work - my comment was rather too pointed, I'll concede that. Nonetheless, I am always wary of too much theory being induced from too little data.
Benner's experiment shows that an expanded genetic code can form molecules that show greater chemical functionality in a given situation than natural DNA molecules do. Now a quote from the abstract:
> This suggests that this system explored much of the sequence space available to this genetic system and that GACTZP libraries are richer reservoirs of functionality than standard libraries.
Already he is starting to extrapolate when he starts talking about the extended libraries in general. The Quanta Magazine article then goes on to say:
> In other words, the new additions appear to improve the alphabet, at least under these conditions.
That is true, but for a rather narrow definition of "improve", and a very narrow set of conditions. The result is that the superficial reader goes away thinking "they've made a better DNA".
>that is not correct either. For more information, read up on epigenetics.
Epigenetics does not violate the central dogma of molecular biology. Copy-paste from Wikipedia:
>These epigenetic changes may last through cell divisions for the duration of the cell's life, and may also last for multiple generations even though they do not involve changes in the underlying DNA sequence of the organism;[5] instead, non-genetic factors cause the organism's genes to behave (or "express themselves") differently.
Note that the original quote is a common misunderstanding of the central dogma, and is incorrect (for example reverse transcription RNA --> DNA is common in nature).
The central dogma was formulated by Crick, and states that once sequential nucleic acid information (either DNA or RNA) has passed into protein, it cannot be recovered. This has never been violated.
Amusingly, it was none other than Crick's pal Watson who popularized the incorrect version via his college textbook, and it is this incorrect version that is regularly announced to have been 'disproven'.
I wasn't referring to anyone's expertise in the field, but to humanity's understanding of a roughly four-billion-year-old intelligent system. What we've done so far is impressive, but we've really only scratched the surface. It's a bit early to claim superiority.
We actually do know something about why the genetic code might have only four letters, and other aspects of its structure. It gets into combinatorics and search.
There are actually 84 possible combinations with 4 bases if you accept sequences of length ≤ 3.
However, if you assume all sequences are length 3, you still get 64 combinations.
We only use 20 out of that space. And if you look at how bases encode to amino acids, for half of them only the first two bases even matter - the two-base prefix already determines the amino acid, so you can ignore the third.
Given how underutilized this space is, I'm not convinced that increasing the domain to 216 will lead to much more than the ability to express our current amino acid space with only two base pairs.
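For the curious, a quick back-of-the-envelope sketch of the combinatorics above (plain Python; the counts are just alphabet-size arithmetic, nothing biology-specific):

```python
# Codon combinatorics for the natural and expanded alphabets.
natural = 4   # A, C, G, T
expanded = 6  # A, C, G, T plus the new Z and P

# All sequences of length <= 3 over the natural alphabet:
print(natural + natural**2 + natural**3)  # 4 + 16 + 64 = 84

# Fixed-length-3 codons:
print(natural**3)   # 64 possible codons today
print(expanded**3)  # 216 possible codons with six letters

# Only ~20 amino acids are actually encoded, so the natural
# code uses well under a third of its 64-codon space.
```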
And actually, even the redundancy is not complete. A recent paper showed that seemingly "redundant" variations of triplets led to a slightly different 3D folding structure of the DNA, with effects on the physiology of the cell.
Sorry, made a mistake in the comment above. It's not the DNA's structure that is changed, it's the resulting protein's. (Different codons slightly alter the rate of translation, leading to a different folding of the protein.)
You're absolutely right. The likelihood that any particular amino-acid-generating codon will mutate to another amino-acid-generating codon is non-uniform, and the analysis of what impact these particular redundancies have on that is very complex - sometimes, certain amino acids can even stand in for each other.
Certain mutations from one amino acid to another are more desirable than others, so it's quite possible that the existing structure biases the amino acids towards certain least-harmful mutation tendencies.
It would have been nice if the author had at least acknowledged that in reality they are nucleobases and not tiny, tiny letters curled up in our cell nuclei. Sure, 6-amino-5-nitro-2(1H)-pyridone and 2-amino-imidazo[1,2-a]-1,3,5-triazin-4(8H)one don't say much to us laymen, but just saying letters and not once mentioning what they stand for is really poor reporting.
I suppose the heading and maybe the opening paragraph could be seen to give that impression, but the article notes the fact that these are new nucleotides numerous times throughout. I found that pretty sufficient for showing that there are not in fact actual letters in DNA.
Nitpick: it wouldn't be a potential 216. Some three-"letter" sequences code for the same amino acids, so instead of 4^3 (64) possible amino acids, only 20 are generated. Adding new letters doesn't change what these old words create, so I think there would only be a possible maximum of 172.
(I think I did my math right, but maybe not.)
(edit: thanks duaneb, had my basic bio facts wrong - codons code for amino acids, not proteins.)
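If I follow the arithmetic above, it works out like this (a quick sketch; treating all 20 amino acids as fixed assumes, per the comment, that the new letters leave the old codon assignments untouched):

```python
old_codons = 4 ** 3   # 64 codons in the natural four-letter alphabet
new_codons = 6 ** 3   # 216 codons once two letters are added

# Codons built only from the old four letters keep their old meanings,
# so only codons containing at least one new letter are free to encode
# something new:
free_codons = new_codons - old_codons   # 216 - 64 = 152
existing_aas = 20                       # amino acids already encoded

print(free_codons + existing_aas)       # 172, the claimed maximum
```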
I'm not convinced that this is necessarily a good idea biologically, especially after talking to a couple of my friends that are researchers in this space. However, this seems quite interesting for non-biological applications. Take cold storage, for example--with a third base pairing, we can obviously develop an even denser data storage format than with regular DNA.
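To put a rough number on "denser" (a sketch under the idealized assumption that every base is a free symbol choice, ignoring error-correction and synthesis overhead):

```python
import math

# Information per base = log2(alphabet size).
bits_per_base_natural = math.log2(4)   # 2.0 bits (A, C, G, T)
bits_per_base_expanded = math.log2(6)  # ~2.585 bits with Z and P added

gain = bits_per_base_expanded / bits_per_base_natural
print(f"{gain:.1%} of natural density")  # ~129%, i.e. ~29% denser
```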
Neat, but extending amino acids would be even cooler. DNA is mostly "just" a string encoding for information, like binary or hexadecimal. Proteins on the other hand are the actual machines whose blueprints are written in DNA, and they're built out of amino acids. Extending the set of amino acids could extend the set of basic building blocks available to create biomolecular machines.
Of course, teaching ribosomes to handle them etc. will take a lot of additional work, but identifying promising new amino acids would be a nice and major first step.
>> Of course, teaching ribosomes to handle them etc. will take a lot of additional work, but identifying promising new amino acids would be a nice and major first step.
There are a couple of other amino acids in the tree of life. The mapping from codons to amino acids is not completely static. And then there's selenocysteine (Sec), which is coded in a very unusual way.
I've often thought the redundancy in the encoding allows mutations to have no effect, so a protein that is well established and important could have a more stable encoding and new things still in flux could be more prone to evolving (less stable encoding). But I have no real data on this.
That's already been done many times over. Out of the hundreds that have been tried, there are a handful of interesting ones - benzoylphenylalanine, azidophenylalanine, phenylalanine methyl ketone, and bipyalanine - but the thing is that nature already has a good selection of amino acids to accomplish just about anything it could need to do. The reliable chemistries that are compatible with water as a solvent are few, which is why so much organic chemistry is done in nonaqueous solvents.
This sounds a lot like a story from 20 years ago, that was probably in Discover Magazine or Scientific American. The new nucleotides at that time were labeled kappa and chi.
And as a point of fact, three-base segments of DNA do not have a one-to-one mapping to amino acids. I also believe that a non-standard reading of one of the three stop codons can encode selenocysteine, with similar special cases for other proteins using rare amino acids.
Furthermore, 6^3 = 216, but that doesn't mean that adding a new base pair can code for that many amino acids. The original set of 4, with 64 possible codons, usually encodes 20 amino acids (excepting special cases, as with selenocysteine). mRNA also employs uracil, and tRNA adds hypoxanthine. These lead to "wobble pairs", which in turn allow a single tRNA to match several different-but-synonymous codons.
As it stands now, every codon without a matching tRNA would be a different variety of stop codon.
Now, what would be interesting to me is if the P-Z pairs could match some tRNA anticodons that translate stereoisomers of the standard 20 amino acids (or actually just the 19 that are chiral). That way, the D-(KLAKLAK)2 apoptosis promoter sequence could be synthesized directly by the ordinary transcription-translation mechanics of a cell.
>Why nature stuck with four letters is one of biology’s fundamental questions. Computers, after all, use a binary system with just two “letters” — 0s and 1s. Yet two letters probably aren’t enough to create the array of biological molecules that make up life. “If you have a two-letter code, you limit the number of combinations you get,” said Ramanarayanan Krishnamurthy, a chemist at the Scripps Research Institute in La Jolla, Calif.
This simply isn't true. Even with regular DNA, the word size is 3 nucleotides long... giving you 64 instructions. If I remember my high school biology, only about 20 distinct amino acids come out of those; the rest are synonymous duplicates or stop signals.
Binary would work too, assuming ribosomes and mRNA could expand the word size... you only need 6 bits to do the same as natural DNA.
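A minimal sketch of that 6-bit packing, just to make the arithmetic concrete (the 2-bit assignments here are arbitrary, chosen for illustration):

```python
# Two bits per nucleotide, three nucleotides per codon = 6 bits per codon.
BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def codon_to_bits(codon: str) -> int:
    """Pack a 3-letter codon into a 6-bit integer (0..63)."""
    value = 0
    for base in codon:
        value = (value << 2) | BITS[base]
    return value

print(codon_to_bits("ATG"))  # 14 == 0b001110, the start codon as 6 bits
```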
Is there something I don't know that fixes word size at 3 nucleotides?
Not sure I understand the benefit. It's denser, but from what I understand DNA generally doesn't have much in the way of size constraints. If I remember correctly, large swathes of DNA are inactive and there isn't selective pressure to clean up this wasted space. Couple that with the fact that it is apparently more error-prone, and you can see why evolution didn't go down this path.
Probably it will be very useful for synthetic purposes, where there isn't too much concern about fidelity after 10 million years of copying.
> Not sure I understand the benefit. It's denser, but from what I understand DNA generally doesn't have much in the way of size constraints.
I'm a layman, but they could use the new base pairs to code for unusual amino acids allowing for proteins with novel chemistry.
Also, I think DNA is pretty much only used to encode information, but RNA has important chemical roles (e.g. ribozymes), and the new base pairs open up similar possibilities with that.
Very interesting concept. One thing I noticed after developing several genetic algorithms of my own is that they tend to give a good creative hint at what the solution to the problem should be, which the human mind can then interpret to produce what the genetic algorithm was "trying" to approach. I wonder if the same could be true of biological evolution: there may be better ways of storing genetic information than DNA and all that, but DNA is a good guideline to what should be done.
Even with P-Z pairs, DNA would have a major and a minor groove rather than being the symmetrical double helix beloved by virtually all illustrators - sadly, including those of pop-sci articles...
Not sure. The behavior you posit sounds like a virus, but regular cells would lack the machinery to translate the artificial genes. Defenses against bacteria are mostly directed against surface proteins, and these proposed organisms would have regular proteins.
Organisms with the new base pairs probably wouldn't have so much of an advantage that they completely replace all life that currently exists.
However, it does raise an interesting question of whether some future species would be able to figure out evolution and the origins of life, since there would literally be intelligently-designed organisms running around.
Why hasn't life found a better way to fix nitrogen yet? Nitrogenase is a godawful enzyme... It wastes a whole hydrogen gas molecule for each turn of the crank.