Genetic code: mystery of its origin (a new theory)
Date: 2015-12-10

Source: http://blog.sciencenet.cn/blog-815628-942275.html (reposted from the blog of Ni Leyi (倪乐意) on ScienceNet)

The origin of genetic codes is still one of the greatest mysteries in modern life science. Recently, Prof. Ping Xie developed a new theory, called the ATP-centric hypothesis, to crack this mystery.

Cracking the mystery of genetic codes

Ping Xie*

Donghu Experimental Station of Lake Ecosystems, State Key Laboratory of Freshwater Ecology and Biotechnology of China, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, People’s Republic of China

* E-mail: xieping@ihb.ac.cn


More than half a century has passed since the discovery of genetic codes, but their origin is still one of the greatest mysteries in modern life science. Are the genetic codons really unknowable? Do they really require external design? Here, I present an ATP-centric hypothesis aimed at exploring the hidden primordial world that inspired the origin of genetic codes. I examine how and why ATP is at the heart of the extant biochemical system, and how the genetic codes came into being with the evolution of a biochemical system driven by photosynthesis. Cracking the mystery is challenging.


It is a miracle of nature that a single set of genetic codes has assembled tens of millions of different species on Earth. But no one knows exactly how these genetic codes came into being. Probably a majority of biologists hold the pessimistic view that an exact reconstruction of the process of code construction may never be possible [1]. Yockey [2] claims that the origin of the genetic code is unknowable, as there is no trace in physics or chemistry of the control of chemical reactions by a sequence of any sort or of a code between sequences. He objects that many papers have been published with titles indicating that their subject is the origin of the genetic code, while their content actually deals only with its evolution.

So far, several hypotheses have been proposed. The frozen accident hypothesis states that the allocation of codons to amino acids in the single ancestor was entirely a matter of “chance” and then remained unchanged [3]. The stereochemical hypothesis claims that there is in many cases a specific stereochemical fit between an amino acid and the base sequence of its codon on the appropriate tRNA [4]. The biosynthetic hypothesis postulates that the code was assigned in parallel with the evolution of amino-acid biosynthesis [5]. Knight et al. [6] declare that the genetic code is a product of selection, history and chemistry. Since then, very little definitive progress has been made, although intensive studies have focused on the variation or flexibility of the codes and on possible rules for the allocation of codons to amino acids [7-10].

Frankly speaking, these hypotheses suffer from two fatal defects: first, none can satisfactorily explain why the genetic codes evolved in such a way, and second, none has explained the origin of the genetic codes from that of the biochemical system (a relation of part to whole). In other words, all these hypotheses completely overlook the coevolution of the genetic codes with the biochemical system. In my view, it is impossible to crack the secret of codon origin from the codon itself [10, 11], even when the view is extended to the possible relationship between codons and amino acids [7].

Another source of the problem may be related to the so-called quasi-species model proposed by Eigen [12]. He strongly advocated the in vitro evolution of macromolecules. A quasi-species in the environment was imagined to be a population of genetically related RNA molecules that shared certain morphological features but were not identical. He supposed that the quasi-species followed the Darwinian process of natural selection. This model has been highly influential [1] and, in my view, is among the main obstacles to our understanding of the origin of the genetic codes.

There is an associated problem. Molecular biology has long been troubled by the so-called chicken-and-egg paradox of protein and nucleic acid, seemingly a logically circular debate about which appeared first. It is also certain that the clues to the origin of macromolecules have been completely lost, owing to the cyclization of biochemical pathways, in which the transitional states or tracks disappeared long ago. As the ancient Greek philosopher Heraclitus said, the beginning and the end coincide on the circumference of a circle.

It is undeniable that codons and amino acids were linked stereochemically [4]. Otherwise, we would fall into the mire of divine creation or design. The problem is that the birth of the genetic code must have been driven by some force. Randomness and selection have frequently been considered to be such forces, but I believe they are means rather than driving forces. The driving force should be energetic, e.g. ATP produced by photosynthesis.

It is beyond all doubt that the design of the genetic codons, a chemical language, never required the intervention of a “hidden hand”. However, no reliable fossil evidence is currently available, and eons of evolution have blurred the molecular vestiges of the early events that remain in living organisms [13]. Fortunately, we can still look back at that history from extant life, even though our eyes can perceive only a very minute fraction of the history of early life.

 How is a biochemical system organized?

To solve the puzzle of the genetic codes, we must first understand how a biochemical system is organized. Consider what life is. Compositionally, life is a unity of matter, energy and information; dynamically, it is a “game” of material cycling, energy flow and information communication. Energy is the key to sustaining a living system. In the past decades, physicists and chemists have discovered many details about how organisms are structured and how they work. Life is known to be a physiological machine in which inconceivably numerous biochemical reactions take place to acquire, convert and use energy. In all modern photoautotrophic life forms (plants, algae and some bacteria), the only source of energy is sunlight, which is converted into chemical energy by a series of complex physico-chemical processes called photosynthesis [14].

Energetically and informationally, nothing in the biochemical system is more important than one nucleotide, ATP. It is the only energetic product of photosynthesis, carrying chemical energy converted from sunlight. It then provides energy for metabolism through the interconversion of ATP, ADP and AMP, supporting the transformation of various biomolecules into each other in an exquisitely organized cell. In other words, the major metabolic pathways (e.g. the Calvin cycle, glycolysis and the Krebs cycle) are all coupled with ATP (Fig. 1). Of course, NAD(P)H, a nucleotide derivative, is also necessary, as it transports H and e- (through the interconversion of NAD(P)H and NAD(P)+).

Figure 1. ATP (a carrier of both energy and information) is at the center of the biochemical system in a modern cell. It provides a unique bridge among photosynthesis, metabolic pathways and genetic information.


ATP is not only an indispensable building block of the genetic system (DNA and RNA); the other four nucleotides used for genetic coding are also all derived from it. Meanwhile, information delineates the border between the living and the inanimate, as the living world appears to be the only place where information is recorded, processed or used [15]. Therefore, as an irreplaceable carrier of both energy and information, ATP necessarily appears to have a central role in the biochemical system.

The importance of ATP in biochemical systems could be attributed to its role in governing the evolution of photosynthetic systems in primordial life. Although the matter is debated, there are signs that life on Earth did start out with photosynthesis [16]. First, sunlight, needless to say, has been the most universal source of energy. Second, a biomolecule called cytochrome (an electron transport protein with iron porphyrin, or heme, as a prosthetic group) seems to bear the imprint of photosynthesis (Fig. 2). Cytochrome is a universal electron carrier, present even in chemoautotrophic bacteria [17]. The heme was likely derived originally from a photosynthetic pigment, chlorophyll, as their biosynthetic pathways are very similar in extant life forms [18]. Interestingly, chlorophyll seems to be an adduct between a magnesium porphyrin ring and a long-chain fatty acid from the membrane. Let me ask: if photosynthetic bacteria were not the Last Universal Common Ancestor (LUCA) of all modern life forms, why would chemoautotrophic bacteria use a photosynthesis-imprinted molecule like cytochrome as an electron carrier?

Figure 2. Structural homology between chlorophyll and the heme of cytochrome. Decyclization occurred from magnesium porphyrin to iron porphyrin (marked in red). Evolutionarily, the membrane-bound chlorophyll was likely a merger of phospholipid and porphyrin. Dashed blue lines with arrows indicate possible directions of evolution.

 How did the genetic codes originate?

The genetic code system was built by a chemical mechanism closely related to the production of ATP. Energetically, the first task of primordial life was to achieve sufficient production of this nucleotide. In present-day photoautotrophs, the synthesis of ATP requires a transmembrane gradient of protons that come from the biochemical cleavage of water in the photosynthetic system. Therefore, to guarantee such a proton gradient, a relatively closed compartment impermeable to H+ was needed first. Such compartments were most likely lipid vesicles, the precursors of protocells. The fact that the synthesis of ATP requires a transmembrane gradient of H+ made it impossible for macromolecules to evolve in vitro as suggested by Eigen.

Chemically, it was not impossible that fatty acids on the primitive Earth could spontaneously form a double-layered globular membrane structure [19]. In modern life forms, the cell membrane, consisting of a lipid (phospholipid) bilayer, provides controlled entry and exit ports for the exchange of matter. It permits small molecules such as CO2 and O2 to pass through by diffusion, but acts as a barrier to certain molecules and ions (e.g. H+), leading to different concentrations on the two sides of the membrane. H+ cannot pass freely across the membrane without using transmembrane protein channels. This implies that the formation of ATP in the primordial cell had to rely on a polypeptide channel that later developed into ATP synthase. In addition, the release of H+ from the cleavage of H2O also needed the help of polypeptides.

Reasonably, a steady production of ATP was possible only if the various elements (sunlight, lipid bilayer membrane, polypeptides, cleavage of H2O, transmembrane H+ gradient, electron carriers and so on) had been organized in an orderly way. It may be inferred that a series of events occurred randomly in protocells. For example, the transmembrane proton gradient coupled with a polypeptide channel resulted in the formation of an apparatus (i.e. ATP synthase) to synthesize ATP. Nucleotides like ATP could form polynucleotides by self-condensation; some of these carried amino acids (precursors of tRNA), and others built a platform for the synthesis of polypeptides (precursors of rRNA), which presumably replaced the stochastic formation of polypeptides from amino acids activated by ATP. On the other hand, the polypeptides in turn participated not only in the construction of transmembrane channels for hydrophilic molecules and ions, but also in the biochemical cleavage of H2O and in catalyzing the self-condensation of nucleotides. As a consequence, numerous consecutive reactions were linked into a variety of chains, which might be linear, branched or cyclic. Fundamentally, these were creative processes of order out of disorder, and of rationality out of randomness.

It should be borne in mind that primitive life started out from materials that were not organized. The prebiotic mixtures of chemicals present on the primeval Earth are assumed to have contained an immense number of life's building blocks, such as ATP, NADPH, pyrroles, amino acids, aliphatic hydrocarbons, ubiquinones, monosaccharides and so on [16]. Let us first consider an impermeable lipid vesicle enclosing plenty of life's building blocks. Its exposure to sunlight would drive dazzling flows of electrons and H+, causing active recombination of elements. This would increase the accumulation of large molecules, accompanied by a ceaseless input of small molecules like CO2. The protocells therefore had to alternate between enlarging and rupturing, giving rise to the development of both photosynthesis and cell division. Next, an information era followed: in the cycle of photosynthetic growth followed by division, and also through the selection of individual survival, the protocell came to fix the regularity, reproducibility and rhythmicity of the various biochemical reactions that dealt with photosynthetic products, a process that can be called informatization.

Then, a scenario can be outlined of how a set of genetic codes was successfully selected to record, preserve and transmit information. Firstly, it is likely that in the organic “soup” enclosed by the protocell, the energetic ATP and its derivatives could randomly extend chains of both polynucleotides and polypeptides. This made it possible to establish or fix, out of their numerous random combinations, the chemical relation between sequences of nucleotides in polynucleotides and amino acids in polypeptides, through the selection of cellular survival, an ecological force or feedback mechanism.

Secondly, informatization was inevitably coupled with structuralization, such as structural subdivision or specialization and functional differentiation, which provided the basis for establishing the triplet codon system. For example, each tRNA became specialized to carry a specific amino acid; polypeptides helped to match the acceptor stem of a tRNA to its anticodon; the system developed the rule of codon-anticodon base pairing, i.e. molecular recognition through stereochemical interactions such as hydrogen bonds, van der Waals forces and aromatic stacking; and a unified platform for protein synthesis, rRNA, emerged, on which polypeptides are synthesized according to an mRNA template. In this way, macromolecules were functionally differentiated: polynucleotides handle information (recording, preserving and transmitting it), while polypeptides, as enzymes, catalyze all chemical reactions, and both were further cyclized into a system of reciprocal causation. As a result, code-based informatization led to fantastic innovations in diverse laws (principles) or patterns, e.g. biochemical pathways, of course through goal-oriented random selection (Fig. 3).
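
The rule of codon-anticodon base pairing in the extant system can be illustrated with a minimal sketch. It assumes strict Watson-Crick pairing in RNA (A with U, G with C) and ignores wobble pairing and modified bases; the names `PAIR` and `anticodon` are illustrative and do not come from the text.

```python
# Strict Watson-Crick pairing for RNA bases (wobble pairing is ignored here).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon: str) -> str:
    """Return the anticodon (written 5'->3') that pairs with an mRNA codon (5'->3').

    The two strands pair antiparallel, so the anticodon is the
    reverse complement of the codon.
    """
    return "".join(PAIR[base] for base in reversed(codon.upper()))

if __name__ == "__main__":
    for codon in ("AUG", "GCU"):
        print(codon, "pairs with", anticodon(codon))
```

Because the pairing is antiparallel, the first base of the codon pairs with the third base of the anticodon, which is why the sketch reverses the codon before complementing it.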

Thirdly, the triplet codon might simply be a result of the optimization of biochemical networks under selective pressure: for handling 20 or more amino acids, a codon that was too long would be cumbersome, and one that was too short would not suffice (a doublet gives only 4^2 = 16 combinations, whereas a triplet gives 4^3 = 64, which still leaves considerable redundancy in the encoding). That is, three is the smallest codon length able to encode all the amino acids.
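
The arithmetic behind this argument can be checked with a short sketch. It assumes a four-letter nucleotide alphabet and asks only for the smallest word length that provides at least as many combinations as there are meanings to encode; the function name `min_codon_length` is hypothetical.

```python
def min_codon_length(n_meanings: int, alphabet_size: int = 4) -> int:
    """Smallest word length L such that alphabet_size ** L >= n_meanings."""
    length = 1
    while alphabet_size ** length < n_meanings:
        length += 1
    return length

if __name__ == "__main__":
    # Number of distinct codons available at each codon length.
    for length in range(1, 5):
        print(length, 4 ** length)
    # 20 amino acids, or 21 meanings if a stop signal is counted as well.
    print(min_codon_length(20), min_codon_length(21))
```

A doublet code falls short of 20 meanings, while a triplet code exceeds it with room to spare, which is the redundancy referred to in the text.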

Of course, it must have taken a very long succession of steps for the protocells to test and modify the genetic code system through the selection of cellular survival. They then obtained the capacity to transmit their blueprint, recorded in DNA, from generation to generation, and self-building also became a central characteristic of life. The protocell had thus taken a historic step towards the first genetic cell, i.e. a true species that could bear, process and transmit information, reproducing homogeneous (although not absolutely identical) individuals, and was thus capable of Darwinian evolution. It seems to be a first principle that the reproduction of individuals with heritable homogeneity was favored by existence or nature, which, somewhat like a centripetal eco-physiological force, not only governed the integration of various biochemical events but also shaped the direction of survival selection.


Figure 3. A simplified conceptual model of the origin of the genetic code based on the photosynthesis-mediated ATP-centric hypothesis. Dashed blue lines indicate evolutionary processes during the pre-life period, while solid red lines denote processes or interactions from the pre-life period to the present. Arrows indicate the direction of influences or actions.

 What made the difference between DNA and RNA?

As ATP is the only nucleotide produced by photosynthesis, it does not appear to have been difficult to convert ATP into any other nucleotide. Meanwhile, these energetic nucleotides could also self-condense into a wide variety of mRNAs, and of course other RNAs as well. If they could do this, why would it have been impossible for them to extend the chain to make a DNA molecule? The progression from mRNA to DNA was undoubtedly a great step in the informatization of the biochemical system. That is, DNA became specialized to accurately record and permanently preserve all the genetic information that commands everything that goes on in the cell, while mRNAs became short-lived messengers for the implementation of the instructions.

However, in the integrative process of information, subtle structural differences arose between DNA and RNA. First, the 2' carbon of the sugar carries -H in the former but -OH in the latter. Second, thymine (T) in DNA is replaced by uracil (U) in RNA; the structural difference is of course very small, i.e. T has one more methyl group than U. No one can say why only one of the four bases differs rather than all four. It seems hard to believe that this had much meaning for structuralization; perhaps there was simply a need for a difference, as similar cases are not rare (e.g. NADPH and NADH), unless the diverse three-dimensional structures of RNA are due to such a tiny change in the ribose or base. Structurally, this was more likely just an accident. Functionally, however, the subdivision of the genetic system into RNA and DNA certainly served the orderly management and control of information in a tiny cell where hundreds of biochemical reactions occur simultaneously. Consequently, the biochemical system reached a state in which mRNAs were destroyed as soon as their mission was finished, while the genetic information recorded in DNA would be permanently preserved and transmitted to ensure the continuation of the species.

 Who manipulated the RNA world?

Woese [20] proposed the RNA world hypothesis (a term coined by Walter Gilbert in 1986): the earliest biomolecules on Earth were RNA, followed by DNA; the early RNA molecules had the ability to store information like DNA and to catalyze reactions like proteins, and supported the operation of the early cell or protocell. I object to the statement that RNA created the living world from randomness, although I agree with the view that RNA was earlier than DNA. Is there any evidence that the living world must have been derived from RNA merely because it has the functions of information storage and catalysis? Also, this theory is unable to explain why the RNA molecules tended to store genetic information and to support the operation of protocells.

In my view, the explanation offered by the RNA world hypothesis for the origin of life and the genetic codes is quite far-fetched [21-23], and its lack of both a driving force and individuality is unfortunate. I propose an alternative term, the photosynthesis-mediated ATP world hypothesis: early life on Earth coevolved with the development of a photosynthetic system in lipid vesicles, where a sophisticated biochemical system was built for the special purpose of synthesizing ATP using solar energy. This is evidenced by a series of structural and functional features of the extant biochemical system (e.g. photosynthetic pigments, the electron transfer chain, ATP synthesis driven by a transmembrane H+ gradient, metabolic cycles and pathways coupled with ATP, the synthesis of RNA and DNA from ATP and its derivatives, etc.) as well as by their interplay. However, if we are not willing to abandon the term “RNA world”, then the nucleotide ATP should still be regarded as the initiator.

 Challenges in the Future

In this article, I have outlined a synthetic mechanism for the origin of genetic codes. The energetic ATP, together with its derivatives, could randomly extend chains of both polynucleotides and polypeptides, which made it possible to establish or fix, out of their numerous random combinations, chemical relations between sequences of nucleotides in polynucleotides and amino acids in polypeptides through a feedback mechanism (the selection of cellular survival). Technically, photosynthesis, a goal-oriented process, enabled various biotic factors or reactions (ATP, lipid vesicles, informatization, structuralization, homogeneous individuals, individuality, survival, etc.) to be integrated into an operating system of genetic codes. How to collect supporting evidence, however, remains a challenge.

Perhaps some people will ask why molecules did not stay silently in the organic soup but instead struggled to shift from chaos to order. We may attribute this to code-based self-organization with an incessant input of solar energy: if such codes did not exist, the protocells would not be able to maintain orderly control of their biochemical systems, and the chemical world would still remain in incomprehensible chaos. But this still needs to be explained. Is it a first principle that individuals with heritable homogeneity were favored by existence?

Life, in a sense, is a contradictory unity: it has generality from homogeneity, but at the same time possesses individuality from heterogeneity. Individuality can be traced back to the lipid vesicles where life started out. It is the quality that makes one living entity different from all others. It is amazing that such individuality has advanced from pure chemistry in bacteria to sophisticated desires (e.g. for competition and struggle), habits, instincts and even spirit in higher animals. Is it possible that individuality, as an eco-physiological force, engaged in reshaping or re-fixing the rhythm or regularity of biochemical reactions in the protocells? If so, Darwinian evolution would follow as a matter of course.

One may well wonder whether too much speculation has been superimposed on the ATP-centric hypothesis. But I can confidently say that the present sketch is clearer than any of the previous theories in its overall outline and logic, particularly in relating the origin of the genetic code to the evolution of the photosynthesis-mediated, ATP-centered biochemical system. Of course, I admit that none of the hypotheses or theories on the origin of life or the genetic code can be either verified or falsified (which means they are not science in the eyes of Popper, a philosopher of science). This has been the case so far, and will perhaps remain so for some time to come. But because it is a secret about ourselves, human beings will never stop pursuing it, even before the facts are completely known. It is possible for us to glimpse the secrets of the coevolution of the codon and the biochemical system from their intrinsic relationship. This is not necessarily the truth, but it is at least a way to the truth.


1. Rauchfuss H (2008) Chemical evolution and the origin of life. Springer-Verlag Berlin Heidelberg.

2. Yockey HP (2005) Information theory, evolution, and the origin of life. Cambridge University Press.

3. Crick FH (1968) The origin of the genetic code. J Mol Biol 38: 367-379.

4. Woese CR, Dugre DH, Dugre SA, Kondo M, Saxinger WC (1966) On the fundamental nature and evolution of the genetic code. Cold Spring Harbor Symp Quant Biol 31: 723-736.

5. Wong JTF (1975) A co-evolution theory of the genetic code. Proc Natl Acad Sci USA 72: 1909-1912.

6. Knight RD, Freeland SJ, Landweber LF (1999) Selection, history and chemistry: the three faces of the genetic code. Trends Biochem Sci 24: 241-247.

7. Ohama T, Inagaki Y, Bessho Y, Osawa S (2008) Evolving genetic code. Proc Jpn Acad Ser B 84: 58-74.

8. Koonin EV, Novozhilov AS (2009) Origin and evolution of the genetic code: the universal enigma. Cell Mol Biol 61: 99-111.

9. Shah P, Gilchrist MA (2011) Explaining complex codon usage patterns with selection for translational efficiency, mutation bias, and genetic drift. Proc Natl Acad Sci USA 108: 10231-10236.

10. Baranov PV, Atkins JF, Yordanova MM (2015) Augmented genetic decoding: global, local and temporal alterations of decoding processes and codon meaning. Nature Rev Genet 16: 517-529.

11. Sciarrino A, Sorba P (2013) Codon-anticodon interaction and the genetic code evolution. Biosystems 111: 175-180.

12. Eigen M (1971) Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58: 465-523.

13. Leslie M (2009) On the origin of photosynthesis. Science 323: 1286-1287.

14. Umena Y, Kawakami K, Shen JR, Kamiya N (2011) Crystal structure of oxygen-evolving photosystem II at a resolution of 1.9 Å. Nature 473: 55-60.

15. Battail G (2014) Information and life. Springer.

16. Xie P (2014) The aufhebung and breakthrough of the theories on the origin and evolution of life. Science Press (in Chinese).

17. Madigan MT, Martinko JM, Stahl DA, Clark DP (2012) Brock biology of microorganisms (13th ed.). Prentice Hall.

18. Taiz L, Zeiger E (2010) Plant physiology (4th ed.). Sinauer Associates.

19. Wong JTF, Lazcano A (eds.) (2009) Prebiotic evolution and astrobiology. CRC Press.

20. Woese CR (1967) The genetic code: the molecular basis for genetic expression. Harper & Row.

21. Zimmer C (2009) On the origin of life on earth. Science 323: 198-199.

22. Atkins JF, Gesteland RF, Cech TR (eds.) (2011) RNA worlds: from life’s origins to diversity in gene regulation. Cold Spring Harbor Laboratory Press.

23. Sengupta S, Higgs PG (2015) Pathways of genetic code evolution in ancient and modern organisms. J Mol Evol 80: 229-243.

This article should be cited as follows:

Xie P. 2016. Critical Reviews and Reconstruction of Evolutionary Theories. Beijing: Science Press (in Chinese).
