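Nature hides its secrets or laws in a set of codes, and with them has created tens of millions of species, then destroyed them, in a cycle that begins again without end... Although more than half a century has passed since the genetic codons were deciphered, their origin remains entirely unknown. Some even claim that it is unknowable, while others hold that it stems from external design. The origin of the genetic codons is regarded as one of the greatest mysteries of modern life science, yet it bears directly on our understanding of the nature and evolution of life. How can this mystery of the century be solved...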
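1. Life: both matter and beyond matter
2. Life: a product of physicochemical processes
Figure 1. Structure of the cell membrane (from Hudong Baike)
Figure 2. Structure of adenosine triphosphate (ATP). Removing the terminal phosphate group of ATP by breaking the phosphoanhydride bond releases a large amount of energy, which in cells is coupled to many endergonic reactions (shown in the pink region) (from Nelson & Cox 2004)
Figure 3. Schematic of ATP synthesis coupled to photosynthesis on the thylakoid membrane (from Taiz and Zeiger 2010)
3. Information and codes: unique to life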
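What is information? I greatly admire the incisive definition given by Norbert Wiener, the American mathematician who founded cybernetics: "Information is a name for the content of what is exchanged with the outer world as we adjust to it, and make our adjustment felt upon it." The definition attributed to Claude Elwood Shannon, the American mathematician who founded information theory, is also thought-provoking: "Information is that which is used to eliminate random uncertainty." Whether we are exploring the origin of life or its subsequent evolution, this inherent trait of life should not be forgotten.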
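DNA is built from 4 kinds of bases, and each amino acid is specified by 3 bases, so the theoretical number of base combinations is 4^3 = 64. Of these, 3 are stop codons (which encode no amino acid), leaving 61 codons that encode amino acids. Yet proteins are built from only 20 kinds of amino acids, so most amino acids have several triplet codons (from 2 to 6). This is the so-called degeneracy. Codons that encode the same amino acid are called synonymous codons, and synonymous codons are known to be used at different frequencies across the living world. This degeneracy may stem from similar stereochemical properties.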
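The counts in this paragraph can be checked with a short script. The following is a minimal Python sketch, not part of the original text: it enumerates all triplets over the four RNA bases and tallies how many codons specify each amino acid. The 64-letter string is the standard genetic code in the conventional U, C, A, G ordering with one-letter amino acid symbols; the names BASES, AMINO_ACIDS, CODON_TABLE and degeneracy are illustrative choices.

```python
from collections import Counter
from itertools import product

# The four RNA bases in the conventional table ordering.
BASES = "UCAG"

# Standard genetic code, one letter per codon, ordered UUU, UUC, UUA, UUG, UCU, ...
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every triplet to its amino acid (or '*').
CODON_TABLE = {
    "".join(triplet): amino_acid
    for triplet, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def degeneracy(table):
    """Count how many codons specify each amino acid, ignoring stop codons."""
    return Counter(aa for aa in table.values() if aa != "*")


if __name__ == "__main__":
    stops = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
    counts = degeneracy(CODON_TABLE)
    print("total codons:", len(CODON_TABLE))
    print("stop codons:", stops)
    print("sense codons:", sum(counts.values()))
    print("amino acids:", len(counts))
    print("codons per amino acid:", dict(sorted(counts.items())))
```

In the standard code, methionine and tryptophan each have a single codon, while leucine, serine and arginine each have six.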
Figure 4. The triplet codon table and the pairing between tRNA anticodons and mRNA codons
1. Frozen accident hypothesis
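The British molecular biologist Francis Crick (a co-discoverer of the DNA double helix and a 1962 Nobel laureate in Physiology or Medicine) proposed the frozen accident hypothesis, which holds that the relation between codons and amino acids was fixed at some period and was thereafter very hard to change (Crick 1968). The fact that nearly all living organisms use the same code appears to support this hypothesis, and it also suggests that all organisms descend from a single common ancestor. In my view, this is only a conjecture about the timing of an evolutionary event and does not explain how the code system originated.
The German chemist Manfred Eigen (1967 Nobel laureate in Chemistry) expressed a similar idea: "Preceding the Darwinian evolution of species there was an analogous stepwise process of molecular evolution, which led to a unique cell machinery using a universal code. This code finally became established not because it was the only alternative, but because of a peculiar 'once and forever' selection mechanism, which could start from any random assignment" (Eigen & Schuster 1979).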
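2. Stereochemical hypothesis
The American microbiologist and biophysicist Carl Richard Woese (who proposed the three-domain system of life and the RNA world hypothesis) and colleagues proposed the stereochemical hypothesis (Woese et al. 1966), which holds that amino acids have a selective chemical affinity for their corresponding codons. That is, the origin and assignment of the genetic code are closely tied to direct chemical interactions between RNA and amino acids; put differently, the stereochemical nature of the codons depends on the complementarity of physical and chemical properties between amino acids and their corresponding codons.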
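3. Co-evolution hypothesis
J. Tze-Fei Wong, a scholar of Chinese descent who has long studied the genetic code, proposed the co-evolution hypothesis (Wong 1975), which holds that the codon system bears the imprint of the prebiotic metabolic pathways by which the primordial amino acids were formed, so that the evolution of the codons can be traced from amino acid metabolic pathways; that is, codon evolution ran in parallel with the evolution of amino acid biosynthesis. According to this hypothesis, the fidelity between amino acids and their codons reflects similarity of biosynthetic pathways rather than similarity of physicochemical properties. In my view, this too merely conjectures one possible route for the origin of the codons without explaining why they evolved this way; moreover, the synthesis of the various amino acids from simple raw materials probably occurred only at the end of prebiotic evolution.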
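4. Integrated evolution hypothesis
5. Other hypotheses
In 1981 Eigen proposed the in vitro selection hypothesis; in 1989 the British chemist Leslie Eleazer Orgel proposed a hypothesis on the origin of the decoding mechanism; and in 1988 the Belgian cell biologist and biochemist Christian de Duve (1974 Nobel laureate in Physiology or Medicine) proposed the second genetic code hypothesis.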
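Wu et al. (2005) of the University of Bath, UK, conjectured that the triplet code evolved gradually from two types of doublet codes, classified by which base positions of the triplet are fixed: prefix codons and suffix codons. Others, however, conjecture that triplet codons evolved from longer codons (such as quadruplet codons), because longer codons carry more coding redundancy and can therefore withstand greater mutational pressure (Baranov et al. 2009).
In 2007 Jingfa Xiao and Jun Yu of the Beijing Institute of Genomics, Chinese Academy of Sciences (Yu 2007, Xiao and Yu 2007) proposed the stepwise evolution hypothesis of the genetic code, which holds that the earliest genetic code was encoded by adenine (A) and uracil (U) alone and specified 7 diverse amino acids. As the complexity of life increased, guanine (G) was released from its role as the main carrier of operational signals and, together with the introduction of cytosine (C), the genetic code expanded step by step to 12, 15 and 20 amino acids (Xiao and Yu 2009).
Yufen Zhao, an organic chemist at Xiamen University (Zhao and Cao 1994, 1996, Zhao et al. 1995, Zhou et al. 1996), has also proposed that nucleic acids and proteins share a common origin, arguing that "phosphorus is the regulatory center of the chemical processes of life", because phosphoryl amino acids can give rise to both nucleic acids and proteins, as well as LB films and liposomes. In her view, volcanoes erupted frequently on the primitive earth, so pyrophosphates and pyrophosphate esters readily accumulated on the surface; the energy held in their P-O-P bonds was transferred, through the formation of P-N bonds with amino acids, ultimately into peptide bonds and the phosphodiester bonds of nucleotides. She conjectured that, as phosphoryl amino acids gave rise to proteins and DNA/RNA simultaneously, proteins and DNA/RNA could influence each other through the regulatory action of the phosphoryl group, producing the rudiments of the primordial codons, which further evolved into the modern genetic code. But the question is: why would phosphoryl amino acids direct this story of the co-evolution of nucleic acids and proteins?
The various theories of codon origin have also been grouped into four categories: chemical principles, biosynthetic expansion, natural selection and information channels. Rate-distortion models from information theory suggest that the origin of the genetic codons depended on a balance among three conflicting evolutionary forces: the need for diverse amino acids, resistance to replication errors, and minimizing the cost of resources (Freeland et al. 2003, Sella and Ardell 2006, Tlusty 2008).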
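1. ATP: from energy to information
Figure 5. ATP, a carrier of both energy and information, lies at the center of the biochemical system in modern cells, bridging photosynthesis, metabolic pathways and genetic information
2. Information integration: from mRNA to DNA
Figure 6. Comparison of the basic structures of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) (from TutorVista.com)
3. Codons: from chaos to order
4. ATP: the initiator of the genetic codons
Figure 7. Structural similarity between chlorophyll and the heme prosthetic group of cytochrome. Decyclization occurred in the transition from magnesium porphyrin to iron porphyrin (marked in red). Evolutionarily, the membrane-coupled chlorophyll molecule may have formed as an adduct of phospholipid and a porphyrin ring. Dashed blue lines with arrows indicate possible directions of evolution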
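Therefore, the codons and the biochemical system coupled to them should be products of physicochemical processes driven by solar energy within a relatively closed system (such as a lipid vesicle; it may be inferred that fatty acids on the primitive earth could spontaneously form a double-layered globular structure resembling the cell membrane). The core of these processes was the capture and transfer of light energy and the biochemical cleavage of water that it drove, accompanied by the transmembrane flow or transfer of electrons and protons. In this chain of events, a seemingly ordinary organic molecule, ATP, worked many of the wonders of the living world: (a) it is the end point of the conversion of light energy into chemical energy; (b) it directed a series of biochemical cycles (such as the Calvin cycle, glycolysis and the tricarboxylic acid cycle) and a dazzling recombination of elements; (c) through its own conversion and condensation it informatized the intricate processes of life, selecting a triplet codon system that uses 4 bases to encode more than 20 amino acids (4^3 = 64, leaving considerable coding redundancy) and ingeniously building a system for the storage, replication, transcription and translation of genetic information and for the production of polypeptide chains; (d) it gave rise to a feedback system in which proteins and nucleic acids are each other's cause and effect and, under directional selection for individual survival, built a complex system and set of rules for the orderly control (self-organization) of the hundreds or thousands of biochemical reactions occurring simultaneously in the cell, ultimately establishing a mechanism for the homogeneous transmission of individual life: heredity (Figure 8). Therefore, among countless organic molecules, ATP alone is the initiator of the genetic codons!
Figure 8. Origin of the codons: schematic of the photosynthesis-mediated ATP-centered hypothesis. Dashed blue lines indicate evolutionary processes during the prebiotic period; solid red lines indicate evolution or interactions continuing from the prebiotic period into the biotic period. Arrows indicate the direction of action or influence.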
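What we see in extant organisms, however, is that an enzyme (aminoacyl-tRNA synthetase, aaRS for short) bridges the amino acid carried on the acceptor arm of a tRNA and its anticodon, because the spatial reach and flexibility of an enzyme readily overcome such difficulties. In general, an aaRS contains at least one catalytic central domain (CCD) and one anticodon-binding domain (ABD). The acceptor stem of the tRNA (that is, the 3'-terminal CCA-OH) is joined by an ester bond, under aaRS catalysis, to an amino acid activated by ATP. All tRNAs carrying the same amino acid (also called isoacceptor tRNAs) are charged by the same aaRS, and each enzyme recognizes its isoacceptor tRNAs through a few specific bases. The aaRS is known to bind extensively to the inner face of the L-shaped tRNA (Figure 9).
Figure 9. Crystal structure of tryptophanyl-tRNA synthetase (source: Bioon.com)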
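Table 1. The standard genetic code and its variations
| Codon | Usual meaning | Exceptional meaning | Organisms |
| | Stop | Tryptophan | Mitochondria of humans, cattle and yeast; mycoplasma genome |
| | Stop | Cysteine | Nuclear genomes of some ciliates, such as Euplotes |
| | Arginine | Stop | Most animal mitochondria; vertebrate mitochondria |
| | Arginine | Serine | Drosophila mitochondria |
| | Isoleucine | Methionine | Mitochondria of some animals and yeast |
| | Stop | Glutamine | Paramecium; nuclear genomes of some ciliates, such as Tetrahymena thermophila |
| | Stop | Glutamic acid | Paramecium nuclear genome |
| | Valine | Serine | Candida nuclear genome |
| | Lysine | Aspartic acid | Mitochondria of some animals; Drosophila mitochondria |
| | Leucine | Stop | Candida cylindracea nuclear genome |
| | Leucine | Threonine | Yeast mitochondria |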
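If it cannot be preserved, what meaning does information have? Preservation (through DNA) is an engraving of adaptiveness; without it, likewise, evolution would not exist. It is precisely such a remarkable mechanism that has given life its astonishing adaptability. For example, the flexibility of polypeptide chains laid the foundation for their highly efficient enzymatic catalysis, and the selective pressure of competitive existence pushed this property to its peak: the reaction catalyzed by orotidine 5'-phosphate decarboxylase would, without the enzyme, need 78 million years to convert half of the substrate into product, whereas with the decarboxylase the same reaction takes only 25 milliseconds (Radzicka and Wolfenden 1995). That is, the high efficiency of enzymes depends on selection through competitive survival; without this selective force, enzymes could not be so efficient.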
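The two half-times quoted above imply a rate enhancement that is easy to estimate. The following is a rough Python sketch, not part of the original text: it assumes a year of 365.25 days and takes the 78 million years and 25 milliseconds at face value. The constant and function names are illustrative only.

```python
# Assumed length of a year in seconds (365.25 days).
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def rate_enhancement(uncatalyzed_years, catalyzed_seconds):
    """Ratio of the uncatalyzed half-time to the enzyme-catalyzed half-time."""
    return uncatalyzed_years * SECONDS_PER_YEAR / catalyzed_seconds


# Figures as quoted in the text: 78 million years versus 25 milliseconds.
ratio = rate_enhancement(78e6, 25e-3)

if __name__ == "__main__":
    print(f"rate enhancement: about {ratio:.1e}-fold")
```

Under these assumptions the enzyme speeds up the reaction by a factor on the order of 10^17.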
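Perhaps the cleavage of water is the greatest wonder of adaptive evolution in nature. The O-H bond in H2O is a stable covalent bond, with an average bond energy as high as 463 kJ mol^-1, and is hard to cleave under ordinary sunlight. Yet life created a marvel from ordinary materials: light energy, polypeptides, metal ions (manganese, calcium) and a non-metal ion (chloride). Through non-covalent binding between metal ions and water molecules, the water-oxidizing enzyme tears open the O-H bonds of H2O, strips away electrons, and releases H+ (which sustains inexhaustible ATP synthesis) and O2 (which made possible the flourishing of the animal kingdom). The significance of this chemical event is that life found a way, under relatively mild conditions (ordinary solar energy), to obtain electrons and protons from H2O, which is ubiquitous but hard to cleave, and thereby raised the curtain on the rapid expansion of life on earth.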
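To give a feel for why an O-H bond energy of 463 kJ per mole is hard to supply with ordinary sunlight, one can convert it into an equivalent photon wavelength using E = N_A h c / λ. The following Python sketch is an illustration added here, not part of the original argument. It treats one photon per bond, which is a simplification and says nothing about the actual enzymatic mechanism, and it assumes visible light spans roughly 400 to 700 nm. The function names are hypothetical.

```python
# Physical constants (exact SI values).
PLANCK = 6.62607015e-34       # Planck constant, J s
LIGHT_SPEED = 2.99792458e8    # speed of light, m/s
AVOGADRO = 6.02214076e23      # Avogadro constant, 1/mol


def photon_energy_kj_per_mol(wavelength_nm):
    """Energy of one mole of photons of the given wavelength, in kJ/mol."""
    return PLANCK * LIGHT_SPEED * AVOGADRO / (wavelength_nm * 1e-9) / 1000.0


def wavelength_nm_for_energy(energy_kj_per_mol):
    """Wavelength (nm) at which one mole of photons carries the given energy."""
    return PLANCK * LIGHT_SPEED * AVOGADRO / (energy_kj_per_mol * 1000.0) * 1e9


# Average O-H bond energy quoted in the text.
threshold_nm = wavelength_nm_for_energy(463.0)
red_kj = photon_energy_kj_per_mol(700.0)      # assumed red edge of visible light
violet_kj = photon_energy_kj_per_mol(400.0)   # assumed violet edge of visible light

if __name__ == "__main__":
    print(f"463 kJ/mol corresponds to about {threshold_nm:.0f} nm")
    print(f"700 nm photons: {red_kj:.0f} kJ/mol; 400 nm photons: {violet_kj:.0f} kJ/mol")
```

On this estimate, 463 kJ per mole corresponds to ultraviolet light near 258 nm, and even the most energetic visible photons carry less than the average O-H bond energy.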
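Woese (1967) proposed the RNA world hypothesis (the term was coined in 1986 by the Nobel laureate in Chemistry Walter Gilbert), which holds that among the early molecules of life on earth RNA appeared first and DNA only later, and that early RNA molecules combined a DNA-like capacity to store genetic information with a protein-like catalytic ability, supporting the operation of early cellular or pre-cellular life. I object to the notion that RNA created the living world out of nothing, although I accept that RNA appeared before DNA. What evidence is there that, because RNA can store information and catalyze reactions, the living world must have originated from it? Moreover, the hypothesis cannot explain why RNA molecules should store genetic information or why they should support the operation of pre-cellular life.
The RNA world hypothesis is a fancy that lacks any evolutionary driving force (and lacks individuality as well), and its explanation of the origin of life is rather strained. I propose an alternative name, the photosynthesis-mediated ATP world hypothesis: the operation of ATP-centered pre-cellular life arose from the evolution of the photosystem that began within lipid vesicle structures. One need only carefully dissect the structural features of the biochemical metabolic system, the membrane-coupled light reaction system and the genetic information system of extant life (photosynthetic pigments, electron transport chains, the ATPase that synthesizes ATP using the transmembrane H+ gradient, the RNA primers of DNA strands, and so on), together with their interrelations, and everything becomes clear! However, if people are unwilling to give up the term RNA world, then its initiator should still be the nucleotide ATP, the real driving force behind the RNA world!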
To cite the content of this article: Xie P. 2016. Critical Reviews and Reconstruction of Evolutionary Theories (进化理论之审读与重塑). Beijing: Science Press (in Chinese)
Cracking mystery of genetic codes
Ping Xie*
Donghu Experimental Station of Lake Ecosystems, State Key Laboratory of Freshwater Ecology and Biotechnology of China, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan 430072, People's Republic of China
* E-mail: xieping@ihb.ac.cn
More than half a century has passed since the discovery of the genetic codes, but their origin is still one of the greatest mysteries in modern life science. Are the genetic codons really unknowable? Do they really require external design? Here, I present an ATP-centric hypothesis aimed at exploring the hidden primordial world that inspired the origin of the genetic codes. I examine how and why ATP is at the heart of the extant biochemical system, and how the genetic codes came into being with the evolution of a biochemical system driven by photosynthesis. Cracking this mystery remains a challenge.
It is a miracle of nature that a set of genetic codes has assembled tens of millions of different species on the earth. But no one knows exactly how these genetic codes came into being. Biologists, probably a majority, hold the pessimistic view that an exact reconstruction of the process of code construction may never be possible [1]. Yockey [2] claims that the origin of the genetic code is unknowable, as there is no trace in physics or chemistry of the control of chemical reactions by a sequence of any sort or of a code between sequences. He objects that many papers have been published with titles indicating that their subject is the origin of the genetic code, while their content actually deals only with its evolution.
So far, several hypotheses have been proposed. The frozen accident hypothesis states that the allocation of codons to amino acids in the single ancestor was entirely a matter of "chance", and then remained unchanged [3]. The stereochemical hypothesis claims that there is in many cases a specific stereochemical fit between an amino acid and the base sequence of its codon on the appropriate tRNA [4]. The biosynthetic hypothesis postulates that the code was assigned in parallel with the evolution of amino acid biosynthesis [5]. Knight et al. [6] declare that the genetic code is a product of selection, history and chemistry. Since then, very little definitive progress has been made, although intensive studies have focused on the variation or flexibility of the codes and possible rules for the allocation of codons to amino acids [7-10].
Frankly speaking, these hypotheses suffer from two fatal defects: first, none can explain satisfactorily why the genetic codes evolved in such a way, and second, none has explained the origin of the genetic codes from that of the biochemical system (a relation of part to whole). In other words, all these hypotheses completely overlook the coevolution of the genetic codes with the biochemical system. In my view, it is impossible to crack the secret of codon origin from the codon itself alone [10, 11], even when the view is extended to the possible relationship between codons and amino acids [7].
Another source of the problem may be related to the so-called quasi-species model proposed by Eigen [12]. He strongly advocated in vitro evolution of macromolecules. A quasi-species in the environment was imagined to be a population of genetically related RNA molecules that shared certain morphological features but were not identical. He supposed that the quasi-species followed the Darwinian process of natural selection. This model has been highly influential [1] and, in my view, is among the main obstacles to our understanding of the origin of the genetic codes.
There is an associated problem. Molecular biology has long been troubled by the so-called chicken-and-egg paradox of protein and nucleic acid, seemingly a logically circular debate about which appeared first. It is also certain that the cues about the origin of macromolecules have been completely lost, owing to the cyclization of biochemical pathways in which the transitional states or tracks have long since disappeared. As the ancient Greek philosopher Heraclitus said, the beginning and the end coincide on the circumference of a circle.
It is undeniable that codons and amino acids were linked stereochemically [4]. Otherwise, we would fall into the mire of God's creation or design. The problem is that the birth of the genetic code must have been motivated by some force. Randomness and selection have frequently been considered to be that force, but I believe they are means rather than motivation. Motivation should be energetic, e.g. ATP produced by photosynthesis.
It is beyond all doubt that the design of the genetic codons, a chemical language, never required the intervention of a "hidden hand". However, no reliable fossil evidence is currently available, and eons of evolution have blurred the molecular vestiges of the early events that remain in living organisms [13]. Fortunately, we can still look back at this history from extant life, even though our eyes can perceive only a very minute fraction of the history of early life.
How is a biochemical system organized?
To solve the puzzle of the genetic codes, we must first understand how a biochemical system is organized. Consider what life is. Life, compositionally, is a unity of matter, energy and information and, dynamically, is a "game" of material cycling, energy flow and information communication. Energy is the key to supporting a living system. In the past decades, physicists and chemists have discovered many details about how organisms are structured and how they work. It is known that life is a physiological machine in which an inconceivably large number of biochemical reactions take place to acquire, convert and use energy. In all modern photoautotrophic life forms (plants, algae and some bacteria), the only source of energy is sunlight, which is converted into chemical energy by a series of complex physicochemical processes called photosynthesis [14].
Energetically and informationally, nothing in the biochemical system is more important than one nucleotide, ATP. It is the only energetic product of photosynthesis, carrying chemical energy converted from sunlight. It then provides energy for metabolism through interconversions of ATP/ADP/AMP, supporting the transformation of various biomolecules into one another in an exquisitely organized cell. In other words, the major metabolic pathways (e.g. the Calvin cycle, glycolysis and the Krebs cycle) are all coupled with ATP (Fig. 1). Of course, NAD(P)H, a nucleotide derivative, is also necessary, as it transports H and e- (through interconversion of NAD(P)H/NAD(P)+).
Figure 1. ATP (a carrier of both energy and information) is at the center of the biochemical system in a modern cell. It provides a unique bridge among photosynthesis, metabolic pathways and genetic information
ATP is not only an indispensable building block of the genetic system (DNA and RNA); the other four nucleotides used for genetic coding are also all derived from it. Meanwhile, information delineates the border between the living and the inanimate, as the living world appears to be the only place where information is recorded, processed or used [15]. Therefore, as an irreplaceable carrier of both energy and information, ATP necessarily has a central role in the biochemical system.
The importance of ATP in biochemical systems could be attributed to its role in governing the evolution of photosynthetic systems in primordial life. Although debated, there are signs that life on earth did start out with photosynthesis [16]. First, sunlight, needless to say, has been the most universal source of energy. Second, a biomolecule called cytochrome (an electron transport protein with iron porphyrin, or heme, as a prosthetic group) seems to bear the imprint of photosynthesis (Fig. 2). Cytochrome is a universal electron carrier, present even in chemoautotrophic bacteria [17]. Originally, heme was likely derived from a photosynthetic pigment, chlorophyll, as their biosynthetic pathways are very similar in extant life forms [18]. Interestingly, chlorophyll appears to be an adduct between a magnesium porphyrin ring and a long-chain fatty acid from the membrane. Let me ask: if photosynthetic bacteria were not the Last Universal Common Ancestor (LUCA) of all modern life forms, why would chemoautotrophic bacteria use a photosynthesis-imprinted molecule like cytochrome as an electron carrier?
Figure 2. Structural homology between chlorophyll and the heme of cytochrome. Decyclization occurred from magnesium porphyrin to iron porphyrin (marked in red). Evolutionarily, the membrane-bound chlorophyll was likely a merger of phospholipid and porphyrin. Dashed blue lines with arrows indicate possible directions of evolution
How did the genetic codes originate?
The genetic code system was built by a chemical mechanism closely related to the production of ATP. Energetically, the first task of primordial life was to achieve sufficient production of this nucleotide. In present-day photoautotrophs, the synthesis of ATP requires a transmembrane gradient of protons that come from the biochemical cleavage of water in the photosynthetic system. Therefore, to guarantee such a proton gradient, a relatively closed entity or small compartment, impermeable to H+, was needed first. Such a compartment was most likely a lipid vesicle, the precursor of the protocell. The fact that the synthesis of ATP requires a transmembrane gradient of H+ made it impossible for macromolecules to evolve in vitro as suggested by Eigen.
Chemically, it was not impossible that, on the primitive earth, fatty acids could spontaneously form a double-layered globular membrane structure [19]. In modern life forms, the cell membrane, consisting of a lipid (phospholipid) bilayer, provides controlled entry and exit ports for the exchange of matter. It permits the passage of small molecules such as CO2 and O2 by diffusion, but acts as a barrier to certain molecules and ions (e.g. H+), leading to different concentrations on the two sides of the membrane. H+ cannot pass freely across the membrane except through transmembrane protein channels. This implies that the formation of ATP in the primordial cell had to rely on a polypeptide channel that later developed into ATP synthase. In addition, the release of H+ from the cleavage of H2O also required the help of polypeptides.
Reasonably, steady production of ATP was possible only if the various elements (sunlight, lipid bilayer membrane, polypeptides, cleavage of H2O, transmembrane H+ gradient, electron carriers...) had been organized in an orderly way. It may be inferred that a series of events occurred randomly in protocells. For example, the transmembrane proton gradient coupled with a polypeptide channel resulted in the formation of an apparatus (i.e. ATP synthase) to synthesize ATP. Nucleotides like ATP could form polynucleotides by self-condensation, some of which carried amino acids (precursors of tRNA), while others built a platform for the synthesis of polypeptides (precursors of rRNA), which presumably replaced the irregular stochastic formation of polypeptides from amino acids activated by ATP. On the other hand, the polypeptides in turn participated not only in the construction of transmembrane channels for hydrophilic molecules and ions, but also in the biochemical cleavage of H2O and in catalyzing the self-condensation of nucleotides. As a consequence, numerous consecutive reactions were linked into a variety of chains, which might be linear, branched or cyclic. Fundamentally, these were creative processes of order out of disorder, and of rationality out of randomness.
It should be borne in mind that primitive life started out from materials that were not organized. The prebiotic mixtures of chemicals present on the primeval earth are assumed to have contained an immense number of life's building blocks, such as ATP, NADPH, pyrroles, amino acids, aliphatic hydrocarbons, ubiquinones, monosaccharides, and so on [16]. Let us first consider an impermeable lipid vesicle enclosing plenty of life's building blocks. Their exposure to sunlight would drive dazzling flows of electrons and H+, causing active recombination of elements. This would increase the accumulation of large molecules, accompanied by a ceaseless input of small molecules like CO2; the protocells therefore had to alternate between enlarging and rupturing, giving rise to the development of both photosynthesis and cell division. Next, an information era followed: in the cycle of photosynthetic growth followed by division, and through selection for individual survival, the protocell came to fix the regularity, reproducibility and rhythmicity of the various biochemical reactions that deal with photosynthetic products, a process that can be called informatization.
Then, a scenario can be outlined of how a set of genetic codes was successfully selected to record, preserve and transmit information. Firstly, it was likely that in the organic "soup" enclosed by the protocell, the energetic ATP and its derivatives could randomly extend chains of both polynucleotides and polypeptides. This made it possible to establish or fix the chemical relation between nucleotide sequences in polynucleotides and amino acids in polypeptides, out of their numerous random combinations, through selection for cellular survival, an ecological force or feedback mechanism.
Secondly, informatization was inevitably coupled with structuralization, such as structural subdivision or specialization and functional differentiation, providing the basis for the establishment of the triplet codon system. For example, tRNA was specialized to carry a specific amino acid; polypeptides helped match the acceptor stem of tRNA to its anticodon; the system developed the rule of codon-anticodon base pairing, i.e. molecular recognition through stereochemical interactions (e.g. hydrogen bonds, van der Waals attractive forces and aromatic stacking); and it ushered in a unified platform, rRNA, for protein synthesis (synthesizing polypeptides according to an mRNA template), and so on. In this way, macromolecules were functionally differentiated: polynucleotides handle (record, preserve and transmit) information, and polypeptides, called enzymes, catalyze all chemical reactions, and both were further cyclized into a system of reciprocal causation. As a result, code-based informatization led to remarkable innovations in diverse laws (principles) or patterns, e.g. biochemical pathways, of course through goal-oriented random selection (Fig. 3).
Thirdly, the triplet codon might simply be a result of the optimization of biochemical networks under selective pressure: for handling more than 20 amino acids, a codon with either too many bases (more cumbersome) or too few would not do. A doublet offers only 4^2 = 16 combinations, fewer than the number of amino acids, whereas a triplet offers 4^3 = 64, which still leaves considerable coding redundancy. That is, three is the smallest codon length able to encode all the amino acids.
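The counting behind this point can be made explicit. The following is a minimal Python sketch, added for illustration and not part of the original argument: given an alphabet of bases and a number of distinct meanings to encode, it finds the shortest codon length whose number of combinations is large enough. The function name min_codon_length and its parameters are hypothetical.

```python
def min_codon_length(n_bases, n_meanings):
    """Smallest length L such that n_bases ** L >= n_meanings."""
    if n_bases < 2 or n_meanings < 1:
        raise ValueError("need at least 2 bases and 1 meaning")
    length = 1
    while n_bases ** length < n_meanings:
        length += 1
    return length


if __name__ == "__main__":
    # Capacity of codons of length 1 to 4 over a 4-base alphabet.
    for length in range(1, 5):
        print(f"length {length}: {4 ** length} combinations")
    # 20 amino acids, or 21 meanings if a stop signal is counted as well.
    print("shortest length for 20 meanings:", min_codon_length(4, 20))
    print("shortest length for 21 meanings:", min_codon_length(4, 21))
```

With four bases, doublets give 16 combinations and triplets give 64, so three is the shortest length that can cover 20 amino acids plus a stop signal; quadruplets would give 256 combinations.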
Of course, it must have taken a very long succession of steps for the protocells to test and modify the genetic code system through selection for cellular survival. They then obtained the capacity to transmit their blueprint, recorded in DNA, from generation to generation, and self-building also became a central characteristic of life. The protocell had thereby taken a historic step towards the first genetic cell, i.e. a true species, that could bear, process and transmit information, reproducing homogeneous (although not absolutely identical) individuals, and was thus capable of Darwinian evolution. It seems to be a first principle that the reproduction of individuals with heritable homogeneity was favored by existence or nature, which, somewhat like a centripetal eco-physiological force, not only governed the integration of various biochemical events but also shaped the direction of survival selection.
Figure 3. A simplified conceptual model of the origin of the genetic code based on the photosynthesis-mediated, ATP-centric hypothesis. Dashed blue lines indicate evolutionary processes during the pre-life period, while solid red lines denote processes or interactions from the pre-life period to the present. Arrows indicate the direction of influences or actions.
What made the difference between DNA and RNA?
As ATP is the only nucleotide produced by photosynthesis, converting ATP to any other nucleotide does not appear to have been difficult. Meanwhile, these energetic nucleotides could also self-condense into a wide variety of mRNAs and, of course, other RNAs. If they could do this, why would it have been impossible for them to extend the chain to make a DNA molecule? The progress from mRNA to DNA was undoubtedly a great step in the informatization of the biochemical system. That is, DNA was specialized to accurately record and permanently preserve all the genetic information that commands everything that goes on in the cell, while mRNAs became short-lived messengers for the implementation of the instructions.
However, in the integrative process of information, subtle structural differences arose between DNA and RNA. First, the second carbon atom of the sugar bears -H in the former but -OH in the latter. Second, thymine (T) in DNA is replaced by uracil (U) in RNA; the structural difference is very small, i.e., T has one more methyl group. No one can answer why the two differ in one base rather than in all four. It seems unlikely that this had much meaning in structuralization; perhaps there was simply a need for a difference, as similar cases are not rare (e.g. NADPH and NADH), unless the diverse three-dimensional structures of RNA are due to such a tiny change in ribose or base. Structurally, this was more likely just an accident. But functionally, the subdivision of the genetic system into RNA and DNA certainly served the orderly management and control of information in a tiny cell where hundreds of biochemical reactions occur simultaneously. Consequently, the biochemical system reached a state in which mRNAs were destroyed immediately after their mission was finished, while the genetic information recorded in DNA would be permanently preserved and transmitted to ensure the continuation of the species.
Who manipulated the RNA world?
Woese [20] proposed the RNA world hypothesis (a term coined by Walter Gilbert in 1986): the earliest biomolecules on earth were RNA, followed by DNA; the early RNA molecules had the capacity for information storage like DNA and for catalysis like proteins, and supported the operation of the early cell or protocell life. I object to the claim that RNA created the living world from randomness, although I agree with the view that RNA preceded DNA. Is there any evidence that the living world must have been derived from RNA merely because it has the functions of information storage and catalysis? Also, this theory is unable to explain why RNA molecules tended to store genetic information and to support the operation of protocells.
In my view, the explanation that the RNA world hypothesis offers for the origin of life and the genetic codes is quite far-fetched [21-23], and its lack of both motivation and individuality is unfortunate. I propose an alternative term, the photosynthesis-mediated ATP world hypothesis: early life on earth coevolved with the development of a photosynthetic system in lipid vesicles, where a sophisticated biochemical system was built for the special purpose of synthesizing ATP using solar energy. This is evidenced by a series of structural and functional features of the extant biochemical system (e.g., photosynthetic pigments, the electron transfer chain, ATP synthesis driven by the transmembrane H+ gradient, metabolic cycles and pathways coupled with ATP, synthesis of RNA and DNA from ATP and its derivatives, etc.) as well as their interplay. However, if we are not willing to abandon the term "RNA world", then the nucleotide ATP should still be regarded as its initiator.
Challenges in the Future
In this article, I have outlined a synthetic mechanism for the origin of the genetic codes: the energetic ATP together with its derivatives could randomly extend chains of both polynucleotides and polypeptides, which made it possible to establish or fix chemical relations between nucleotide sequences in polynucleotides and amino acids in polypeptides, out of their numerous random combinations, through a feedback mechanism (selection for cellular survival); and technically, photosynthesis, a goal-oriented process, enabled various biotic factors or reactions (ATP, lipid vesicle, informatization, structuralization, homogeneous individual, individuality, survival, etc.) to be integrated into an operating system of genetic codes. But how to collect supporting evidence remains a challenge.
Perhaps some people will ask why molecules did not stay silently in the organic soup but struggled to shift from chaos to order. We may attribute this to code-based self-organization with an incessant input of solar energy: if such codes did not exist, the protocells would not be able to maintain orderly control of biochemical systems, and the chemical world would still be in incomprehensible chaos. But it still needs to be explained why. Is it a first principle that individuals with heritable homogeneity were favored by existence?
Life, in a sense, is a contradictory unity: it owes its generality to homogeneity, but at the same time owes its individuality to heterogeneity. Individuality can be traced back to the lipid vesicles where life started out. It is the quality that makes one living entity different from all others. It is amazing that such individuality has advanced from pure chemistry in bacteria to sophisticated desires (e.g. for competition and struggle), habits, instincts and even spirit in higher animals. Is it possible that individuality, as an eco-physiological force, was engaged in re-shaping or re-fixing the rhythm or regularity of biochemical reactions in the protocells? If so, Darwinian evolution would follow as a matter of course.
One may well wonder whether too much speculation has been superimposed on the ATP-centric hypothesis. But I can confidently say that the present sketch is clearer than any of the previous theories in its overall outline and logic, particularly in relating the origin of the genetic code to the evolution of the photosynthesis-mediated, ATP-centered biochemical system. And, of course, I admit that none of the hypotheses or theories on the origin of life or of the genetic code can be either verified or falsified (which, in the eyes of Popper, a philosopher of science, means they are not science). This has been the case so far, and will perhaps remain so for some time. But because it is a secret about ourselves, and as long as the facts are not completely known, human beings will never stop pursuing it. It is possible for us to glimpse the secrets of the coevolution of the codon and the biochemical system through their intrinsic relationship; this is not necessarily the truth, but it is at least a way to the truth.
