Cells are the fundamental units of life. The genome sequence of a cell may be thought of as its operating system. It carries the code that specifies all of the genetic functions of the cell, which in turn determine the cellular chemistry, structure, replication, and other characteristics. Each genome contains instructions for universal functions that are common to all forms of life, as well as instructions that are specific to the particular species. The genome is dependent on the functions of the cell cytoplasm for its expression. In turn, the properties of the cytoplasm are determined by the instructions encoded in the genome. – Venter Institute, 2016
Sixty-three years ago Francis Crick wrote a letter to his 12-year-old son, Michael, explaining that he and his research partner, James D. Watson, had constructed a model of how DNA molecules could hold encoded information inside the cell. The very concept of a molecule holding encoded information was scientifically and philosophically fascinating; we had discovered the “stuff of heredity”. Fifty-eight years ago we discovered the necessary “interpreter” molecules that allow this encoded information to be translated into concrete physical effects. Without these interpreter molecules, the information contained in DNA would be completely useless. Life would simply not exist. Fifty-five years ago we demonstrated experimentally that the information contained in the genome is held in an actual reading-frame code. We readily recognized the utility of a reading-frame code, given that our own recorded language is the (only) other place in the cosmos where we can find such a thing. In that same year, Marshall Nirenberg and Heinrich Matthaei began the process of finally breaking the code, the Genetic Code, which holds the information of life inside the cell.
And just last month, in March 2016, documentation was presented that we have now built a minimal viable genome of just 438 protein-coding genes and 35 RNA genes (totaling 531,000 bp of information). This is a scant fraction of the 20,000+ genes required to organize a human being. These 473 genes include 324 genes of explicitly known function (mostly establishing and sustaining the functionality of the information system itself) and another 149 genes whose function is currently unknown, ambiguous or unassigned, but whose inclusion has been experimentally demonstrated to be necessary for robust viability.
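For readers who want to check the arithmetic, the gene counts quoted above can be tallied in a few lines. This is only a sketch using the figures as stated in this article; the function and key names are my own, and the "bp per gene" value is simply genome length divided by gene count (so it includes intergenic sequence), not a measured average gene length.

```python
# Figures as quoted in this article for JCVI-syn3.0.
PROTEIN_CODING = 438
RNA_GENES = 35
KNOWN_FUNCTION = 324
UNKNOWN_FUNCTION = 149
GENOME_BP = 531_000
HUMAN_GENES = 20_000  # the article's round "20,000+" figure, used as a lower bound


def tally():
    """Cross-check the two gene breakdowns and derive a few ratios."""
    total = PROTEIN_CODING + RNA_GENES
    # Both breakdowns should describe the same set of genes.
    assert total == KNOWN_FUNCTION + UNKNOWN_FUNCTION
    return {
        "total_genes": total,
        "unknown_share_pct": round(100 * UNKNOWN_FUNCTION / total, 1),
        # Genome length per gene, including any intergenic sequence.
        "avg_bp_per_gene": round(GENOME_BP / total),
        "share_of_human_gene_count_pct": round(100 * total / HUMAN_GENES, 1),
    }


print(tally())
```

Both breakdowns arrive at the same total of 473 genes, and roughly a third of them are the ones whose function is not yet pinned down.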
This synthetic bacterial genome (named JCVI-syn3.0) is a project of the J. Craig Venter Institute, where its genetic content was carefully reduced and optimized to survive only in a nutrient-rich environment that “supplies virtually all the small molecules required for life”. With the genome in this reduced state, the researchers can catalog the truly essential genetic information and functions of a minimal genome. For example, if a gene is required to synthesize a particular nutrient, that gene can be removed from the genome and the nutrient supplied to the organism by the laboratory environment instead. By making these types of strategic deletions, the team set out to establish the “core set of environment-independent functions that are necessary and sufficient for life.” As a result of this process, JCVI-syn3.0 now has a genome smaller than that of any known cell that replicates on its own.
Our goal is a cell so simple that we can determine the molecular and biological function of every gene.
JCVI began this project with their previous synthetic genome (JCVI-syn1.0, circa 2010) along with a best approximation of a viable genome, which they referred to as their HMG (hypothetical minimal genome). With this knowledge in hand, they divided the genome into eight segments, such that each segment could be independently reorganized and reduced, then tested for viability. Over the course of the project, the researchers improved their processes and procedures, resulting in a viable genome that is roughly half the size of JCVI-syn1.0.
The genetic information that remained was then analyzed and divided into four main categories of function: a) gene expression, b) membrane structure, c) metabolism, and d) genome preservation. Of these, the largest group (by a good margin) establishes the cell’s capacity to translate and express the genetic information itself. Along with the capacity to preserve this information, these two categories account for almost half (48%) of the viable genome (i.e., information expression = 41%, information preservation = 7%). The remaining two groups of function (membrane and metabolism) together account for another 35% of the minimal genome (i.e., membrane = 18% and metabolism = 17%). As a significant step toward the researchers’ goal, this leaves just 79 genes (about 17% of the 473) with fully uncharacterized function.
[B]ecause of the rich growth medium that supplies almost all of the necessary small molecules, many genes involved in transport, catabolism, proteolysis, and other metabolic processes have become dispensable. For example, because glucose is plentiful in the medium, most genes for transport and catabolism of other carbon sources have been deleted (34 out of 36), whereas all 15 genes involved in glucose transport and glycolysis have been retained.
In contrast, almost all of the genes involved in the machinery for reading and expressing the genetic information in the genome and in ensuring the preservation of genetic information across generations have been retained. The first of these two fundamental life processes, the expression of genetic information as proteins, requires the retention of 195 genes in the categories of transcription, regulation, RNA metabolism, translation, protein folding, RNA (rRNA, tRNA, and small RNAs), ribosome biogenesis, rRNA modification, and tRNA modification. The second of these two fundamental processes, the preservation of genome sequence information, requires the retention of 34 genes in the categories of DNA replication, DNA repair, DNA topology, DNA metabolism, chromosome segregation, and cell division.
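The counts and percentages quoted above can be checked against one another. The sketch below uses only numbers that appear in this article (473 genes in total, 195 for expression, 34 for preservation, 79 uncharacterized). Membrane and metabolism are given here only as percentages, so the code treats them as a combined remainder rather than assuming individual counts; the names are my own.

```python
TOTAL_GENES = 473  # 438 protein-coding + 35 RNA genes

# Gene counts stated in the text. Membrane and metabolism are quoted only
# as percentages, so they are handled as a combined remainder below.
COUNTS = {"expression": 195, "preservation": 34, "unassigned": 79}


def share(count, total=TOTAL_GENES):
    """Percentage of the genome's genes, rounded to a whole number."""
    return round(100 * count / total)


def summary():
    remainder = TOTAL_GENES - sum(COUNTS.values())
    shares = {name: share(n) for name, n in COUNTS.items()}
    shares["membrane_plus_metabolism"] = share(remainder)
    return remainder, shares


remainder, shares = summary()
print(remainder, shares)
```

The rounded shares come out to 41%, 7%, 17% and 35%, which matches the figures quoted earlier and leaves 165 genes for membrane and metabolism combined.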
These findings are entirely consistent with the argument presented on Biosemiosis.org. In a previous article I wrote:
“Thus, when we observe the particulars of the genetic translation system, we are not merely looking at features that happen to be coincidental to the system's function – instead, each individual feature we observe imparts a very specific capacity on the system, and each of these capacities are collectively necessary in making the organization of a heterogeneous cell possible. They are necessary because they make the translation of information possible. They make memory and heredity possible. And to whatever extent the origin of life required any additional information to organize the first living cell, we can know by virtue of life’s self-replicating nature that the original informational content of the heterogeneous cell contained at least enough information to replicate and organize the elements of the system described above.”
I recognize that nothing I say on this matter is earth-shattering; I’ve merely presented a model of well-known physical requirements. These include observations that have been documented and understood for half a century or more. But these requirements do not go away as the origins issue passes out of our empirical hands, onwards to our speculations about what might have started it all. This is simply to say, on the day before the first self-replicating heterogeneous cell existed on earth, every single one of the physical conditions required for the translation of information already existed. They are bound by physical law, and they must be resolved for the heterogeneous cell to come into being.
What JCVI has done, and is doing, is experimentally quantifying those requirements in terms of discrete function and numbers of base pairs. And this leads me to a couple of questions for those who profess (against massive physical evidence to the contrary) that this all came into being by naught (or whatever word you’d like to use).
Considering the list of functions that a minimal heterogeneous cell requires, at what point is translation (the organized expression of an informational medium) not required inside the cell? The translation of an informational medium enables the physical capacity to specify a thing among alternatives, and places it under temporal control. That is precisely what protein synthesis does. Translation also allows the system to control and produce effects and outcomes that are not determined by (and therefore not limited by) the physical properties of the molecules carrying the information. This discontinuity is itself the product of a specific organization, and the independence it imparts upon the system is what enables the full range of effects required to organize the cell. When is this capacity to specify a thing and produce effects (unlimited by the physical properties of the medium) not necessary to the formation of the heterogeneous cell?
Finally, when translation is organized in a system that uses combinatorial permutations as the means of encoding information (i.e. uses spatially-oriented representations and a reading-frame code) it gains the informational capacity required to describe itself in a transcribable memory. When is this not necessary to the formation of a heterogeneous cell? In other words, on what empirical grounds are we to say that Craig Venter can scratch off “the translation of information” from the genome?
(if you catch my drift)
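For readers unfamiliar with the term, the idea of a reading-frame code built from combinatorial permutations can be illustrated with a toy example. This is only a sketch: the sequence is hypothetical and chosen for illustration, and the function name is my own. It shows two things, namely that ordered triplets over a four-letter alphabet give 4 x 4 x 4 = 64 possible symbols, and that the same string yields entirely different triplets if reading starts one position later.

```python
from itertools import product

BASES = "ACGU"


def codons(sequence, frame=0):
    """Split a sequence into non-overlapping triplets, starting at `frame`."""
    usable = sequence[frame:]
    end = len(usable) - len(usable) % 3  # drop any incomplete trailing triplet
    return [usable[i:i + 3] for i in range(0, end, 3)]


# Every ordered triplet over a four-letter alphabet.
ALL_CODONS = ["".join(p) for p in product(BASES, repeat=3)]

message = "AUGGCUUAA"  # hypothetical sequence, for illustration only
print(len(ALL_CODONS))     # number of distinct triplets
print(codons(message, 0))  # read from the first position
print(codons(message, 1))  # same string, shifted by one position
```

Read from the first position, the string splits into AUG, GCU and UAA. Shifted by one, it splits into UGG and CUU. The molecule is the same; what it specifies depends on where reading begins.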
Research Article: Design and synthesis of a minimal bacterial genome.
Science 25 Mar 2016: Vol. 351, Issue 6280, DOI: 10.1126/science.aad6253
Image Credit: J Craig Venter Institute/NCMIR/Thomas Deerinck/Mark Ellisman
JC Venter Institute, Minimal Cell Project