GENETIC CODE, MOLECULAR CLONING & APPLICATIONS
CIRCULAR BACTERIAL CHROMOSOME
A circular bacterial chromosome, showing DNA replication
proceeding bidirectionally, with two replication forks generated at the
"origin". Each half of the chromosome replicated by one replication
fork is called a "replichore".
A circular bacterial chromosome is a bacterial chromosome carried on a circular DNA
molecule.
Unlike the linear DNA of vertebrates, typical bacterial chromosomes consist of circular
DNA.
Because the molecule is a closed circle, the DNA has no free ends. Free ends would
otherwise create significant challenges for DNA replication and stability. Cells
whose chromosomes do have DNA ends, capped by telomeres (as in most eukaryotes),
have acquired elaborate mechanisms to overcome these challenges. However, a
circular chromosome poses other challenges. After replication, the two progeny
circular chromosomes can sometimes remain interlinked or tangled, and they must be
resolved so that each daughter cell inherits one complete copy of the chromosome
during cell division.
Replication of a circular bacterial chromosome
Bacterial chromosome replication is
best understood in the well-studied bacteria Escherichia coli
and Bacillus subtilis. Chromosome replication
proceeds in three major stages: initiation, elongation and termination. The
initiation stage starts with the ordered assembly of "initiator"
proteins at the origin region of the chromosome, called oriC. These assembly
stages are regulated to ensure that chromosome replication occurs only once in
each cell cycle. During the elongation phase of replication, the enzymes that
were assembled at oriC during initiation proceed along each arm ("replichore")
of the chromosome, in opposite directions away from the oriC, replicating the
DNA to create two identical copies. This process is known as bidirectional
replication. The entire assembly of molecules involved in DNA replication on
each arm is called a "replisome." At the forefront of the replisome is a DNA helicase
that unwinds the two strands of DNA, creating a moving "replication
fork". The two unwound single strands of DNA serve as templates
for DNA
polymerase, which moves with the helicase (together with other
proteins) to synthesize a complementary copy of each strand. In this way, two
identical copies of the original DNA are created. Eventually, the two
replication forks moving around the circular chromosome meet in a specific zone
of the chromosome, approximately opposite oriC, called the terminus region. The
elongation enzymes then disassemble, and the two "daughter"
chromosomes are resolved before cell division is completed.
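Because the two forks share the work, the time needed to copy the chromosome is roughly half its length divided by the speed of one fork. The short Python sketch below makes that arithmetic explicit. The genome size (about 4.6 million base pairs, roughly an E. coli chromosome) and fork speed (about 1,000 nucleotides per second) are assumed round numbers for illustration, not values given in this text.

```python
def replication_time_minutes(genome_bp: int, fork_speed_nt_per_s: float,
                             bidirectional: bool = True) -> float:
    """Estimate the time to copy a circular chromosome from a single origin.

    With bidirectional replication each fork copies one replichore
    (about half the chromosome); a single fork would have to copy all of it.
    """
    forks = 2 if bidirectional else 1
    distance_per_fork = genome_bp / forks
    return distance_per_fork / fork_speed_nt_per_s / 60

# Assumed illustrative values: ~4.6 Mb genome, ~1,000 nt/s per fork.
print(round(replication_time_minutes(4_600_000, 1000), 1))                       # 38.3
print(round(replication_time_minutes(4_600_000, 1000, bidirectional=False), 1))  # 76.7
```

Under these assumptions, two forks halve the copying time compared with a hypothetical single fork.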
Initiation
The E. coli replication origin, called oriC, consists of DNA sequences
that are recognised by the DnaA protein, which is highly conserved amongst different bacterial
species. DnaA binding to the origin initiates the regulated recruitment of
other enzymes and proteins
that will eventually lead to the establishment of two complete replisomes for
bidirectional replication.
DNA sequence elements within oriC
that are important for its function include DnaA boxes: 9-mer repeats with the
highly conserved consensus sequence 5'-TTATCCACA-3' that are recognized by
the DnaA protein. DnaA plays a crucial role in the initiation of
chromosomal DNA replication. Bound to ATP, and with the assistance of the bacterial
histone-like protein HU, DnaA unwinds an AT-rich region near the left boundary of oriC,
which carries three 13-mer motifs, and opens up the double-stranded DNA for entry of other
replication proteins.
This region also contains four “GATC”
sequences that are recognized by DNA adenine methylase
(Dam), an enzyme that methylates the adenine base when this sequence is
unmethylated or hemimethylated. Adenine methylation is important because it
alters the conformation of DNA to promote strand separation, and this region of
oriC appears to have a natural tendency to unwind.
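The two kinds of recognition sites described above can be located by a simple motif scan. The Python sketch below searches both strands for an exact match to the DnaA box consensus 5'-TTATCCACA-3' and for GATC sites. The toy sequence is invented for illustration, and exact matching is a simplification: a realistic search would tolerate mismatches to the consensus. Because GATC is its own reverse complement (a palindrome), each GATC site is reported once and is present on both strands, which is why a newly replicated site can be hemimethylated.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_motif_both_strands(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (start, strand) for every exact occurrence of motif on either strand.

    Positions are 0-based coordinates on the given (top) strand; the lookahead
    regex also reports overlapping hits.
    """
    hits = [(m.start(), "+") for m in re.finditer(f"(?=({motif}))", seq)]
    rc_motif = reverse_complement(motif)
    if rc_motif != motif:  # a palindromic motif would otherwise be counted twice
        hits += [(m.start(), "-") for m in re.finditer(f"(?=({rc_motif}))", seq)]
    return sorted(hits)

# Hypothetical toy sequence: GATC, a forward DnaA box, GG, a reverse DnaA box, GATC.
toy = "GATC" + "TTATCCACA" + "GG" + "TGTGGATAA" + "GATC"
print(find_motif_both_strands(toy, "TTATCCACA"))  # [(4, '+'), (15, '-')]
print(find_motif_both_strands(toy, "GATC"))       # [(0, '+'), (24, '+')]
```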
Elongation
When the replication forks move around
the circle, a structure shaped like the Greek letter theta (θ) is formed. John Cairns demonstrated the theta
structure of E. coli chromosomal replication in 1963, using an
innovative method to visualize DNA replication. In his experiment, he radioactively labeled the chromosome by
growing his cultures in a medium containing ³H-thymidine.
The labeled nucleoside was incorporated uniformly into the bacterial chromosome. He then isolated
the chromosomes by lysing the cells gently, placed them on an electron microscope (EM) grid, and exposed the grid
to X-ray film for two months. This experiment clearly demonstrated the theta replication
model of circular bacterial chromosomes.
As described above, bacterial
chromosomal replication occurs in a bidirectional manner. This was first
demonstrated by specifically labelling replicating bacterial chromosomes with radioactive isotopes. The regions of DNA
undergoing replication during the experiment were then visualized by using autoradiography
and examining the developed film microscopically. This allowed the researchers
to see where replication was taking place. The first conclusive observations of
bidirectional replication were from studies of B. subtilis. Shortly after, the
E. coli chromosome was also shown to replicate bidirectionally.
The E. coli DNA polymerase III holoenzyme is a 900 kDa
complex with an essentially dimeric
structure. Each monomeric unit has a catalytic core, a dimerization
subunit, and a processivity component. DNA Pol III uses one
set of its core subunits to synthesize the leading
strand continuously, while the other set of core subunits cycles
from one Okazaki fragment to the next on the looped lagging
strand. Leading strand synthesis begins with the synthesis of a
short RNA primer
at the replication origin by the enzyme Primase (DnaG protein).
Deoxynucleotides
are then added to this primer by a single DNA polymerase III dimer, in an
integrated complex with DnaB helicase. Leading strand synthesis then proceeds
continuously, while the DNA is concurrently unwound at the replication fork. In
contrast, lagging strand synthesis is accomplished in short Okazaki fragments.
First, an RNA primer is synthesized by primase, and, like that in leading
strand synthesis, DNA Pol III binds to the RNA primer and adds deoxyribonucleotides.
When the synthesis of an Okazaki
fragment has been completed, replication halts and the core subunits of DNA Pol
III dissociate from the β sliding clamp [the β sliding clamp is the processivity
subunit of DNA Pol III]. The RNA primer is removed and replaced with DNA by DNA
polymerase I [which also possesses proofreading exonuclease
activity], and the remaining nick is sealed by DNA ligase,
which joins the fragments into a continuous lagging strand.
Termination
Termination is the process of fusion of
replication forks and disassembly of the replisomes to yield two separate and
complete DNA molecules. It occurs in the terminus
region, approximately opposite oriC on the chromosome. The terminus region
contains several DNA replication terminator sites, or "Ter" sites. A
special "replication terminator" protein must be bound at a Ter
site for it to pause replication. Each Ter site has polarity of action; that
is, it will arrest a replication fork approaching the Ter site from one
direction, but will allow unimpeded fork movement through the Ter site from the
other direction. The Ter sites are arranged in two opposed groups that
force the two forks to meet each other within the region they span. This
arrangement is called the "replication fork trap."
Replication of the DNA separating the
opposing replication forks leaves the completed chromosomes joined as ‘catenanes’,
or topologically interlinked circles. The circles are not covalently linked,
but cannot be separated because they are interwound and each is covalently
closed. The catenated circles require the action of topoisomerases
to separate them [decatenation]. In E. coli, DNA topoisomerase IV plays
the major role in separating the catenated chromosomes, transiently
breaking both DNA strands of one chromosome and allowing the other chromosome
to pass through the break.
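The fork trap can be illustrated with a toy simulation. This is a deliberately simplified model, not a description of the real terminator proteins: positions are arbitrary units on a circle with oriC at 0, the Ter coordinates are hypothetical, and each site is labelled with the fork direction it arrests ("cw" for the clockwise fork, "ccw" for the counterclockwise fork). A fork moving the other way passes through, which models the polarity of action described above.

```python
from typing import Dict

def fork_meeting_point(length: int, ter_sites: Dict[int, str],
                       cw_speed: int = 1, ccw_speed: int = 1) -> int:
    """Return the clockwise coordinate where the two forks meet.

    oriC is at coordinate 0 on a circle of `length` units. ter_sites maps a
    coordinate to the fork direction it arrests ("cw" or "ccw").
    """
    # The clockwise fork is stopped by the first "cw" site it reaches (lowest coordinate).
    cw_limit = min((p for p, d in ter_sites.items() if d == "cw"), default=length)
    # The counterclockwise fork runs down from `length`; the highest "ccw" site stops it.
    ccw_limit = length - max((p for p, d in ter_sites.items() if d == "ccw"), default=0)

    cw = ccw = 0  # distance travelled by each fork
    while cw + ccw < length:
        gap = length - cw - ccw  # unreplicated DNA still between the forks
        step_cw = min(cw_speed, cw_limit - cw, gap)
        cw += step_cw
        step_ccw = min(ccw_speed, ccw_limit - ccw, gap - step_cw)
        ccw += step_ccw
        if step_cw == 0 and step_ccw == 0:
            raise ValueError("both forks arrested before meeting")
    return cw

trap = {45: "ccw", 55: "cw"}  # hypothetical trap spanning coordinates 45-55
print(fork_meeting_point(100, trap))                # 50: equal speeds meet in the middle
print(fork_meeting_point(100, trap, ccw_speed=3))   # 45: faster fork held at the trap edge
print(fork_meeting_point(100, {}, ccw_speed=3))     # 25: without a trap, forks meet off-centre
```

In this model the trap guarantees that, whatever the relative fork speeds, the meeting point stays inside the region spanned by the opposed Ter sites.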
Genetic code
A series of codons in part of a messenger RNA
(mRNA) molecule. Each codon consists of three nucleotides,
usually representing a single amino acid. The nucleotides are abbreviated
with the letters A, U, G and C. This is mRNA, which uses U (uracil). DNA
uses T (thymine)
instead. This mRNA molecule will instruct a ribosome
to synthesize a protein according to this code.
The genetic code is the set of
rules by which information encoded in genetic material (DNA or mRNA sequences) is translated into proteins
(amino acid
sequences) by living cells.
The code defines how sequences of three
nucleotides,
called codons, specify which amino acid will be added next during protein
synthesis. With some exceptions, a three-nucleotide codon in a
nucleic acid sequence specifies a single amino acid. Because the vast majority
of genes
are encoded with exactly the same code, this particular code is often referred
to as the canonical or standard genetic code, or simply the genetic
code, though in fact there are many variant codes.
For example, protein synthesis in human mitochondria
relies on a genetic code that differs from the standard genetic code.
Not all genetic information is stored
using the genetic code. All organisms' DNA contains regulatory sequences,
intergenic segments, chromosomal structural areas, and other non-coding
DNA that can contribute greatly to phenotype.
Those elements operate under sets of rules that are distinct from the
codon-to-amino acid paradigm underlying the genetic code.
Discovery
After the structure of DNA was
discovered by James Watson and Francis Crick,
who used the experimental evidence of Maurice
Wilkins and Rosalind Franklin (among others), serious
efforts to understand the nature of the encoding of proteins began. George Gamow
postulated that a three-letter code must be employed to encode the 20 standard
amino acids used by living cells to build proteins. With four different
nucleotides, a code of 2 nucleotides could only code for a maximum of 4²
or 16 amino acids. A code of 3 nucleotides could code for a maximum of 4³
or 64 amino acids.
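Gamow's counting argument can be checked by brute-force enumeration. The Python sketch below lists every possible "word" of one, two and three nucleotides drawn from a four-letter alphabet; only words of length three give at least 20 distinct codes.

```python
from itertools import product

NUCLEOTIDES = "ACGU"

def all_codons(length: int) -> list:
    """Every possible word of `length` nucleotides (4 ** length of them)."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=length)]

for n in (1, 2, 3):
    print(n, len(all_codons(n)))  # prints 1 4, then 2 16, then 3 64
```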
The fact that codons consist of three
DNA bases was first demonstrated in the Crick, Brenner et al. experiment.
The first elucidation of a codon was done by Marshall Nirenberg and Heinrich J. Matthaei in 1961 at the National Institutes of Health. They used a
cell-free
system to translate a poly-uracil RNA sequence
(i.e., UUUUU...) and discovered that the polypeptide
that they had synthesized consisted of only the amino acid phenylalanine.
They thereby deduced that the codon UUU specified the amino acid phenylalanine.
This was followed by experiments in the laboratory of Severo Ochoa
demonstrating that the poly-adenine RNA sequence (AAAAA...) coded for the
polypeptide poly-lysine and that the poly-cytosine RNA sequence (CCCCC...)
coded for the polypeptide poly-proline. Therefore the codon AAA specified the
amino acid lysine, and the codon CCC specified the amino acid proline. Using
different copolymers, most of the remaining codons were then determined.
Extending this work, Nirenberg and Philip Leder
revealed the triplet nature of the genetic code and allowed the codons of the
standard genetic code to be deciphered. In these experiments, various
combinations of mRNA
were passed through a filter that contained ribosomes,
the components of cells that translate RNA into protein. Unique
triplets promoted the binding of specific tRNAs to the ribosome. Leder and
Nirenberg were able to determine the sequences of 54 out of 64 codons in their
experiments.
Transfer of information via the genetic code
The genome of an organism
is inscribed in DNA,
or, in the case of some viruses, RNA. The portion of the genome that codes for a protein or an
RNA is called a gene.
Those genes that code for proteins are composed of tri-nucleotide units called codons,
each coding for a single amino acid. Each nucleotide sub-unit consists of a phosphate,
a deoxyribose
sugar, and one of the four nitrogenous nucleobases.
The purine
bases adenine
(A) and guanine
(G) are larger and consist of two aromatic rings. The pyrimidine
bases cytosine
(C) and thymine
(T) are smaller and consist of only one aromatic ring. In the double-helix
configuration, two strands of DNA are joined to each other by hydrogen bonds in
an arrangement known as base pairing. These bonds almost always form
between an adenine base on one strand and a thymine base on the other strand,
or between a cytosine base on one strand and a guanine base on the other. This
means that the number of A and T bases will be the same in a given double
helix, as will the number of G and C bases. In RNA, thymine (T) is replaced by uracil (U), and
the deoxyribose is substituted by ribose.
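The base-pairing rule explains why A and T (and G and C) counts are equal across a double helix. The Python sketch below builds the partner strand of an arbitrary example strand (complementing each base and reversing, because the two strands are antiparallel) and counts bases over both strands. The example sequence is made up for illustration.

```python
from collections import Counter

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand: str) -> str:
    """Partner strand written 5'->3': complement each base, then reverse."""
    return "".join(PAIR[b] for b in reversed(strand))

def duplex_base_counts(strand: str) -> Counter:
    """Count bases over both strands of the double helix built from `strand`."""
    return Counter(strand) + Counter(complementary_strand(strand))

counts = duplex_base_counts("ATGCGGTA")  # arbitrary example strand
print(complementary_strand("ATGCGGTA"))                        # TACCGCAT
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```

Note that a single strand on its own need not have equal A and T counts (here G outnumbers C on the first strand); the equality holds only for the paired duplex.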
Each protein-coding gene is transcribed into a molecule of the related
polymer RNA. In prokaryotes, this RNA functions as messenger RNA
or mRNA; in eukaryotes, the transcript needs to be processed to produce a
mature mRNA. The mRNA is, in turn, translated on the ribosome
into an amino acid
chain or polypeptide.
The process of translation requires transfer RNAs
specific for individual amino acids with the amino acids covalently
attached to them, guanosine triphosphate as an energy
source, and a number of translation factors. tRNAs have anticodons
complementary to the codons in mRNA and can be "charged" covalently
with amino acids at their 3' terminal CCA ends. Individual tRNAs are charged
with specific amino acids by enzymes known as aminoacyl tRNA synthetases, which have
high specificity for both their cognate amino acids and tRNAs. The high
specificity of these enzymes is a major reason why the fidelity of protein
translation is maintained.
There are 4³ = 64 different codon
combinations possible with a triplet codon of three nucleotides; all 64 codons
are assigned for either amino acids or stop signals during translation. If, for
example, an RNA sequence UUUAAACCC is considered and the reading frame
starts with the first U (by convention, 5' to 3'),
there are three codons, namely, UUU, AAA, and CCC, each of which specifies one
amino acid. This RNA sequence will be translated into an amino acid sequence,
three amino acids long. A given amino acid may be encoded by between one and
six different codon sequences. A comparison may be made with computer
science, where the codon is similar to a word,
which is the standard "chunk" for handling data (like one amino acid
of a protein), and a nucleotide is similar to a bit, in that it is the
smallest unit.
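The reading procedure just described (split the mRNA into consecutive, non-overlapping triplets from a chosen start and look each one up) can be sketched in a few lines of Python. The sketch assumes the standard code shown in the RNA codon table of this section, compressed into a 64-character string in the order UUU, UUC, UUA, UUG, UCU, ..., GGG, with "*" marking stop codons; the function name `translate` is illustrative.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
assert len(CODON_TABLE) == 64  # all 64 codons are assigned

def translate(rna: str, frame: int = 0, to_stop: bool = True) -> str:
    """Translate an mRNA read 5'->3' in non-overlapping codons starting at `frame`."""
    protein = []
    for i in range(frame, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*" and to_stop:
            break  # release at the first stop codon
        protein.append(aa)
    return "".join(protein)

print(translate("UUUAAACCC"))  # FKP, i.e. Phe-Lys-Pro
```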
The standard genetic code is shown in
the following tables. Table 1 shows what amino acid each of the 64 codons
specifies. Table 2 shows what codons specify each of the 20 standard amino
acids involved in translation. These are called forward and reverse codon
tables, respectively. For example, the codon AAU represents the amino acid asparagine,
and UGU and UGC represent cysteine (standard three-letter designations, Asn and Cys,
respectively).
RNA codon table (rows: 1st base; columns: 2nd base)
| 1st base | 2nd base U | 2nd base C | 2nd base A | 2nd base G |
| U | UUU (Phe/F) Phenylalanine | UCU (Ser/S) Serine | UAU (Tyr/Y) Tyrosine | UGU (Cys/C) Cysteine |
| U | UUC (Phe/F) Phenylalanine | UCC (Ser/S) Serine | UAC (Tyr/Y) Tyrosine | UGC (Cys/C) Cysteine |
| U | UUA (Leu/L) Leucine | UCA (Ser/S) Serine | UAA Stop (Ochre) | UGA Stop (Opal) |
| U | UUG (Leu/L) Leucine | UCG (Ser/S) Serine | UAG Stop (Amber) | UGG (Trp/W) Tryptophan |
| C | CUU (Leu/L) Leucine | CCU (Pro/P) Proline | CAU (His/H) Histidine | CGU (Arg/R) Arginine |
| C | CUC (Leu/L) Leucine | CCC (Pro/P) Proline | CAC (His/H) Histidine | CGC (Arg/R) Arginine |
| C | CUA (Leu/L) Leucine | CCA (Pro/P) Proline | CAA (Gln/Q) Glutamine | CGA (Arg/R) Arginine |
| C | CUG (Leu/L) Leucine | CCG (Pro/P) Proline | CAG (Gln/Q) Glutamine | CGG (Arg/R) Arginine |
| A | AUU (Ile/I) Isoleucine | ACU (Thr/T) Threonine | AAU (Asn/N) Asparagine | AGU (Ser/S) Serine |
| A | AUC (Ile/I) Isoleucine | ACC (Thr/T) Threonine | AAC (Asn/N) Asparagine | AGC (Ser/S) Serine |
| A | AUA (Ile/I) Isoleucine | ACA (Thr/T) Threonine | AAA (Lys/K) Lysine | AGA (Arg/R) Arginine |
| A | AUG[A] (Met/M) Methionine | ACG (Thr/T) Threonine | AAG (Lys/K) Lysine | AGG (Arg/R) Arginine |
| G | GUU (Val/V) Valine | GCU (Ala/A) Alanine | GAU (Asp/D) Aspartic acid | GGU (Gly/G) Glycine |
| G | GUC (Val/V) Valine | GCC (Ala/A) Alanine | GAC (Asp/D) Aspartic acid | GGC (Gly/G) Glycine |
| G | GUA (Val/V) Valine | GCA (Ala/A) Alanine | GAA (Glu/E) Glutamic acid | GGA (Gly/G) Glycine |
| G | GUG (Val/V) Valine | GCG (Ala/A) Alanine | GAG (Glu/E) Glutamic acid | GGG (Gly/G) Glycine |
[A] The codon AUG both codes for methionine and serves as an initiation site: the
first AUG in an mRNA's coding region is where translation into protein begins.[9]
Inverse table
| Amino acid | Codons |
| Ala/A | GCU, GCC, GCA, GCG |
| Leu/L | UUA, UUG, CUU, CUC, CUA, CUG |
| Arg/R | CGU, CGC, CGA, CGG, AGA, AGG |
| Lys/K | AAA, AAG |
| Asn/N | AAU, AAC |
| Met/M | AUG |
| Asp/D | GAU, GAC |
| Phe/F | UUU, UUC |
| Cys/C | UGU, UGC |
| Pro/P | CCU, CCC, CCA, CCG |
| Gln/Q | CAA, CAG |
| Ser/S | UCU, UCC, UCA, UCG, AGU, AGC |
| Glu/E | GAA, GAG |
| Thr/T | ACU, ACC, ACA, ACG |
| Gly/G | GGU, GGC, GGA, GGG |
| Trp/W | UGG |
| His/H | CAU, CAC |
| Tyr/Y | UAU, UAC |
| Ile/I | AUU, AUC, AUA |
| Val/V | GUU, GUC, GUA, GUG |
| START | AUG |
| STOP | UAA, UGA, UAG |
Salient features
Sequence reading frame
A codon is defined by the
initial nucleotide from which translation starts. For example, the string
GGGAAACCC, if read from the first position, contains the codons GGG, AAA, and
CCC; and, if read from the second position, it contains the codons GGA and AAC;
if read starting from the third position, GAA and ACC. Every sequence can,
thus, be read in three reading frames, each of which will produce a
different amino acid sequence (in the given example, Gly-Lys-Pro, Gly-Asn, or
Glu-Thr, respectively). With double-stranded DNA, there are six possible reading
frames, three in the forward orientation on one strand and three
reverse on the opposite strand. The actual frame in which a protein sequence is
translated is defined by a start codon, usually the first AUG codon in the
mRNA sequence.
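The reading-frame example can be reproduced mechanically, including the three frames of the opposite strand. The Python sketch below is self-contained (it repeats the standard code as a compact string) and writes both strands in the RNA alphabet for simplicity, so the opposite strand is obtained by pairing A with U and G with C and then reversing. Incomplete trailing codons are ignored and stop codons would appear as "*"; the frame labels "+1" to "-3" are just a naming convention chosen here.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def translate_frame(rna: str, frame: int) -> str:
    """Translate the complete codons that start at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3))

def six_frames(rna: str) -> dict:
    """Three forward frames plus three frames read from the opposite strand."""
    opposite = rna.translate(RNA_COMPLEMENT)[::-1]  # complement, then read 5'->3'
    frames = {f"+{f + 1}": translate_frame(rna, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate_frame(opposite, f) for f in range(3)})
    return frames

print(six_frames("GGGAAACCC"))
# {'+1': 'GKP', '+2': 'GN', '+3': 'ET', '-1': 'GFP', '-2': 'GF', '-3': 'VS'}
```

The three forward frames match the Gly-Lys-Pro, Gly-Asn and Glu-Thr readings given in the text.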
Start/stop codons
Translation starts with a chain initiation
codon (start codon). Unlike stop codons, the codon alone is not
sufficient to begin the process. Nearby sequences (such as the Shine-Dalgarno
sequence in E. coli)
and initiation factors are also required to start
translation. The most common start codon is AUG, which is read as methionine
or, in bacteria, as formylmethionine. Alternative start codons (depending on
the organism) include "GUG" or "UUG"; these codons
normally represent valine and leucine, respectively, but, as start codons,
they are translated as methionine or formylmethionine.
The three stop codons
have been given names: UAG is amber, UGA is opal (sometimes also
called umber), and UAA is ochre. "Amber" was named by
discoverers Richard Epstein and Charles Steinberg after their friend Harris
Bernstein, whose last name means "amber" in German. The other two
stop codons were named "ochre" and "opal" in order to keep
the "color names" theme. Stop codons are also called
"termination" or "nonsense" codons. They signal release of
the nascent polypeptide from the ribosome because there is no cognate tRNA that
has anticodons complementary to these stop signals, and so a release
factor binds to the ribosome instead.
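Taken together, start and stop codons delimit an open reading frame. The Python sketch below uses the simplified rule mentioned earlier (the first AUG sets the frame) and reads codon by codon until one of the three stop codons. It is an illustration under that assumption only: as the text notes, real initiation also depends on nearby sequences such as the Shine-Dalgarno sequence and on initiation factors, and alternative start codons such as GUG and UUG are ignored here. The example mRNA is made up.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}  # amber, opal and ochre

def first_orf(mrna: str) -> Optional[str]:
    """Return the stretch from the first AUG through the first in-frame stop codon.

    Simplification: only AUG is treated as a start codon, and no other
    initiation signals are considered. Returns None if no complete ORF exists.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None  # no in-frame stop before the end of the sequence

print(first_orf("GGAUGAAAUUUUAAGC"))  # AUGAAAUUUUAA
```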
Effect of mutations
During the process of DNA
replication, errors occasionally occur in the polymerization of the
second strand. These errors, called mutations,
can have an impact on the phenotype of an organism, especially if they occur
within the protein coding sequence of a gene. Error rates are usually very
low (1 error in every 10–100 million bases) due to the
"proofreading" ability of DNA
polymerases.
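The quoted error rate can be turned into an expected number of mistakes per chromosome copy by multiplication. In the Python sketch below the genome size (about 4.6 million base pairs, roughly an E. coli chromosome) is an assumption chosen for illustration; the two rates are the bounds given in the text.

```python
def expected_errors(genome_bp: int, error_rate: float) -> float:
    """Expected number of misincorporated bases in one copy of a genome."""
    return genome_bp * error_rate

# Bounds from the text: 1 error per 10 million to 1 per 100 million bases.
# Assumed genome size: ~4.6 Mb.
for rate in (1 / 10_000_000, 1 / 100_000_000):
    print(f"{expected_errors(4_600_000, rate):.3f}")  # 0.460, then 0.046
```

Under these assumptions a new error arises only about once every 2 to 22 chromosome copies.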
Missense
mutations and nonsense
mutations are examples of point
mutations, which can cause genetic diseases such as sickle-cell disease and thalassemia
respectively. Clinically important missense mutations generally change the
properties of the coded amino acid residue, for example between basic, acidic,
polar and non-polar, whereas nonsense mutations result in a stop codon.
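The distinction between synonymous, missense and nonsense substitutions follows directly from the codon table. The Python sketch below applies one base change to a sense codon and compares the amino acids before and after, using the standard code (repeated here so the block is self-contained). The example codons are chosen for illustration; the first one corresponds to the Glu-to-Val change discussed for sickle-cell disease.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base substitution at `position` (0-2) of a sense codon."""
    before = CODON_TABLE[codon]
    if before == "*":
        raise ValueError("expected a sense codon, not a stop codon")
    mutant = codon[:position] + new_base + codon[position + 1:]
    after = CODON_TABLE[mutant]
    if after == before:
        return "synonymous"
    if after == "*":
        return "nonsense"
    return "missense"

print(classify_point_mutation("GAG", 1, "U"))  # missense: Glu -> Val (GUG)
print(classify_point_mutation("UGG", 2, "A"))  # nonsense: Trp -> stop (UGA)
print(classify_point_mutation("GAA", 2, "G"))  # synonymous: Glu -> Glu (GAG)
```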
Mutations that disrupt the reading
frame sequence by indels
(insertions
or deletions) of a number of nucleotides that is not a multiple of three
are known as frameshift mutations. These mutations usually
result in a completely different translation from the original, and are also
very likely to cause a stop codon to be read, which truncates the
protein. These mutations may impair the function of the
resulting protein, and are thus rare in protein-coding sequences in vivo.
One reason inheritance of frameshift mutations is
rare is that, if the protein being translated is essential for growth under the
selective pressures the organism faces, absence of a functional protein may
cause death before the organism is viable. Frameshift mutations may result in
severe genetic diseases such as Tay-Sachs
disease.
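The effect of an indel on the reading frame can be seen by translating a short sequence before and after the change. In the Python sketch below the starting sequence is a made-up open reading frame (Met-Leu-Ser-Pro-stop); deleting one base shifts every downstream codon and here exposes a premature stop, while deleting three bases removes a single codon and leaves the rest of the frame intact.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def translate(rna: str) -> str:
    """Translate complete codons from the first base; stop codons are shown as '*'."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))

original = "AUGCUAAGCCCGUAA"               # hypothetical ORF: Met-Leu-Ser-Pro-stop
minus_one = original[:3] + original[4:]    # delete 1 base (C at index 3): frameshift
minus_three = original[:3] + original[6:]  # delete 3 bases (codon CUA): frame kept

print(translate(original))     # MLSP*
print(translate(minus_one))    # M*AR  (premature stop right after Met)
print(translate(minus_three))  # MSP*  (one residue lost, rest unchanged)
```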
Although most mutations that change
protein sequences are harmful or neutral, some mutations have a positive effect
on an organism. These mutations may enable the mutant organism to withstand
particular environmental stresses better than wild-type
organisms, or reproduce more quickly. In these cases a mutation will tend to
become more common in a population through natural
selection. Viruses
that use RNA
as their genetic material have rapid mutation rates, which can be an advantage,
since these viruses will evolve constantly and rapidly, and thus evade the
defensive responses of e.g. the human immune system.
In large populations of asexually reproducing organisms, for example, E.
coli, multiple beneficial mutations may co-occur. This phenomenon is called
clonal interference and causes competition
among the mutations.
Degeneracy
Degeneracy is the redundancy of the
genetic code. The genetic code has redundancy but no ambiguity. For example,
although codons GAA and GAG both specify glutamic acid (redundancy), neither of
them specifies any other amino acid (no ambiguity). The codons encoding one
amino acid may differ in any of their three positions. For example the amino
acid glutamic acid
is specified by GAA and GAG codons (difference in the third position), the
amino acid leucine
is specified by UUA, UUG, CUU, CUC, CUA, CUG codons (difference in the first or
third position), while the amino acid serine is
specified by UCA, UCG, UCC, UCU, AGU, AGC (difference in the first, second, or
third position).
A position of a codon is said to be a
fourfold degenerate site if any nucleotide at this position specifies the same
amino acid. For example, the third position of the glycine
codons (GGA, GGG, GGC, GGU) is a fourfold degenerate site, because all
nucleotide substitutions at this site are synonymous; i.e., they do not change
the amino acid. Only the third positions of some codons may be fourfold
degenerate. A position of a codon is said to be a twofold degenerate site if
only two of four possible nucleotides at this position specify the same amino
acid. For example, the third position of the glutamic acid
codons (GAA, GAG) is a twofold degenerate site. In twofold degenerate sites,
the equivalent nucleotides are always either two purines (A/G)
or two pyrimidines
(C/U), so only transversional substitutions (purine to pyrimidine or pyrimidine
to purine) in twofold degenerate sites are nonsynonymous. A position of a codon
is said to be a non-degenerate site if any mutation at this position results in
amino acid substitution. There is only one threefold degenerate site where
changing to three of the four nucleotides may have no effect on the amino acid
(depending on what it is changed to), while changing to the fourth possible
nucleotide always results in an amino acid substitution. This is the third
position of an isoleucine codon: AUU, AUC, or AUA all encode isoleucine, but
AUG encodes methionine.
In computation this position is often treated as a twofold degenerate site.
There are three amino acids encoded by
six different codons: serine, leucine, and arginine. Only two amino acids are specified by a single
codon. One of these is the amino-acid methionine,
specified by the codon AUG, which also specifies the start of translation; the
other is tryptophan,
specified by the codon UGG. The degeneracy of the genetic code is what accounts
for the existence of synonymous mutations.
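Both the site-degeneracy definitions and the codon counts per amino acid can be computed from the standard code. In the Python sketch below, `degeneracy` counts how many of the four possible bases at one codon position keep the same amino acid (4 = fourfold, 2 = twofold, 1 = non-degenerate, and 3 for the isoleucine third position); the function name is illustrative. The second part tallies codons per amino acid, excluding stops.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def degeneracy(codon: str, position: int) -> int:
    """How many of the four bases at `position` (0-2) keep the codon's amino acid."""
    aa = CODON_TABLE[codon]
    return sum(CODON_TABLE[codon[:position] + b + codon[position + 1:]] == aa for b in BASES)

# Third positions of glycine, glutamic acid, isoleucine and methionine codons.
print(degeneracy("GGA", 2), degeneracy("GAA", 2), degeneracy("AUU", 2), degeneracy("AUG", 2))  # 4 2 3 1

codons_per_aa = Counter(aa for aa in AMINO_ACIDS if aa != "*")
six = sorted(aa for aa, n in codons_per_aa.items() if n == 6)
one = sorted(aa for aa, n in codons_per_aa.items() if n == 1)
print(six, one, sum(codons_per_aa.values()))  # ['L', 'R', 'S'] ['M', 'W'] 61
```

The output agrees with the text: leucine, arginine and serine have six codons each, while methionine and tryptophan have one each, and 61 of the 64 codons specify amino acids.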
Degeneracy results because there are
more codons than encodable amino acids. For example, if there were two bases
per codon, then only 16 amino acids could be coded for (4²=16). Because at
least 21 codes are required (20 amino acids plus stop), and the next largest
number of bases is three, then 4³ gives 64 possible codons, meaning that some
degeneracy must exist.
These properties of the genetic code
make it more fault-tolerant for point
mutations. For example, in theory, fourfold degenerate codons can
tolerate any point mutation at the third position, although codon usage
bias restricts this in practice in many organisms; twofold
degenerate codons can tolerate one out of the three possible point mutations at
the third position. Since transition mutations (purine to purine or
pyrimidine to pyrimidine mutations) are more likely than transversion
(purine to pyrimidine or vice-versa) mutations, the equivalence of purines or
that of pyrimidines at twofold degenerate sites adds a further fault-tolerance.
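The fault tolerance of the code can be quantified by classifying every possible single-base substitution of every sense codon (61 codons × 3 positions × 3 alternative bases = 549 changes). The Python sketch below does this for the standard code. It assumes every substitution is equally likely, which is a simplification, since transitions are in fact more frequent than transversions.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, ..., GGG ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def point_mutation_spectrum() -> Counter:
    """Classify every single-base substitution of every sense codon."""
    outcome = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # only sense codons are mutated
        for pos in range(3):
            for b in BASES:
                if b == codon[pos]:
                    continue  # not a substitution
                new_aa = CODON_TABLE[codon[:pos] + b + codon[pos + 1:]]
                if new_aa == aa:
                    outcome["synonymous"] += 1
                elif new_aa == "*":
                    outcome["nonsense"] += 1
                else:
                    outcome["missense"] += 1
    return outcome

spectrum = point_mutation_spectrum()
for kind in ("synonymous", "missense", "nonsense"):
    print(kind, spectrum[kind])  # synonymous 134, missense 392, nonsense 23
```

Roughly a quarter of all possible point substitutions leave the protein unchanged, most of them at third codon positions.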
Despite the redundancy of the genetic
code, single-point mutations can still cause dysfunctional proteins. For example,
a mutated hemoglobin
gene causes sickle-cell disease. In the mutant hemoglobin,
a hydrophilic glutamate
(Glu) is substituted by the hydrophobic valine (Val);
that is, GAA or GAG becomes GUA or GUG. The substitution of glutamate by valine
reduces the solubility of β-globin, which causes hemoglobin
to form linear polymers linked by the hydrophobic interaction between the
valine groups, causing sickle-cell deformation of erythrocytes.
In general, sickle-cell disease is not caused by a de novo mutation.
It is, rather, selected for in geographic regions where malaria
is common (in a way similar to thalassemia),
as heterozygous
people have some resistance to the malarial Plasmodium
parasite (heterozygote advantage).
This redundancy can be read by fewer tRNAs than there are codons because
the first base of the anticodon of a tRNA can pair non-canonically with the
third base of the codon; such a pair is called a wobble base
pair. Wobble pairs include those formed by the modified base inosine
at this anticodon position and the non-Watson-Crick G-U base pair.
Variations to the standard genetic code
While slight variations on the standard
code had been predicted earlier, none were discovered until 1979, when
researchers studying human mitochondrial genes discovered they
used an alternative code. Many slight variants have been discovered since then,
including various alternative mitochondrial codes, and small variants such as
translation of the codon UGA as tryptophan in Mycoplasma species
and translation of CUG as a serine rather than a leucine in the genus Candida.
In bacteria
and archaea,
GUG and UUG are common start codons, but in rare cases, certain proteins may
use alternative start codons not normally used by that species.
In certain proteins, non-standard amino
acids are substituted for standard stop codons, depending on associated signal
sequences in the messenger RNA. For example, UGA can code for selenocysteine,
and UAG can code for pyrrolysine. Selenocysteine is now viewed as
the 21st amino acid, and pyrrolysine is viewed as the 22nd.
Despite these differences, all known
naturally-occurring codes are very similar to each other, and the coding
mechanism is the same for all organisms: three-base codons, tRNA, ribosomes,
reading the code in the same direction and translating the code three letters
at a time into sequences of amino acids.
Expanded genetic code
Since 2001, 40 non-natural amino acids with
diverse physicochemical and biological properties have been added into proteins
by creating a unique codon (recoding) and a corresponding transfer RNA /
aminoacyl-tRNA synthetase pair to encode each of them, in order to explore
protein structure and function or to create novel or enhanced proteins.
Origin
Despite the minor variations that
exist, the genetic code used by all known forms of life is nearly universal.
However, there is a huge number of possible genetic codes. If amino acids are
randomly associated with triplet codons, there will be 1.5 × 10⁸⁴
possible genetic codes.
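The order of magnitude of this figure can be reproduced by counting, under an assumption the text does not state: each of the 64 codons is assigned one of 21 meanings (20 amino acids plus stop), and every meaning must receive at least one codon. That is the number of surjections from 64 codons onto 21 meanings, which the Python sketch below computes exactly by inclusion-exclusion over the meanings left without a codon.

```python
from math import comb

def count_codes(n_codons: int = 64, n_meanings: int = 21) -> int:
    """Assignments of codons to meanings in which every meaning gets at least one codon.

    Inclusion-exclusion: subtract assignments that miss k chosen meanings.
    """
    return sum((-1) ** k * comb(n_meanings, k) * (n_meanings - k) ** n_codons
               for k in range(n_meanings + 1))

print(f"{count_codes():.2e}")  # 1.51e+84
```

Without the at-least-one-codon condition the count would simply be 21⁶⁴ (about 4.2 × 10⁸⁴), so the requirement that every amino acid be encodable removes a little under two thirds of the arbitrary assignments.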
Phylogenetic analysis of transfer RNA
suggests that tRNA molecules evolved before the present set of aminoacyl-tRNA synthetases.
In theory, the genetic code could be
completely random (a "frozen accident"), completely non-random
(optimal) or a combination of random and nonrandom. There are enough data to
refute the first possibility. For a start, a glance at the table of the
genetic code reveals a clustering of amino acid assignments. Furthermore, amino
acids that share the same biosynthetic pathway tend to have the same first base
in their codons, and amino acids with similar physical properties tend to have
similar codons.
There are four themes running through
the many theories about the evolution of the genetic code (and hence the origin
of these patterns):
- Chemical principles govern
specific RNA interaction with amino acids. Experiments with aptamers
showed that some amino acids have a selective chemical affinity for the
base triplets that code for them. Recent experiments show that of the 8
amino acids tested, 6 show some RNA triplet-amino acid association. This
has been called the stereochemical code. The stereochemical code could
have created an ancient core of assignments. The current complex
translation mechanism involving tRNA and associated enzymes may be a later development,
and perhaps protein sequences were originally templated directly on base sequences.
- Biosynthetic expansion. The
standard modern genetic code grew from a simpler earlier code through a
process of "biosynthetic expansion". Here the idea is that
primordial life "discovered" new amino acids (for example, as
by-products of metabolism) and later incorporated some of
these into the machinery of genetic coding. Although much circumstantial
evidence has been found to suggest that fewer different amino acids were
used in the past than today, precise and detailed hypotheses about which
amino acids entered the code in what order have proved far more
controversial.
- Natural selection has led
to codon assignments of the genetic code that minimize the effects of mutations.
A recent hypothesis suggests that the triplet code was derived from codes
that used codons longer than triplets (such as quadruplet codons). Decoding
with codons longer than triplets would have a higher degree of codon redundancy and
would be more error-resistant than triplet decoding. This feature
could allow accurate decoding in the absence of highly complex
translational machinery such as the ribosome
and prior to the time when cells began making ribosomes.
- Information channels: Information-theoretic approaches see
the genetic code as an error-prone information channel. The inherent noise
(that is, errors) in the channel confronts the organism with a fundamental
question: how can a genetic code withstand the impact of
noise while translating information accurately and efficiently? These “rate-distortion” models suggest that
the genetic code originated as a result of the interplay of three
conflicting evolutionary forces: the need for diverse amino acids, for
error tolerance and for minimal cost of resources. The code emerges at a
coding transition, when the mapping of codons to amino acids becomes
nonrandom. The emergence of the code is governed by the topology defined
by the probable errors and is related to the map coloring problem.
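To make the error-tolerance idea concrete, the sketch below treats single-nucleotide substitutions as channel noise. For the standard genetic code it counts how many substitutions of a sense codon leave the amino acid unchanged (synonymous), change it (missense), or create a stop codon (nonsense), and it compares the synonymous count with codes whose assignments have been randomly shuffled. This is a minimal Python sketch, not the rate-distortion model itself: the function names are hypothetical, and "number of synonymous substitutions" is a deliberately crude stand-in for cost functions that also weigh how chemically different two amino acids are.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutations(code):
    """Classify every single-nucleotide substitution of every sense codon."""
    counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
    for codon, aa in code.items():
        if aa == "*":
            continue  # only sense codons are treated as transmitted messages
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if new_aa == "*":
                    counts["nonsense"] += 1
                elif new_aa == aa:
                    counts["synonymous"] += 1
                else:
                    counts["missense"] += 1
    return counts


def mean_random_synonymous(trials=200, seed=0):
    """Average synonymous count for codes made by shuffling the standard code's labels."""
    rng = random.Random(seed)
    labels = list(STANDARD_CODE.values())
    total = 0
    for _ in range(trials):
        rng.shuffle(labels)
        shuffled = dict(zip(STANDARD_CODE, labels))
        total += classify_point_mutations(shuffled)["synonymous"]
    return total / trials


print(classify_point_mutations(STANDARD_CODE))
# → {'synonymous': 134, 'missense': 392, 'nonsense': 23}
print(mean_random_synonymous())
```

Shuffling keeps the number of codons assigned to each amino acid (and to stop) fixed, so the comparison isolates how the codons are arranged rather than how many each amino acid receives.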
Molecular cloning
Molecular cloning refers to a
set of experimental methods in molecular
biology that are used to assemble recombinant
DNA molecules and to direct their replication
within host organisms. The use of the word cloning
refers to the fact that the method involves the replication of a single DNA
molecule starting from a single living cell to generate a large population of
cells containing identical DNA molecules. Molecular cloning generally uses DNA
sequences from two different organisms: the species that is the source of the
DNA to be cloned, and the species that will serve as the living host
for replication of the recombinant DNA. Molecular cloning methods are
central to many areas of modern biology and medicine.
In a conventional molecular cloning
experiment, the DNA to be cloned is obtained from an organism of interest, then
treated with enzymes in the test tube to generate smaller DNA fragments.
These fragments are then combined with vector DNA to generate recombinant DNA
molecules. The recombinant DNA is then introduced into a host organism. This
will generate a population of organisms in which recombinant DNA molecules are
replicated along with the host DNA. Because they contain foreign DNA fragments,
these are transgenic
or genetically modified microorganisms (GMOs). This process takes advantage of the
fact that a single bacterial cell can be induced to take up and replicate a
single recombinant DNA molecule. This single cell can then be expanded
exponentially to generate a large number of bacteria, each of which contains
copies of the original recombinant molecule. Thus, both the resulting bacterial
population, and the recombinant DNA molecule, are commonly referred to as
"clones". Strictly speaking, recombinant DNA refers to DNA
molecules, while molecular cloning refers to the experimental methods
used to assemble them.
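The exponential expansion of a single transformed cell can be estimated with a short calculation. The sketch below assumes unrestricted exponential growth with a constant doubling time (about 20 minutes is often quoted for E. coli in rich medium under optimal conditions) and ignores lag and stationary phases; the function names are hypothetical.

```python
import math


def doublings_needed(final_cells: float, initial_cells: float = 1.0) -> float:
    """Number of cell divisions (doublings) to grow from initial_cells to final_cells."""
    return math.log2(final_cells / initial_cells)


def hours_to_reach(final_cells: float, doubling_minutes: float = 20.0,
                   initial_cells: float = 1.0) -> float:
    """Time in hours, assuming every cell divides once per doubling_minutes."""
    return doublings_needed(final_cells, initial_cells) * doubling_minutes / 60.0


# From one transformed cell to roughly a billion clonal descendants:
print(round(doublings_needed(1e9), 1))  # → 29.9
print(round(hours_to_reach(1e9), 1))    # → 10.0
```

Because growth is exponential, about 30 doublings take a single cell to a billion identical copies, which is why one overnight culture can supply large amounts of a cloned molecule.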
History of molecular cloning
Prior to the 1970s, our understanding
of genetics and molecular biology was severely hampered by an inability to
isolate and study individual genes from complex organisms. This changed
dramatically with the advent of molecular cloning methods. Microbiologists,
seeking to understand the molecular mechanisms through which bacteria
restricted the growth of bacteriophage, isolated restriction endonucleases, enzymes that could
cleave DNA molecules only when specific DNA sequences were encountered. They
showed that restriction enzymes cleaved chromosome-length DNA molecules at
specific locations, and that specific sections of the larger molecule could be
purified by size fractionation. A second enzyme, DNA ligase, could join
fragments generated by restriction enzymes in new combinations,
termed recombinant DNA. By recombining DNA segments of
interest with vector DNA, such as bacteriophage or plasmids, which naturally
replicate inside bacteria, large quantities of purified recombinant DNA
molecules could be produced in bacterial cultures. The first recombinant DNA
molecules were generated and studied in 1972.
Molecular cloning takes advantage of
the fact that the chemical structure of DNA is fundamentally the
same in all living organisms. Therefore, if any segment of DNA from any
organism is inserted into a DNA segment containing the molecular sequences
required for DNA replication, and the resulting recombinant
DNA is introduced into the organism from which the replication
sequences were obtained, then the foreign DNA will be replicated along with the
host cell's DNA in the transgenic organism.
Molecular cloning is similar to polymerase chain reaction (PCR) in that it
permits the replication of a specific DNA sequence. The fundamental difference
between the two methods is that molecular cloning involves replication of the
DNA in a living microorganism, while PCR replicates DNA in an in vitro
solution, free of living cells.
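The in vitro amplification performed by PCR follows a simple formula: if a fraction E of template molecules is copied in each cycle, N0 starting copies become N0 × (1 + E)^n after n cycles, with E = 1 corresponding to perfect doubling. The sketch below assumes a constant efficiency over all cycles, which real reactions only approximate before they reach a plateau; the function name is hypothetical.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copies after `cycles` rounds, assuming constant per-cycle efficiency."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


print(pcr_copies(1, 10))        # → 1024.0
print(pcr_copies(10, 30))       # perfect doubling for 30 cycles
print(pcr_copies(10, 30, 0.9))  # 90% efficiency per cycle
```

In molecular cloning the same kind of exponential increase is supplied by cell division of the host rather than by thermal cycling.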
Steps in molecular cloning
In standard molecular cloning
experiments, the cloning of any DNA fragment essentially involves seven steps:
(1) Choice of host organism and cloning vector, (2) Preparation of vector DNA,
(3) Preparation of DNA to be cloned, (4) Creation of recombinant DNA, (5)
Introduction of recombinant DNA into host organism, (6) Selection of organisms
containing recombinant DNA, (7) Screening for clones with desired DNA inserts
and biological properties.
Choice of host organism and cloning vector
Although a very large number of host
organisms and molecular cloning vectors are in use, the great majority of
molecular cloning experiments begin with a laboratory strain of the bacterium E.
coli and a plasmid cloning vector. E. coli and
plasmid vectors are in common use because the methods for working with them are well established,
they are versatile and widely available, and they offer rapid growth of recombinant organisms
with minimal equipment. If the DNA to be cloned is exceptionally large
(hundreds of thousands to millions of base pairs), then a bacterial artificial chromosome
or yeast artificial chromosome vector is
often chosen.
Specialized applications may call for
specialized host-vector systems. For example, if the experimentalists wish to
harvest a particular protein from the recombinant organism, then an expression
vector is chosen that contains appropriate signals for transcription
and translation in the desired host organism. Alternatively, if replication of
the DNA in different species is desired (for example transfer of DNA from
bacteria to plants), then a multiple host range vector (also termed shuttle
vector) may be selected. In practice, however, specialized molecular
cloning experiments usually begin with cloning into a bacterial plasmid,
followed by sub-cloning
into a specialized vector.
Whatever combination of host and vector
is used, the vector almost always contains four DNA segments that are
critically important to its function and experimental utility: (1) an origin of
DNA replication, which is necessary for the vector (and recombinant sequences linked
to it) to replicate inside the host organism; (2) one or more unique
restriction endonuclease recognition sites that serve as sites where foreign
DNA may be introduced; (3) a selectable genetic marker gene that can be used to
enable the survival of cells that have taken up vector sequences; and (4) an
additional gene that can be used to screen for cells that contain foreign DNA.
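Whether a candidate cloning site is unique can be checked directly on the vector sequence. The sketch below scans a circular sequence for a few recognition sequences and reports the enzymes that cut exactly once. It assumes only three enzymes (EcoRI GAATTC, BamHI GGATCC, HindIII AAGCTT), all with palindromic sites, so scanning one strand finds every site; the toy sequence and function names are hypothetical and do not describe a real vector.

```python
# Recognition sequences (5'->3'); all three are palindromic.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def site_positions(seq: str, site: str, circular: bool = True) -> list[int]:
    """0-based start positions of `site` in `seq`, including sites that span the origin."""
    seq, site = seq.upper(), site.upper()
    # A plasmid is circular, so append the first len(site) - 1 bases to catch wrap-around sites.
    search = seq + seq[: len(site) - 1] if circular else seq
    starts = len(seq) if circular else len(seq) - len(site) + 1
    return [i for i in range(starts) if search.startswith(site, i)]


def unique_cutters(seq: str, sites: dict[str, str] = RECOGNITION_SITES) -> list[str]:
    """Enzymes whose recognition site occurs exactly once in the circular sequence."""
    return sorted(name for name, site in sites.items() if len(site_positions(seq, site)) == 1)


toy_vector = "GAATTCAAAAGGATCCTTTTGGATCCAAAA"
print(unique_cutters(toy_vector))  # → ['EcoRI']
```

In this toy sequence BamHI cuts twice and would cut the vector into two pieces, so only EcoRI qualifies as a single insertion site.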
Preparation of vector DNA
The cloning vector is treated with a
restriction endonuclease to cleave the DNA at the site where foreign DNA will
be inserted. The restriction enzyme is chosen to generate ends at the
cleavage site that are compatible with the ends of the foreign DNA.
Typically, this is done by cleaving the vector DNA and foreign DNA with the
same restriction enzyme, for example EcoRI. Most modern vectors
contain a variety of convenient cleavage sites that are unique within the
vector molecule (so that the vector can only be cleaved at a single site) and
that are located within a gene (frequently the gene for beta-galactosidase)
whose inactivation can be used to distinguish recombinant from non-recombinant
organisms at a later step in the process. To improve the ratio of recombinant
to non-recombinant organisms, the cleaved vector may be treated with an enzyme
(alkaline phosphatase) that removes the 5' phosphate groups from the vector
ends. Because DNA ligase cannot join two dephosphorylated ends, the vector can
no longer be ligated back into a circle on its own; it can be recircularized,
and thus propagated efficiently in cells, only by joining to foreign DNA
fragments that retain their 5' phosphates.
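The requirement that vector and insert ends be compatible can be expressed as a rule about single-stranded overhangs: two 5' overhangs can anneal when one is the reverse complement of the other. The sketch below derives the overhang from the recognition site and top-strand cut position for four enzymes that leave 4-base 5' overhangs, then checks pairs for compatibility. It covers only palindromic sites cut at the same offset on both strands; the table and function names are assumptions for illustration.

```python
# Recognition site and top-strand cut position for enzymes that leave 5' overhangs.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "BglII": ("AGATCT", 1),    # A^GATCT
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def five_prime_overhang(enzyme: str) -> str:
    """Overhang left by a palindromic site cut at the same offset on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def ends_compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Two 5' overhangs can base-pair if one is the reverse complement of the other."""
    return five_prime_overhang(enzyme_a) == reverse_complement(five_prime_overhang(enzyme_b))


print(five_prime_overhang("EcoRI"))       # → AATT
print(ends_compatible("BamHI", "BglII"))  # → True
print(ends_compatible("EcoRI", "BamHI"))  # → False
```

BamHI and BglII recognize different sites but leave the same GATC overhang, which is why their ends can be ligated to each other; cutting vector and insert with the same enzyme guarantees compatibility trivially.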
Preparation of DNA to be cloned
For cloning of genomic DNA, the DNA to
be cloned is extracted from the organism of interest. Virtually any tissue
source can be used (even tissues from extinct animals),
as long as the DNA is not extensively degraded. The DNA is then purified using
simple methods to remove contaminating proteins (extraction with phenol), RNA
(ribonuclease) and smaller molecules (precipitation and/or chromatography). Polymerase chain reaction (PCR) methods
are often used for amplification of specific DNA or RNA (RT-PCR)
sequences prior to molecular cloning.
DNA for cloning experiments may also be
obtained from RNA using reverse transcriptase (complementary
DNA or cDNA cloning), or in the form of synthetic DNA (artificial gene synthesis). cDNA cloning
is usually used to obtain clones representative of the mRNA population of the
cells of interest, while synthetic DNA is used to obtain any precise sequence
defined by the designer.
The purified DNA is then treated with a
restriction enzyme to generate fragments with ends capable of being linked to those
of the vector. If necessary, short double-stranded segments of DNA containing
desired restriction sites may be added to create end structures that are
compatible with the vector.
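Predicting the fragments a digest will produce is a simple sequence computation. The sketch below cuts a linear molecule at a fixed offset within every occurrence of a recognition site and returns top-strand fragment lengths. It assumes a palindromic site (so one strand suffices) and ignores the single-stranded overhangs; the example sequence and function name are hypothetical.

```python
def digest_linear(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand fragment lengths after cutting a linear DNA at every copy of `site`."""
    seq, site = seq.upper(), site.upper()
    cuts = [i + cut_offset for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# EcoRI cuts G^AATTC, i.e. one base into its recognition sequence.
print(digest_linear("AAAGAATTCAAAAAGAATTCAA", "GAATTC", 1))  # → [4, 11, 7]
```

The middle fragment, flanked by two EcoRI ends, is the kind of piece that could be ligated into an EcoRI-cut vector.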
Creation of recombinant DNA with DNA ligase
The creation of recombinant DNA is in
many ways the simplest step of the molecular cloning process. DNA prepared from
the vector and foreign source are simply mixed together at appropriate
concentrations and exposed to an enzyme (DNA ligase)
that covalently links the ends together. This joining reaction is often termed ligation.
The resulting DNA mixture containing randomly joined ends is then ready for
introduction into the host organism.
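Mixing at appropriate concentrations usually means choosing a molar ratio of insert to vector rather than equal masses. A common rule of thumb converts a chosen ratio into a mass of insert: insert (ng) = vector (ng) × insert length / vector length × ratio. The sketch below implements that conversion; the 3:1 default is only a frequently used starting point, not a value from this text, and the function name is hypothetical.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Mass of insert giving `molar_ratio` insert molecules per vector molecule.

    Equal numbers of double-stranded molecules weigh in proportion to their
    length, so the mass scales by insert_bp / vector_bp and then by the ratio.
    """
    return vector_ng * insert_bp * molar_ratio / vector_bp


# 50 ng of a 3,000 bp vector with a 1,000 bp insert at 3:1:
print(insert_mass_ng(50, 3000, 1000))  # → 50.0
```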
DNA ligase only recognizes and acts on
the ends of linear DNA molecules, usually resulting in a complex mixture of DNA
molecules with randomly joined ends. The desired products (vector DNA
covalently linked to foreign DNA) will be present, but other sequences (e.g.
foreign DNA linked to itself, vector DNA linked to itself and higher-order
combinations of vector and foreign DNA) are also usually present. This complex
mixture is sorted out in subsequent steps of the cloning process, after the DNA
mixture is introduced into cells.
Introduction of recombinant DNA into host organism
The DNA mixture, previously manipulated
in vitro, is moved back into a living cell, referred to as the host organism.
The methods used to get DNA into cells are varied, and the name applied to this
step in the molecular cloning process will often depend upon the experimental
method that is chosen (e.g. transformation, transduction, transfection,
electroporation).
When microorganisms are able to take up
and replicate DNA from their local environment, the process is termed transformation, and cells that are in a
physiological state such that they can take up DNA are said to be competent. In mammalian cell culture, the
analogous process of introducing DNA into cells is commonly termed transfection.
Both transformation and transfection usually require preparation of the cells
through a special growth regime and chemical treatment process that will vary
with the specific species and cell types that are used.
Electroporation
uses high voltage electrical pulses to translocate DNA across the cell membrane
(and cell wall, if present). In contrast, transduction involves the packaging of DNA
into virus-derived particles, and using these virus-like particles to introduce
the encapsulated DNA into the cell through a process resembling viral
infection. Although electroporation and transduction are highly specialized
methods, they can be among the most efficient ways to move DNA into cells.
Selection of organisms containing vector sequences
Whichever method is used, the
introduction of recombinant DNA into the chosen host organism is usually a low
efficiency process; that is, only a small fraction of the cells will actually take
up DNA. Experimental scientists deal with this issue through a step of
artificial genetic selection, in which cells that have not taken up DNA are
selectively killed, and only those cells that can actively replicate DNA
containing the selectable marker gene encoded by the vector are able to
survive.
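How low the efficiency is can be quantified from colony counts. A widely used measure is transformation efficiency, the number of transformed colonies per microgram of DNA actually plated. The sketch below also computes the fraction of viable cells that were transformed, assuming a total viable count is available from a non-selective plate. The function names and example numbers are hypothetical.

```python
def transformation_efficiency(colonies: int, dna_ng: float,
                              plated_ul: float, total_ul: float) -> float:
    """Transformants per microgram of DNA, correcting for the fraction of cells plated."""
    dna_plated_ug = (dna_ng / 1000.0) * (plated_ul / total_ul)
    return colonies / dna_plated_ug


def fraction_transformed(selective_cfu: float, viable_cfu: float) -> float:
    """Share of viable cells that grew under selection, i.e. took up the vector."""
    return selective_cfu / viable_cfu


# 1 ng of plasmid, 100 uL of a 1,000 uL recovery culture plated, 100 colonies:
print(f"{transformation_efficiency(100, 1.0, 100, 1000):.1e} transformants/ug")  # → 1.0e+06
print(fraction_transformed(1e4, 1e8))  # → 0.0001
```

Even a respectable efficiency corresponds to a tiny fraction of transformed cells, which is why a selection step is needed rather than simply picking colonies at random.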
When bacterial cells are used as host
organisms, the selectable marker is usually a gene that
confers resistance to an antibiotic that would otherwise kill the cells, typically ampicillin.
Cells harboring the vector will survive when exposed to the antibiotic, while
those that have failed to take up vector sequences will die. When mammalian
cells (e.g. human or mouse cells) are used, a similar strategy is applied, except
that the marker gene typically confers resistance to an antibiotic such as Geneticin (G418).
Screening for clones with desired DNA inserts and
biological properties
Modern bacterial cloning vectors (e.g. pUC19 and later
derivatives including the pGEM vectors) use the blue-white
screening system to distinguish colonies (clones) of transgenic
cells from those that contain the parental vector (i.e. vector DNA with no
recombinant sequence inserted). In these vectors, foreign DNA is inserted into
a sequence that encodes an essential part of beta-galactosidase, an enzyme whose activity
turns colonies blue on culture medium containing the chromogenic substrate
X-gal. Insertion of the foreign DNA into the beta-galactosidase
coding sequence disables the function of the enzyme, so that colonies
containing recombinant plasmids remain colorless (white). Therefore,
experimentalists are easily able to identify and conduct further studies on
transgenic bacterial clones, while ignoring those that do not contain
recombinant DNA.
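The combined logic of antibiotic selection and blue-white screening can be written as a small decision function. The sketch below assumes plates containing ampicillin and the chromogenic substrate X-gal, and a pUC19-style vector carrying an ampicillin-resistance gene and the beta-galactosidase coding sequence used for screening. The functions and their inputs are hypothetical, and the sketch ignores complications such as small in-frame inserts that leave the enzyme active.

```python
def colony_phenotype(took_up_vector: bool, insert_in_lacz: bool) -> str:
    """Expected appearance of a cell's progeny on an ampicillin + X-gal plate."""
    if not took_up_vector:
        return "no colony"  # killed by ampicillin: the selection step
    if insert_in_lacz:
        return "white"      # insert disrupts beta-galactosidase, so X-gal stays uncleaved
    return "blue"           # intact enzyme cleaves X-gal, giving a blue colony


def candidates_to_pick(colonies: list[str]) -> list[int]:
    """Indices of white colonies, the likely recombinant clones to analyze further."""
    return [i for i, color in enumerate(colonies) if color == "white"]


plate = [colony_phenotype(True, False), colony_phenotype(True, True), colony_phenotype(True, True)]
print(plate)                      # → ['blue', 'white', 'white']
print(candidates_to_pick(plate))  # → [1, 2]
```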
The total population of individual
clones obtained in a molecular cloning experiment is often termed a DNA library.
Libraries may be highly complex (as when cloning complete genomic DNA from an
organism) or relatively simple (as when moving a previously-cloned DNA fragment
into a different plasmid), but it is almost always necessary to examine a
number of different clones to be sure that the desired DNA construct is
obtained. This may be accomplished through a very wide range of experimental
methods, including the use of nucleic acid
hybridizations, antibody probes, polymerase chain reaction, restriction fragment analysis
and/or DNA sequencing.
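How many clones must be examined can be estimated with a simple probability argument. If each clone independently has probability p of being the one sought, the chance that at least one of N clones is correct is 1 - (1 - p)^N, so N = ln(1 - P) / ln(1 - p) clones give confidence P. For a genomic library, taking p as insert size divided by genome size gives the Clarke-Carbon estimate of library size. The sketch assumes random, equally sized, unbiased fragments; the function name and example sizes are hypothetical.

```python
import math


def clones_needed(p_per_clone: float, confidence: float = 0.99) -> int:
    """Smallest N with 1 - (1 - p)**N >= confidence, for 0 < p < 1."""
    if not (0.0 < p_per_clone < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_per_clone))


# Screening: half of the white colonies are expected to carry the right insert.
print(clones_needed(0.5))  # → 7

# Library: 40 kb inserts from a hypothetical 4.6 Mb genome.
print(clones_needed(40_000 / 4_600_000))
```

The same formula covers both situations in the text: a simple subcloning (p close to 1, few clones) and a complex genomic library (p tiny, hundreds or thousands of clones).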
Applications of molecular cloning
Molecular cloning provides scientists
with an essentially unlimited quantity of any individual DNA segment derived
from any genome. This material can be used for a wide range of purposes,
including those in both basic and applied biological science. A few of the more
important applications are summarized here.
Genome organization and gene expression
Molecular cloning has led directly to
the elucidation of the complete DNA sequence of the genomes of a very large
number of species and to an exploration of genetic diversity within individual
species, work that has been done mostly by determining the DNA sequence of
large numbers of randomly cloned fragments of the genome, and assembling the
overlapping sequences.
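The idea of assembling overlapping sequences can be illustrated with a greedy merge: repeatedly join the two reads with the longest suffix-prefix overlap until no overlaps remain. This is a minimal sketch assuming error-free reads from a single strand and no repeats; real genome assemblers use overlap or de Bruijn graphs and must handle sequencing errors, repeats and both strands. The function names and reads are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads by repeatedly joining the pair with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads wholly contained in another read add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = suffix_prefix_overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: return the separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGCCT"]))  # → ['ATGCGTACGTTAGCCT']
```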
At the level of individual genes,
molecular clones are used to generate probes
that are used for examining how genes are expressed, and how that expression is
related to other processes in biology, including the metabolic environment,
extracellular signals, development, learning, senescence and cell death. Cloned
genes can also provide tools to examine the biological function and importance
of individual genes, by allowing investigators to inactivate
the genes, or make more subtle mutations using regional mutagenesis or site-directed mutagenesis.
Production of recombinant proteins
Obtaining the molecular clone of a gene
can lead to the development of organisms that produce the protein product of
the cloned genes, termed a recombinant protein. In practice, it is frequently
more difficult to develop an organism that produces an active form of the
recombinant protein in desirable quantities than it is to clone the gene. This
is because the molecular signals for gene expression are complex and variable,
and because protein folding, stability and transport can be very challenging.
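One of the simplest checks before attempting expression is that a cloned coding sequence reads in frame from its start codon to a stop codon. The sketch below translates a DNA coding strand with the standard genetic code. It says nothing about expression signals, folding, stability or transport, which the text identifies as the harder problems, and the function name and example sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence (DNA, coding strand) up to its first stop codon."""
    cds = cds.upper()
    if not cds.startswith("ATG"):
        raise ValueError("coding sequence should begin with the ATG start codon")
    if len(cds) % 3 != 0:
        raise ValueError("length is not a multiple of 3, so the reading frame is broken")
    protein = []
    for i in range(0, len(cds), 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]  # KeyError for non-ACGT characters
        if amino_acid == "*":
            break  # stop codon ends translation
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCTTGGTAA"))  # → MAW
```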
Many useful proteins are currently
available as recombinant products. These include: (1)
medically useful proteins whose administration can correct a defective or
poorly expressed gene (e.g. recombinant factor VIII,
a blood-clotting factor deficient in some forms of hemophilia,[12]
and recombinant insulin,
used to treat some forms of diabetes), (2) proteins that can be administered to assist in
a life-threatening emergency (e.g. tissue plasminogen activator, used to
treat strokes), and (3) recombinant subunit vaccines, in which a purified
protein can be used to immunize patients against infectious diseases, without
exposing them to the infectious agent itself (e.g. hepatitis B vaccine).
Transgenic organisms
Once characterized and manipulated to
provide signals for appropriate expression, cloned genes may be inserted into
organisms, generating transgenic organisms, also termed genetically-modified organisms
(GMOs). Although most GMOs are generated for purposes of basic biological
research (see for example, transgenic
mouse), a number of GMOs have been developed for commercial use,
including animals and plants that produce pharmaceuticals or other compounds
(pharming), herbicide-resistant crop plants, and
fluorescent tropical fish (GloFish) for home aquariums.
Gene therapy
Gene therapy involves supplying a
functional gene to cells lacking that function, with the aim of correcting a
genetic disorder or acquired disease. Gene therapy can be broadly divided into
two categories. The first is alteration of germ cells, that is, sperm or eggs,
which results in a permanent genetic change for the whole organism and
subsequent generations. This “germ line gene therapy” is considered by many to
be unethical in human beings. The second type of gene therapy, “somatic cell
gene therapy”, is analogous to an organ transplant. In this case, one or more
specific tissues are targeted by direct treatment or by removal of the tissue,
addition of the therapeutic gene or genes in the laboratory, and return of the
treated cells to the patient. Clinical trials of somatic cell gene therapy
began in 1990 and have mostly addressed cancers and blood, liver,
and lung disorders.
Despite a great deal of publicity and
promises, the history of human gene therapy has been characterized by
relatively limited success. Introducing a gene into cells often provides
only partial and/or transient relief from the symptoms of the disease
being treated. Some gene therapy trial patients have suffered adverse
consequences of the treatment itself, including deaths. In some cases, the adverse
effects result from disruption of essential genes within the patient's genome
by insertional inactivation. In others, viral vectors used for gene therapy
have been contaminated with infectious virus. Nevertheless, gene therapy is
still held to be a promising future area of medicine, and is an area where
there is a significant level of research and
development activity.