Team:Tianjin/Modeling/Expression

From 2012.igem.org

Revision as of 14:00, 21 September 2012 by Austinamens (Talk | contribs)


Background

The protein expression system in our chassis bacteria is unusual because it has two channels. The canonical channel exists in any wild-type bacterium. The orthogonal channel is created artificially from orthogonal ribosomes and orthogonal mRNA, and its protein expression is relatively independent of the canonical channel. There are therefore two kinds of ribosome and two kinds of mRNA: the host ribosome (also known as the normal ribosome, canonical ribosome or n-ribosome) and the orthogonal ribosome (o-ribosome); normal mRNA (n-mRNA) with a canonical RBS sequence and orthogonal mRNA (o-mRNA) with a mutated RBS sequence. Canonical mRNA is translated by canonical ribosomes, whereas o-ribosomes translate genes with altered Shine-Dalgarno (SD) sequences that host ribosomes do not recognize.

In o-ribosomes, mutations are introduced into the anti-Shine-Dalgarno (ASD) region of the 16S rRNA so that it can base pair with complementary, noncanonical SD sequences. However, the mechanism by which the two kinds of mRNA interact with the two kinds of ribosome is not well understood. To investigate how the four components interact within our chassis bacteria, we simulate the whole process through modeling.


Modeling Objective

As shown in Figure 1, there are two kinds of ribosome and two kinds of mRNA, each mRNA carrying one of two mutually orthogonal RBS sequences. Four interactions are possible among the four components: n-16S – n-RBS, n-16S – o-RBS, o-16S – o-RBS and o-16S – n-RBS. Several questions remain open: How does the orthogonal mRNA express its encoded protein? How do the two protein expression systems interact with each other? Little is known about how protein is expressed when two expression systems coexist. Our modeling goal is therefore to describe the whole process mathematically and to predict the wet-lab results. Specifically, there are two goals: verifying orthogonality, and predicting the protein expression level after the orthogonal system is introduced in two steps.

Figure 1. Basic idea of our model: The four interactions among the four components.

Model Description and Design

How is protein expressed in the presence of an orthogonal system?
In Figure 1, the solid lines indicate the strong binding of the cognate pairs n-16S – n-RBS and o-16S – o-RBS, and the dotted lines indicate the cross-binding between a canonical sequence and an orthogonal sequence (n-16S – o-RBS and o-16S – n-RBS). The translation processes are illustrated in Figure 2. Refer to Background for details.

The strength of the interaction between the SD and ASD sequences is thought to influence translational efficiency, because mutations in either sequence that weaken the interaction reduce the amount of protein made. Protein expression is primarily determined by the Gibbs free energy change of binding between the SD sequence in the RBS of the mRNA and the ASD sequence on the 16S rRNA of the ribosome. In most cases, translation initiation is the rate-limiting step. Its rate is determined by multiple molecular interactions, including the hybridization of the 16S rRNA to the RBS sequence, the binding of the initiator tRNA to the start codon, the distance between the 16S rRNA binding site and the start codon (called spacing) and the presence of RNA secondary structures that occlude either the 16S rRNA binding site or the standby site.

In equation (1), r is the translation initiation rate of the protein and ΔG_tot is the total Gibbs free energy change associated with SD–ASD binding during initiation. ΔG_tot is more negative when attractive interactions between ribosome and mRNA are present, and more positive when mutually exclusive secondary structures are present. β is the apparent Boltzmann constant for the system; it scales free energies so that the product βΔG_tot is dimensionless. The amount of protein expressed is proportional to the initiation rate, as described by equation (2).

In equation (2), the proportionality factor k_2 accounts for any ribosome-mRNA molecular interactions that are independent of mRNA sequence and for any translation-independent parameters, such as the DNA copy number, the promoter's transcription rate, the mRNA stability and the protein dilution rate.
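Written out, equations (1) and (2) can take the following form. This is a sketch consistent with the definitions above, not a confirmed transcription: the use of k_1 as the prefactor in equation (1) and the symbol E for the amount of protein expressed are assumptions.

```latex
\begin{align}
r &= k_1 \exp\left(-\beta\,\Delta G_{tot}\right) \tag{1} \\
E &= k_2\, r = k_1 k_2 \exp\left(-\beta\,\Delta G_{tot}\right) \tag{2}
\end{align}
```

Under this form, two RBSs that differ only in ΔG_tot have an expression ratio of exp(−β(ΔG_tot,a − ΔG_tot,b)), so a decrease in ΔG_tot of ln(2)/β doubles the predicted expression.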

What does k_1 mean?
The proportionality factor k_1 can be expressed more specifically, as in equation (3).

The parameter m is the number of mRNA molecules produced, and R_tot is a parameter that reflects the number of ribosomes of the given kind.


How to calculate ΔG_tot?
To calculate ΔG_tot, we need to know how the SD and ASD sequences complement each other. The thermodynamic free energy change during 30S complex assembly is determined by five molecular interactions that participate in the initial and final states of the system. The Watson-Crick base pairs and G:U wobbles (red lines) are shown in Figure 3.

Given a specific mRNA sequence surrounding a start codon, called the sub-sequence, ΔG_tot is predicted according to an energy model (equation (4)), where the reference state is a fully unfolded sub-sequence with G = 0:

ΔG_tot = ΔG_(mRNA:rRNA) + ΔG_start + ΔG_spacing − ΔG_standby − ΔG_mRNA    (4)

In this model, ΔG_(mRNA:rRNA) is the energy released when the last nine nucleotides (nt) of the E. coli 16S rRNA (3′-AUUCCUCCA-5′) hybridize and co-fold with the mRNA sub-sequence (ΔG_(mRNA:rRNA) < 0). Intramolecular folding within the mRNA is allowed. All possible hybridizations between the mRNA and the 16S rRNA are considered in order to find the highest-affinity 16S rRNA binding site. This binding site minimizes the sum of the hybridization free energy ΔG_(mRNA:rRNA) and the penalty for nonoptimal spacing, ΔG_spacing. Thus, the algorithm can identify the 16S rRNA binding site regardless of its similarity to the consensus Shine-Dalgarno sequence.
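The idea of scanning every possible alignment for the best 16S rRNA binding site can be illustrated with a much simpler stand-in. The sketch below is not the free-energy calculation described above: it only counts Watson-Crick and G:U wobble pairs in ungapped alignments and ignores co-folding and the spacing penalty. The example mRNA and the function names are hypothetical.

```python
# Simplified stand-in for the binding-site search: count base pairs instead of
# computing hybridization free energies. Ungapped alignments only.

# Allowed pairs: Watson-Crick plus G:U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Last nine nt of the E. coli 16S rRNA, written 3'->5' as in the text.
ASD_3TO5 = "AUUCCUCCA"


def count_pairs(mrna_window, asd_3to5=ASD_3TO5):
    """Count paired positions when an mRNA window (5'->3') is aligned
    antiparallel to the rRNA tail (written 3'->5')."""
    return sum((m, r) in PAIRS for m, r in zip(mrna_window, asd_3to5))


def best_binding_site(mrna, asd_3to5=ASD_3TO5):
    """Return (start_index, pair_count) of the window with the most pairs.
    Ties are broken in favor of the most upstream window."""
    n = len(asd_3to5)
    scores = [(count_pairs(mrna[i:i + n], asd_3to5), i)
              for i in range(len(mrna) - n + 1)]
    best_score, best_start = max(scores, key=lambda s: (s[0], -s[1]))
    return best_start, best_score


# Hypothetical mRNA sub-sequence with a consensus SD upstream of AUG.
example_mrna = "ACGCUAAGGAGGUAUUCAUGGCG"
site = best_binding_site(example_mrna)
print(site)
```

Swapping `asd_3to5` for a mutated o-ribosome sequence gives a quick, qualitative check of which RBS each ribosome prefers; quantitative predictions still require the free-energy terms.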

ΔG_start is the energy released when the start codon hybridizes to the initiating tRNA anticodon loop (3′-UAC-5′).

ΔG_standby is the work required to unfold any secondary structures sequestering the standby site (ΔG_standby < 0) after the 30S complex assembly. We define the standby site as the four nucleotides upstream of the 16S rRNA binding site, which is its location in a previously studied mRNA.

To calculate ΔG_(mRNA:rRNA), ΔG_start, ΔG_spacing, ΔG_standby and ΔG_mRNA, we use the NUPACK suite of algorithms with the Mfold 3.0 RNA energy parameters. These free energy calculations do not have any additional fitting or training parameters and depend explicitly on the mRNA sequence. In addition, the free energy terms are not independent of one another: changing a single nucleotide can affect multiple energy terms. The relationship between the spacing and ΔG_spacing was determined empirically by measuring the protein expression level driven by synthetic RBSs of varying spacing and fitting a quantitative model to these data.
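Once the five terms are available, combining them and comparing two ribosome-RBS pairings is simple arithmetic. The sketch below assumes that the binding terms are added and the unfolding terms subtracted, as in the published RBS Calculator energy model, and that r is proportional to exp(−βΔG_tot). The value β = 0.45 mol/kcal is taken from that model rather than from this page, and all free energy values are placeholders, not our calculated results.

```python
import math
from dataclasses import dataclass

# Assumed apparent Boltzmann constant (mol/kcal) from the RBS Calculator model.
BETA = 0.45


@dataclass
class FreeEnergyTerms:
    """The five free energy terms in kcal/mol (e.g. from NUPACK calculations)."""
    mrna_rrna: float  # 16S rRNA hybridization to the mRNA (< 0)
    start: float      # start codon pairing with the initiator tRNA
    spacing: float    # penalty for nonoptimal spacing
    standby: float    # unfolding of structures at the standby site (< 0)
    mrna: float       # unfolding of the mRNA sub-sequence

    def total(self) -> float:
        """Binding terms are added; unfolding terms are subtracted."""
        return (self.mrna_rrna + self.start + self.spacing
                - self.standby - self.mrna)


def relative_rate(dg_tot: float, beta: float = BETA) -> float:
    """Initiation rate up to a constant factor: exp(-beta * dG_tot)."""
    return math.exp(-beta * dg_tot)


# Placeholder values for a cognate pairing and a cross pairing that differ
# only in the hybridization term.
cognate = FreeEnergyTerms(-10.0, -1.2, 0.5, -0.5, -6.0)
cross = FreeEnergyTerms(-2.0, -1.2, 0.5, -0.5, -6.0)

fold = relative_rate(cognate.total()) / relative_rate(cross.total())
print(f"dG_tot cognate = {cognate.total():.1f} kcal/mol")
print(f"dG_tot cross   = {cross.total():.1f} kcal/mol")
print(f"predicted cognate/cross rate ratio = {fold:.1f}")
```

Because the unknown proportionality factors cancel in the ratio, this kind of comparison is a natural way to express orthogonality: the larger the cognate-to-cross ratio, the more independent the two channels.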

Our Project

We first chose red fluorescent protein (RFP) as the protein encoded by the n-mRNA and green fluorescent protein (GFP) as the protein encoded by the o-mRNA. The final calculation results for the four pathways are listed in the tables below. The detailed calculations are presented in the calculation section of this page.

