Protein Expression Prediction


Our chassis bacteria have an unusual protein expression system with two channels: the canonical channel, which exists in any wild-type bacterium, and an orthogonal channel, created artificially from orthogonal ribosomes and orthogonal mRNA, whose protein expression is relatively independent of the canonical one. There are therefore two kinds of ribosome and two kinds of mRNA: the host ribosome (also called the normal ribosome, canonical ribosome or n-ribosome) and the orthogonal ribosome (o-ribosome); normal mRNA (n-mRNA) with a canonical RBS sequence and orthogonal mRNA (o-mRNA) with a mutated RBS sequence. Canonical mRNA is translated by canonical ribosomes, whereas o-ribosomes translate genes with altered Shine-Dalgarno (SD) sequences that host ribosomes do not recognize.

In o-ribosomes, mutations are introduced into the anti-Shine-Dalgarno (ASD) region of the 16S rRNA so that it can base-pair with complementary, noncanonical SD sequences. However, the mechanism by which the two kinds of mRNA interact with the two kinds of ribosome is not well understood. To investigate how the four components interact within our chassis bacteria, we simulate the whole process with a model.

Modeling Objective

As shown in Figure 1, there are two kinds of ribosome and two kinds of mRNA, whose RBS sequences are mutually orthogonal. Four interactions are possible among the four components: n-16S ~ n-RBS, n-16S ~ o-RBS, o-16S ~ o-RBS and o-16S ~ n-RBS. Several questions remain open: How does the orthogonal mRNA express its encoded protein? How do the two protein expression systems interact with each other? Because too little is known about protein expression when two expression systems coexist, the goal of our modeling is to describe the whole process mathematically and to predict the wet-lab results. Specifically, there are two goals, addressed in two steps: verifying orthogonality, and predicting the protein expression level after the orthogonal system is introduced.

Figure 1. Basic idea of our model: the four interactions among the four components (from TJU iGEM Team 2012)

Model Description and Design

How is protein expressed with the existence of an orthogonal system?

In Figure 1, the solid lines indicate the strong binding of the cognate pairs, n-16S ~ n-RBS and o-16S ~ o-RBS, and the dotted lines indicate the weak binding between a canonical sequence and an orthogonal sequence. The translation process is illustrated in Figure 2. Refer to Background for details.

Figure 2. Translation process (from reference “Automated design of synthetic ribosome binding sites to control protein expression”)

The strength of the interaction between the SD and ASD sequences is thought to influence translational efficiency, because mutations in either sequence that weaken the interaction reduce the amount of protein made. The expression level is primarily determined by the Gibbs free energy change of pairing between the SD sequence in the RBS of the mRNA and the ASD sequence on the 16S rRNA of the ribosome. In most cases, translation initiation is the rate-limiting step. Its rate is determined by multiple molecular interactions, including the hybridization of the 16S rRNA to the RBS sequence, the binding of the initiator tRNA to the start codon, the distance between the 16S rRNA binding site and the start codon (called the spacing), and the presence of RNA secondary structures that occlude either the 16S rRNA binding site or the standby site.


r ∝ exp(−β·∆Gtot)    (1)

where r is the translation initiation rate and ∆Gtot is the total Gibbs free energy change between the initial state (folded mRNA and free ribosome) and the final state (ribosome assembled on the mRNA), dominated by the SD-ASD pairing. ∆Gtot is more negative when attractive interactions between ribosome and mRNA are present, and more positive when mutually exclusive secondary structures are present. β is the apparent Boltzmann constant for the system; it scales the free energy so that the exponent β·∆Gtot is dimensionless.
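To make the dependence of the initiation rate on ∆Gtot concrete, the following is a minimal sketch assuming the exponential form r ∝ exp(−β·∆Gtot). The value β = 0.45 mol/kcal is an assumption taken from the RBS design literature and is not stated on this page; the function names are illustrative only.

```python
import math

# Assumed apparent Boltzmann constant in mol/kcal (not given in this text).
BETA = 0.45


def relative_initiation_rate(dg_tot: float, beta: float = BETA) -> float:
    """Return exp(-beta * dg_tot): the initiation rate up to a constant factor."""
    return math.exp(-beta * dg_tot)


def fold_change(dg_a: float, dg_b: float, beta: float = BETA) -> float:
    """Return r_a / r_b for two RBSs; the unknown constant factor cancels."""
    return math.exp(-beta * (dg_a - dg_b))


if __name__ == "__main__":
    # An RBS that is 2 kcal/mol more stable than another one.
    print(fold_change(-5.0, -3.0))
```

Only ratios of rates are meaningful here, because the proportionality constant in front of the exponential is unknown.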

The amount of protein expressed is proportional to the initiation rate, as described by equation (2).


where E is the amount of protein expressed. The proportionality factor k2 accounts for any ribosome-mRNA molecular interactions that are independent of the mRNA sequence and for any translation-independent parameters; k collects all factors other than the amounts of mRNA and ribosome.

How to calculate ΔGtot?

To calculate ΔGtot, we need to know how the SD and ASD sequences complement each other. The Watson-Crick base pairs and G:U wobble pairs (red lines) are shown in Figure 3.

Figure 3. Initial and final state of translation initiation process (from TJU iGEM Team 2012)
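The pairing rules mentioned above (Watson-Crick pairs plus G:U wobbles) can be sketched as a simple position-by-position check. This is an illustration only: it assumes a gapless alignment with the SD written 5′→3′ and the ASD written 3′→5′, it ignores bulges and stacking energies, and the function name and labels are hypothetical.

```python
# Allowed RNA pairs, written as (mRNA base, rRNA base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(sd_5to3: str, asd_3to5: str) -> list[str]:
    """Label each aligned position as 'WC', 'GU' or 'mismatch'.

    sd_5to3  -- SD region of the mRNA, read 5'->3'
    asd_3to5 -- ASD region of the 16S rRNA, read 3'->5' (antiparallel partner)
    """
    if len(sd_5to3) != len(asd_3to5):
        raise ValueError("sequences must have the same length (gapless alignment)")
    labels = []
    for m, r in zip(sd_5to3.upper(), asd_3to5.upper()):
        if (m, r) in WATSON_CRICK:
            labels.append("WC")
        elif (m, r) in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


if __name__ == "__main__":
    # Part of the E. coli 16S rRNA tail (3'-UCCUCCA-5') against two SD variants.
    print(classify_pairs("AGGAGGU", "UCCUCCA"))
    print(classify_pairs("AGGGGGU", "UCCUCCA"))
```

A count of such pairs is not a free energy; the actual ΔG values come from nearest-neighbor thermodynamic parameters, so this check is useful only for inspecting which positions can pair.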

Given an mRNA sub-sequence surrounding a start codon, ΔGtot is predicted with the energy model of equation (3), where the reference state is the fully unfolded sub-sequence with G = 0:

ΔGtot = ΔGmRNA:rRNA + ΔGstart + ΔGspacing − ΔGstandby − ΔGmRNA    (3)


Here, ΔGmRNA:rRNA is the energy released when the last nine nucleotides (nt) of the E. coli 16S rRNA (3′-AUUCCUCCA-5′) hybridize and co-fold with the mRNA sub-sequence (ΔGmRNA:rRNA < 0). Intramolecular folding within the mRNA is allowed. All possible hybridizations between the mRNA and the 16S rRNA are considered in order to find the highest-affinity 16S rRNA binding site, that is, the site that minimizes the sum of the hybridization free energy ΔGmRNA:rRNA and the penalty for nonoptimal spacing, ΔGspacing. Thus, the algorithm can identify the 16S rRNA binding site regardless of its similarity to the consensus Shine-Dalgarno sequence.
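The binding-site search described above amounts to minimizing ΔGmRNA:rRNA + ΔGspacing over candidate sites. The sketch below assumes the hybridization energies have already been computed elsewhere; the quadratic penalty, the optimal spacing of 5 nt, and all candidate values are placeholders chosen for illustration, not the empirically fitted spacing model.

```python
from typing import Callable, Iterable, NamedTuple


class Candidate(NamedTuple):
    position: int      # index of the candidate 16S rRNA binding site on the mRNA
    dg_hybrid: float   # precomputed dG_mRNA:rRNA in kcal/mol (input, not computed here)
    spacing: int       # distance in nt between the binding site and the start codon


def toy_spacing_penalty(spacing: int, optimal: int = 5, weight: float = 0.5) -> float:
    """Placeholder penalty in kcal/mol: zero at the assumed optimum, growing quadratically."""
    return weight * (spacing - optimal) ** 2


def best_binding_site(
    candidates: Iterable[Candidate],
    penalty: Callable[[int], float] = toy_spacing_penalty,
) -> Candidate:
    """Return the candidate minimizing dG_mRNA:rRNA + dG_spacing."""
    return min(candidates, key=lambda c: c.dg_hybrid + penalty(c.spacing))


if __name__ == "__main__":
    # Hypothetical candidates; the strongest hybridization is not the winner
    # because its spacing is far from the assumed optimum.
    sites = [
        Candidate(position=10, dg_hybrid=-9.0, spacing=12),
        Candidate(position=18, dg_hybrid=-7.0, spacing=5),
        Candidate(position=20, dg_hybrid=-8.0, spacing=3),
    ]
    print(best_binding_site(sites))
```

Passing the penalty as a function keeps the search independent of the particular spacing model, so a fitted penalty could be substituted without changing the selection logic.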

ΔGstart is the energy released when the start codon hybridizes to the initiating tRNA anticodon loop (3′-UAC-5′).

ΔGstandby is the work required to unfold any secondary structures sequestering the standby site (ΔGstandby < 0) after the 30S complex assembly. We define the standby site as the four nucleotides upstream of the 16S rRNA binding site, which is its location in a previously studied mRNA. ΔGmRNA is the work required to unfold the mRNA sub-sequence from its most stable secondary structure (ΔGmRNA < 0).

To calculate ΔGmRNA:rRNA, ΔGstart, ΔGstandby and ΔGmRNA, we use the NUPACK suite of algorithms with the Mfold 3.0 RNA energy parameters. These free energy calculations have no additional fitting or training parameters and depend explicitly on the mRNA sequence. ΔGspacing is obtained separately from an empirical relationship.

In addition, the free energy terms are not independent of one another: changing a single nucleotide can affect several energy terms. The relationship between the spacing and ΔGspacing was determined empirically by measuring the protein expression level driven by synthetic RBSs of varying spacing and fitting a quantitative model to these data.
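The bookkeeping of the five energy terms can be sketched as follows. The sign convention (the two unfolding terms ΔGstandby and ΔGmRNA are subtracted) is assumed from the RBS design paper cited for Figure 2, and the numbers in the example are hypothetical, not values computed for our sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """Free energy terms in kcal/mol for one mRNA sub-sequence."""
    mrna_rrna: float  # dG_mRNA:rRNA, hybridization to the 16S rRNA (< 0)
    start: float      # dG_start, start codon pairing with the initiator tRNA
    spacing: float    # dG_spacing, penalty for nonoptimal spacing (>= 0)
    standby: float    # dG_standby, structure sequestering the standby site (< 0)
    mrna: float       # dG_mRNA, folding of the mRNA sub-sequence (< 0)

    def total(self) -> float:
        """dG_tot = final state minus initial state (assumed sign convention)."""
        return self.mrna_rrna + self.start + self.spacing - self.standby - self.mrna


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    terms = EnergyTerms(mrna_rrna=-8.0, start=-1.2, spacing=0.5, standby=-0.3, mrna=-4.0)
    print(round(terms.total(), 2))
```

Because ΔGmRNA enters with a minus sign, a strongly folded mRNA (very negative ΔGmRNA) raises ΔGtot and therefore lowers the predicted initiation rate.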

Verification of the Orthogonality

Since the two protein expression systems are designed to be orthogonal, orthogonality is one of the most important factors in our model. The first task of the dry lab is therefore to predict it.

If the two protein expression systems are orthogonal, the protein expressed through the cross pathways (3 and 4) should be minute compared with that expressed through the cognate pathways (1 and 2). Therefore, we first calculate the protein expression level of each pathway individually.

The ΔGtot of each pathway is calculated according to the formula in the previous section. Refer to the Calculation page for details.

Table 1. RFP calculation result of ΔG (from TJU iGEM Team 2012)

Pathway      ΔGtot (kcal/mol)
Pathway 1    -4.7
Pathway 2    -6.28
Pathway 3    3.73
Pathway 4    4.93

According to the equation of protein expression level:


Under the assumption that k, m and Rtot are the same for all four pathways, because they occur in the same cell, we arrive at the following results:


The results are shown in Figure 4.

Figure 4. Protein expression level (from TJU iGEM Team 2012)

E1 and E2 (pathways 1 and 2) correspond to a considerable amount of protein, which indicates that the orthogonal system works as well as the canonical system. Furthermore, E3 and E4 are negligible compared with the other two results. This shows that the orthogonal ribosome can barely translate the canonical mRNA, and vice versa. Our system is therefore predicted to be highly orthogonal, which agrees well with our experiments.
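The comparison of the four pathways can be reproduced approximately from the ΔGtot values in Table 1. This is a sketch under stated assumptions: expression is taken to be proportional to exp(−β·ΔGtot) with identical prefactors (k, m, Rtot) for all pathways, and β = 0.45 mol/kcal is an assumed value from the RBS design literature, not one given on this page.

```python
import math

BETA = 0.45  # mol/kcal; assumed, not stated in this text

# dG_tot values from Table 1, in kcal/mol.
DG_TOT = {
    "Pathway 1": -4.7,
    "Pathway 2": -6.28,
    "Pathway 3": 3.73,
    "Pathway 4": 4.93,
}


def relative_expression(dg_by_pathway: dict[str, float], beta: float = BETA) -> dict[str, float]:
    """Return expression of each pathway normalized to the strongest one.

    Equal prefactors (k, m, Rtot) are assumed, so they cancel in the ratio.
    """
    weights = {name: math.exp(-beta * dg) for name, dg in dg_by_pathway.items()}
    top = max(weights.values())
    return {name: w / top for name, w in weights.items()}


if __name__ == "__main__":
    for name, value in relative_expression(DG_TOT).items():
        print(f"{name}: {value:.4f}")
```

With the assumed β, the two cross pathways come out on the order of 1% of the strongest cognate pathway; the exact ratios depend on β and on the equal-prefactor assumption.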

Construction of Multi-Orthogonal Systems

After we designed and constructed the two orthogonal systems, the wet-lab results verified the accuracy of the model. We then also created triple-orthogonal systems within bacteria. The prediction is shown in Figure 5.

Figure 5. Prediction result of the triple-orthogonal system (from TJU iGEM Team 2012)

The figure shows that protein expression along the diagonal, where each ribosome is paired with its cognate mRNA, is significantly higher than elsewhere.

In the future, we can set the bolder goal of constructing multi-orthogonal systems, which is theoretically feasible.