Team:BYUProvo/Modeling

From 2012.igem.org


Revision as of 08:16, 3 October 2012

Team BYU Provo


Introduction

Colon cancer polyps produce high amounts of reactive oxygen species (ROS) and lactate. The high metabolic activity also causes an increase in temperature. Sensors for any one of these inputs alone would be confounded by normal physiological variation in temperature, lactate concentration, and ROS concentration. We propose a genetic circuit designed to detect higher than normal levels of all three, producing two separate outputs. There are two parts to the circuit: The first is a dual input system, using temperature and ROS as inputs to produce an output (LacZ). The second is a single input system, using lactate to produce GFP.

Insert picture of model here

In order to model our system, we have undertaken three main tasks:

1) Create a model using Mass-Action Enzyme Kinetics

2) Analyze this model using computational methods

3) Create an algorithm to predict the structure of our RNA thermosensors

We will start by describing the reactions within our circuit and then by creating a system of differential equations from the reaction sequence.

Our Circuit

Final Draft Poster Circuit Design.png

The diagram above depicts the inner workings of our circuit, built within E. coli. The following chemical equations describe the pathway:

Enzyme Kinetics Chemical Eqtns.PNG

The Model

Mass-Action Equations

Using mass-action kinetics, we wrote these chemical equations as a system of differential equations.

DFQ1.png

System of ODEs

As written, the system is too complicated for us to analyze, so we make a few simplifying assumptions.

  • Eliminating Constant Variables

In our model, SoxR (S), SoxS (X), Polymerase (P), and Ribosome (B) never change concentration and we assume that they are in excess compared to the other species. Therefore, we eliminate them from the model, simply combining them with the other constants.

DFQ2.png

  • Reverse Reaction Assumption

The last two chemical equations represent transcription and translation, respectively, and these two processes are assumed to have a forward reaction only. Therefore, we set all k values for the reverse reactions equal to zero.

Par1.png

Thus we are left with the set of equations shown below.

DFQ3.png Enzyme Kinetics Chemical Eqtns With Assumptions.PNG

  • Quasi Steady State Assumption

We assume that the concentrations of the intermediates of transcription and translation do not change on the time-scale of mRNA and protein formation. Therefore, we set

Par2.png

We are then left with the following set of equations:

DFQ4.png

We define the matrix A to be:

A.matrix.png

And the vector v to be:

PAR3.png

Therefore, our system is simply: v' = A*v

Parameter Estimation

Most of the parameters were not readily available in databases, but we were able to estimate reasonable values for each one. The rate of nucleotide addition in transcription is about 60 per second and the rate of amino acid addition in translation is about 20 per second. Our mRNA is approximately 3200 nucleotides long, so an appropriate value for k3 and k5 was found by dividing 60 by 3200, giving 0.01875/sec (about 0.0187/sec). In E. coli the lifespan of mRNA is about 60 seconds and the lifetime of protein is approximately 1 day, so y = 1/60 = 0.0167/sec, and p = 1/86400, or about 0.00001/sec, which we approximate as p = 0/sec. Setting p = 0 reduces our system of differential equations: the concentration of protein now depends only on the concentration of mRNA, in a linear fashion. We can therefore say that the protein follows the mRNA: whatever the mRNA (M) does in the model, the protein (K) will do as well. Our matrix A then becomes:
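The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch in Python, not the Matlab code used for the model, and the variable names are illustrative. The assumption of three nucleotides per amino acid, with the whole mRNA treated as coding, is not stated in the text; it is added here only to show why the same quotient can serve for both k3 and k5.

```python
# Rates, length, and lifetimes quoted in the text.
NT_PER_SEC = 60.0                      # nucleotides added per second (transcription)
AA_PER_SEC = 20.0                      # amino acids added per second (translation)
MRNA_LENGTH_NT = 3200                  # approximate mRNA length in nucleotides
MRNA_LIFETIME_SEC = 60.0               # approximate mRNA lifespan
PROTEIN_LIFETIME_SEC = 24 * 60 * 60    # approximately one day

# Assumption (not in the text): 3 nucleotides per codon, whole mRNA coding.
NT_PER_CODON = 3

k3 = NT_PER_SEC / MRNA_LENGTH_NT                    # transcripts completed per second
k5 = AA_PER_SEC / (MRNA_LENGTH_NT / NT_PER_CODON)   # proteins completed per second
y = 1.0 / MRNA_LIFETIME_SEC                         # mRNA decay rate
p = 1.0 / PROTEIN_LIFETIME_SEC                      # protein decay rate, treated as 0

print(f"k3 = {k3:.5f}/sec, k5 = {k5:.5f}/sec")
print(f"y  = {y:.4f}/sec, p  = {p:.2e}/sec")
```

Under that assumption the translation estimate, 20 amino acids per second over roughly 1067 codons, reduces to the same 60/3200 used for transcription.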

A.matrixrevised.png

From the work of [http://jaguar.biologie.hu-berlin.de/~wolfram/data/ibsb2007_Borger_et_al.pdf Borger et al. 2007], we find that kinetic rate constants for reactions involving only reactants can be described by a Gaussian distribution with a median of 0.14, while those for reactions involving reactants and enzymes follow a Gaussian distribution with a median of 6.0. We treat the first reaction as an interaction between reactants and the second as an enzyme reaction. Thus, to obtain values for k1 and k-1 and for k2 and k-2, we built a Gaussian distribution around the appropriate median and drew each value at random from it.
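The random draw described here might look like the following. This is a sketch only: the text gives the medians but not the width of the distributions, so `relative_sd` is a hypothetical parameter. Redrawing non-positive values is also an added assumption, used so that no rate constant comes out negative.

```python
import random


def sample_rate_constant(median, relative_sd=0.25, rng=None):
    """Draw one positive rate constant from a Gaussian centred on `median`.

    `relative_sd` is a hypothetical spread, expressed as a fraction of the
    median; the width actually used is not stated in the text.
    """
    rng = rng or random.Random()
    sd = relative_sd * median
    while True:
        value = rng.gauss(median, sd)
        if value > 0:  # a rate constant cannot be negative, so redraw
            return value


rng = random.Random(2012)  # fixed seed so the draw is repeatable

# Reactant-only reaction: median 0.14. Enzyme reaction: median 6.0.
k1, k_minus1 = (sample_rate_constant(0.14, rng=rng) for _ in range(2))
k2, k_minus2 = (sample_rate_constant(6.0, rng=rng) for _ in range(2))

print(k1, k_minus1, k2, k_minus2)
```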

We chose the initial concentration of ROS to be 0.001 M (even smaller values may be expected in the body) and the initial concentration of the other three species to be 0 M.

Solutions

Using the ode23 solver in Matlab, we computed numerical solutions to our system of differential equations.

Experimental Data from Lab Work

Temperature Dependence

Analysis

Bifurcation Analysis

Steady State Analysis

Modeling our Thermosensor Library

Herein we provide detailed information about our library of thermosensors and describe our attempt to model the secondary structure of the RNA hairpins. We provide a description of an algorithm we developed, similar to the Smith-Waterman algorithm.

Smith-Waterman Algorithm

The Smith-Waterman Algorithm is a simple process used to perform sequence alignment. To demonstrate how the algorithm works, we will use these two sequences:

  • ACUAGGUA
  • AGCUAGGA
SM-1.png

First, one sequence is placed in the first row of a grid, skipping the first two entries in the row. The second is likewise placed in the first column, skipping the first two entries in the column. Zeros are then placed in row 2 and column 2.

Then a scoring matrix S is created according to the following rules:

SM1.png

SM-2.png

For example, S(4,8), shown in blue, was obtained by adding 2 to the entry diagonally above and to its left. S(7,7) was obtained by adding -1 to the maximum of the three entries above it, diagonally above and to its left, and to its left.
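The fill step can be sketched in a few lines of Python, following the two rules as the example describes them (add 2 to the diagonal entry on a match, otherwise add -1 to the largest of the three neighbours). Two points are assumptions: that ACUAGGUA runs along the top row, which is consistent with S(4,8) being a match, and that scores are floored at zero as in standard Smith-Waterman. The function name is illustrative.

```python
def scoring_matrix(top, side, match=2, mismatch=-1):
    """Fill S with a zero first row and column, then apply the two rules."""
    S = [[0] * (len(top) + 1) for _ in range(len(side) + 1)]
    for i in range(1, len(side) + 1):
        for j in range(1, len(top) + 1):
            if side[i - 1] == top[j - 1]:
                # Match: extend the diagonal score.
                S[i][j] = S[i - 1][j - 1] + match
            else:
                # Mismatch: penalise the best of the three neighbours.
                best = max(S[i - 1][j], S[i - 1][j - 1], S[i][j - 1])
                S[i][j] = max(0, best + mismatch)  # assumed floor at zero
    return S


S = scoring_matrix("ACUAGGUA", "AGCUAGGA")
for label, row in zip(" AGCUAGGA", S):
    print(label, row)
```

Because the grid in the figures also counts the header row and column, the entry written S(4,8) in the text corresponds to `S[2][6]` here, and S(7,7) to `S[5][5]`.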

SM-3.png

Once the scoring matrix has been completed, starting in the bottom right corner, a path is chosen, picking the largest numbers (only numbers to the left, above or up and to the left can be chosen), until the path arrives back at a zero. When a number above or to the left is the same as the number on the diagonal, the number above or to the left is to be chosen first.


SM-4.png

The resulting path spells out the proper alignment of the two sequences. Squares alone in their row and column represent an alignment and when two or more squares share the same column or row, the one closest to the bottom right corner is the one that represents the alignment. The other squares represent deletions or insertions. For our example, the final alignment is shown in the blue.

Thus the alignment of the two sequences would be:

  • A-CUAGGUA
  • AGCUAGG-A

Source: [http://docencia.ac.upc.edu/master/AMPP/slides/ampp_sw_presentation.pdf Ayguade et al. 2007]

Our Revised Algorithm

Using the same ideas applied in the Smith-Waterman method, we created an algorithm to model the secondary structure of our RNA thermosensors. The main difference is that we are aligning one side of an RNA sequence with its other side and we are not working with two separate sequences. Thus, we create a similar scoring matrix by placing the RNA sequence along the top row (skipping the first 2 entries) and by placing the reverse of the RNA sequence down the first column (also skipping the first 2 entries).

We then assign priorities in this way:

OAR1.png

The if statements refer to the alignment of different base pairs. Here, w1, w2, w3 and w4 are weights that we chose specifically for our library of thermosensors.

  • w1 = 1
  • w2 = 2
  • w3 = 3
  • w4 = -2
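As a rough illustration of how such weights could enter a scoring matrix, here is a sketch in Python. The mapping of weights to base pairs (w1 for G-U, w2 for A-U, w3 for G-C, w4 for anything else) is a guess, since the rules themselves appear only in the figure. The recurrence simply mirrors the Smith-Waterman fill described earlier, with an assumed floor at zero. All names are hypothetical.

```python
W1, W2, W3, W4 = 1, 2, 3, -2  # weights listed above

# Hypothetical assignment of weights to base pairs (not confirmed by the text).
PAIR_WEIGHTS = {
    frozenset("gu"): W1,  # wobble pair
    frozenset("au"): W2,
    frozenset("gc"): W3,
}


def pair_weight(a, b):
    """Return the weight for bases a and b, or W4 if they cannot pair."""
    return PAIR_WEIGHTS.get(frozenset((a.lower(), b.lower())), W4)


def hairpin_scoring_matrix(rna):
    """Score an RNA sequence (top row) against its own reverse (first column)."""
    top, side = rna, rna[::-1]
    n = len(rna)
    S = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            w = pair_weight(side[i - 1], top[j - 1])
            if w > 0:
                # The bases can pair: extend the diagonal score.
                S[i][j] = S[i - 1][j - 1] + w
            else:
                # No pairing: penalise the best of the three neighbours.
                best = max(S[i - 1][j], S[i - 1][j - 1], S[i][j - 1])
                S[i][j] = max(0, best + w)  # assumed floor at zero
    return S


S_demo = hairpin_scoring_matrix("gggaaaccc")  # toy hairpin, not from the library
print(S_demo[3][3])
```

In the toy sequence the three G-C pairs of the stem line up on the diagonal, so the score grows by w3 at each step.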

To provide an example, we will use TSA, the wild-type thermosensor.

The sequence is:

uuuagcgugacuuucuuucaacagcuaacaauuguuguuacugccuaauguuuuuaggguauuuuaaaaaagggcgauaaaaaacgauuggaggaugagacaugaacgcucaa

After placing the forward sequence along the top row and the reverse sequence down the first column, we start at the very bottom right corner and proceed to find our way back, following the path that gives us the largest values. The rules for our algorithm are as follows:

Finding the Best Path Back
  • 1) To go from one entry to another, the largest of the three values in the entries to the left, above and in the left-hand corner is picked.
  • 2) If the largest value is present twice, once in the corner and once to the left or above, always choose to move to the left or above before moving to the corner entry and then only move to the corner entry if it is the largest number for the next pick.
  • 3) If the largest value is present in both the left entry and the one above (3 times if present in the corner entry as well), choose the one which will give the largest number in the next pick.
  • 4) Continue on this path until reaching the entry S(i,j) where i + j = n, with n the total number of bases in the RNA sequence being analyzed. This represents the point where the thermosensor turns around. To continue would be, essentially, going back the way you came.
  • 5) Then, pick every entry which is connected to another entry only by a corner. Discard all entries which sit in an irregular position. There are 2 types of irregular positions. First, if an entry in the lower right hand corner is less than the entry in the upper left hand corner, discard the entry in the lower right hand corner. Second, if an entry stands alone, not connected to any other entries, discard that entry. See figure above for an example.
  • 6) Last, because of the hairpin loop, we eliminate the last two bonds, where the loop occurs, if the algorithm did predict bonding in that region.

Here is the final path:

final path

Here is a breakdown of the color scheme:

  • Yellow is the path back determined by our algorithm
  • The dark boxes are the entries chosen by our algorithm; these determine the bonding patterns
  • Red are the actual bonding patterns observed in the secondary structure of TSA
  • Orange is where the actual bonding patterns fall on the same path predicted by our algorithm
  • Ovals indicate regions where our algorithm differs from the rnafold program in Matlab

And for a look at the entire scoring matrix:

OAR.4.png

Once this path is determined, it is converted into a series of dots and parentheses that represent the bonding in the RNA structure. The bonding pattern determined by our algorithm is:

...((((........(((((....((..((((((...(((.(((((....((((((((....)))))))).))))).)))....))))))..)).)))))......))))...

The bonding pattern predicted by rnafold in Matlab is:

...(((((.......(((((....((..((((((((.((..(((((....((((((((....)))))))).)))))..))..))))))))..)).))))).....)))))...
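The two dot-bracket strings can be compared pair by pair. The sketch below, in Python rather than Matlab, reads each string into a set of bonded positions and counts the bonds the two predictions share. The function name is illustrative and the comparison is not part of the original analysis.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) bonded positions (0-based) in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)  # remember the opening base
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))  # bond with the most recent open base
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


OURS = "...((((........(((((....((..((((((...(((.(((((....((((((((....)))))))).))))).)))....))))))..)).)))))......))))..."
RNAFOLD = "...(((((.......(((((....((..((((((((.((..(((((....((((((((....)))))))).)))))..))..))))))))..)).))))).....)))))..."

ours = pairs_from_dot_bracket(OURS)
rnafold = pairs_from_dot_bracket(RNAFOLD)
shared = ours & rnafold

print(f"bonds predicted by our algorithm: {len(ours)}")
print(f"bonds predicted by rnafold:       {len(rnafold)}")
print(f"bonds common to both:             {len(shared)}")
```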

The secondary structure predicted by our algorithm is shown below, followed by the secondary structure determined by Matlab.

WT-OAR.png

TSA.png

We did not perform this algorithm by hand, but wrote a program to do it for us in Matlab.