Team:SUSTC-Shenzhen-B/algorithm
From 2012.igem.org
Revision as of 01:58, 8 September 2012
Introduction to Scoring Systems
Introduction to scoring system 2
Scoring System 2 is based on the model created by d'Aubenton Carafa [1]. The score of a terminator consists of two parts: the free energy of the stemloop and the score of the 15 nt poly-T tail. The free energy of the stemloop is calculated using the Loop Dependent Energy Rules [2]. Minimizing the free energy also determines the secondary structure of the stemloop. The T tail score is calculated by the formula given by d'Aubenton Carafa.
Detailed Calculation of Score
1. Some definitions[2]
i. Closing Base Pair
For an RNA sequence, we number the nucleotides from 5’ to 3’. If i < j and nucleotides ri and rj form a base pair, we denote the pair by i.j. A base ri’ or a base pair i’.j’ is accessible from i.j if i < i’ ( < j’ ) < j and there is no other base pair k.l such that i < k < i’ ( < j’ ) < l < j. We denote the collection of bases and base pairs accessible from i.j by L(i,j), and call i.j the closing base pair of this loop. Here “L” stands for loop.
ii. n-loop
If the loop contains n – 1 base pairs, we call it an n-loop. (The closing base pair is counted in n even though it is not itself included in the loop.)
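The two definitions above can be illustrated with a minimal Python sketch. The function names `accessible` and `loop_order`, and the representation of a structure as a set of (i, j) tuples, are assumptions made for this illustration and are not taken from the team's program.

```python
def accessible(i, j, pairs):
    """Return (bases, base_pairs) accessible from the closing pair i.j.

    pairs is a collection of (k, l) tuples with k < l describing a
    nested secondary structure (hypothetical representation).
    """
    # Base pairs strictly inside i.j
    inner = sorted((k, l) for (k, l) in pairs if i < k < l < j)
    # A pair is accessible if no other inner pair encloses it
    acc_pairs = [(k, l) for (k, l) in inner
                 if not any(p < k and l < q for (p, q) in inner)]
    paired = {x for pair in pairs for x in pair}
    # An unpaired base is accessible if no inner pair encloses it
    acc_bases = [b for b in range(i + 1, j)
                 if b not in paired
                 and not any(k < b < l for (k, l) in inner)]
    return acc_bases, acc_pairs


def loop_order(i, j, pairs):
    """n of the n-loop closed by i.j: accessible base pairs plus one."""
    _, acc_pairs = accessible(i, j, pairs)
    return len(acc_pairs) + 1
```

For the structure {1.12, 2.11, 4.9}, the pair 4.9 closes a 1-loop containing bases 5 to 8, while 2.11 closes a 2-loop containing bases 3 and 10 and the pair 4.9.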
Loops that may form in the terminator secondary structure fall into two kinds:
1-loop : Hairpin loop (the loop size must be at least 3)
2-loop : Interior loop (the left strand size and the right strand size are both greater than 0)
Bulge (the size of one strand is greater than 0 and that of the other strand is 0)
Stack (the loop size is 0)
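As a rough illustration of this classification, the sketch below labels the loop closed by i.j from the position of its single inner base pair, if any. The function name `classify_loop` and its argument layout are hypothetical; only the size rules come from the list above.

```python
def classify_loop(i, j, inner=None):
    """Classify the loop closed by base pair i.j.

    inner is None for a 1-loop, or the single accessible base pair
    (k, l) with i < k < l < j for a 2-loop.
    """
    if inner is None:
        size = j - i - 1  # unpaired bases enclosed by i.j
        if size < 3:
            raise ValueError("a hairpin loop must contain at least 3 bases")
        return "hairpin"
    k, l = inner
    if not (i < k < l < j):
        raise ValueError("the inner pair must be nested inside i.j")
    left = k - i - 1   # unpaired bases on the 5' strand
    right = j - l - 1  # unpaired bases on the 3' strand
    if left == 0 and right == 0:
        return "stack"
    if left == 0 or right == 0:
        return "bulge"
    return "interior"
```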
2. Calculation of the Minimum Free Energy Change of Stemloop Formation[3]
Assume i.j is the closing base pair of the loop. Then
G(i,j) = min { GH(i,j) , GS(i,j) + G(i+1,j–1) , GBI(i,j) } ;
GBI(i,j) = min { gbi(i,j,k,l) + G(k,l) } over all i < k < l < j with 0 < k – i + j – l – 2 < max_size
G(i,j) is the minimum free energy change of forming the stemloop closed by i.j. GH is the free energy change of forming a hairpin loop, and GS is the free energy change of forming a stack. GBI is the minimum free energy change over structures in which i.j closes a bulge or interior loop, and gbi(i,j,k,l) is the free energy change of forming the 2-loop between i.j and k.l. The quantity k – i + j – l – 2 is the number of unpaired bases in that 2-loop, and max_size is the largest loop size considered.
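The recursion can be sketched in Python with memoization. This is only an illustration under stated assumptions: the energy functions `gh`, `gs` and `gbi` are supplied by the caller (real values would come from the loop-dependent energy tables [2]), indices are 0-based, the loop size is taken as (k – i – 1) + (j – l – 1), and the set of allowed base pairs (including G-U) is an assumption rather than something stated on this page.

```python
from functools import lru_cache
import math

# Assumed set of allowed RNA base pairs (Watson-Crick plus G-U wobble)
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def min_stemloop_energy(seq, gh, gs, gbi, max_size=30):
    """Return G(0, len(seq) - 1) for an RNA sequence, 0-based indices."""

    @lru_cache(maxsize=None)
    def G(i, j):
        # i.j must be able to pair and enclose at least 3 bases
        if j - i - 1 < 3 or (seq[i], seq[j]) not in PAIRS:
            return math.inf
        best = gh(i, j)                               # hairpin closed by i.j
        best = min(best, gs(i, j) + G(i + 1, j - 1))  # stack on (i+1).(j-1)
        # bulge or interior loop between i.j and an inner pair k.l
        for k in range(i + 1, j):
            for l in range(k + 4, j):
                size = (k - i - 1) + (j - l - 1)
                if 0 < size < max_size:
                    best = min(best, gbi(i, j, k, l) + G(k, l))
        return best

    return G(0, len(seq) - 1)
```

With toy constants (hairpin 5.0, stack -2.0, bulge or interior loop 3.0, all hypothetical), the sequence GGGCAAAAGCCC folds into three stacks on top of a four-base hairpin, giving 5.0 - 2.0 - 2.0 - 2.0 = -1.0.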
3. Calculation of T Tail Score
Here we consider the 15 nucleotides downstream of the stemloop. The T tail score nT is calculated by the formula given by d'Aubenton Carafa [1].
In our program, if the length of the T tail (n) is less than 15, we consider only those n nucleotides. If n is more than 15, we consider only the first 15 nucleotides.
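The formula itself is not reproduced in this text. The sketch below assumes the decaying-weight form commonly attributed to d'Aubenton Carafa: a weight x starts at 1 and is multiplied by 0.9 at each T and by 0.6 at any other nucleotide, and nT is the sum of the weights at T positions. The constants 0.9 and 0.6, the choice to sum over T positions only, and the function name `t_tail_score` are all assumptions that should be checked against reference [1]. Only the truncation to 15 nucleotides comes from the text above.

```python
def t_tail_score(tail, max_len=15, t_weight=0.9, other_weight=0.6):
    """Sketch of a T tail score nT (assumed form, see lead-in).

    tail is the sequence downstream of the stemloop, 5' to 3'.
    Only the first max_len nucleotides are considered; a shorter
    tail is scored over its actual length.
    """
    tail = tail.upper().replace("U", "T")[:max_len]
    x = 1.0
    score = 0.0
    for base in tail:
        if base == "T":
            x *= t_weight       # assumed decay factor at a T
            score += x          # assumed: only T positions contribute
        else:
            x *= other_weight   # assumed decay factor at a non-T
    return score
```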
4. Calculation of Score
Score = nT * 18.16 + deltaG / LH * 96.59 – 116.87
Here nT is the T tail score, deltaG is the minimum free energy change of stemloop formation, and LH is the length of the stemloop.
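As a small worked example, the score formula can be written as a Python function. The name `score_system_2` and the input values in the usage note are illustrative assumptions; the coefficients are those given above.

```python
def score_system_2(n_t, delta_g, stem_loop_length):
    """Score = nT * 18.16 + deltaG / LH * 96.59 - 116.87."""
    if stem_loop_length <= 0:
        raise ValueError("LH must be positive")
    return n_t * 18.16 + delta_g / stem_loop_length * 96.59 - 116.87
```

For instance, with the made-up inputs nT = 5.0, deltaG = -12.0 and LH = 24, the terms are 90.8, -48.295 and -116.87, which sum to -74.365.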
Introduction to scoring system 3
Scoring System 3 is based on the model created by Elena A. Lesnik. The score is entirely energy-based: it combines the free energy change of stemloop formation with the free energy change of RNA-DNA duplex formation. Both energies are calculated with nearest-neighbor rules. Minimizing the free energy change of stemloop formation also gives the secondary structure of the stemloop. Compared with d’Aubenton Carafa’s model, this model also considers the free energy of the duplex within the Transcription Elongation Complex [4]. The poly-T tail is divided into three parts: a proximal part, a distal part and an extra part. If there is a spacer, it is also taken into account [4].
1. Calculation of the Minimum Free Energy Change of Stemloop Formation
The calculation is the same as in Scoring System 2.
2. Calculation of the Energy Change to Form RNA-DNA Duplex
i. Nearest-Neighbor Rules
The energy change of forming a base pair depends on its neighboring base pairs, so the neighbors must be taken into account when calculating the energy.
ii. RNA/DNA Duplex
According to the model created by E. A. Lesnik, we divide the T tail into three regions: the proximal region (5 nt), the distal region (4 nt) and the extra region (3 nt). If the length of the T tail (n) is less than 12, we consider only those n nucleotides.
We calculate the free energy change of RNA:DNA hybrid formation for the spacer (if it exists; 0 to 2 nt) together with the proximal T region. We also take the distal T region and the extra T region into account [4].
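The division of the T tail can be sketched as a simple slicing step. The function name `split_t_tail` is hypothetical, and the sketch assumes the spacer has already been removed from the start of the tail.

```python
def split_t_tail(tail, proximal_len=5, distal_len=4, extra_len=3):
    """Split a T tail into (proximal, distal, extra) regions.

    A tail shorter than 12 nt is split over its actual length, so the
    later regions may be short or empty; a longer tail is truncated.
    """
    tail = tail[: proximal_len + distal_len + extra_len]
    proximal = tail[:proximal_len]
    distal = tail[proximal_len:proximal_len + distal_len]
    extra = tail[proximal_len + distal_len:]
    return proximal, distal, extra
```

For example, the 12 nt tail TTTTCATTGTTA splits into TTTTC, ATTG and TTA, while a 7 nt tail yields a 5 nt proximal region, a 2 nt distal region and an empty extra region.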
iii. Calculation of the Energy Change to Form RNA/DNA Duplex
Published thermodynamic parameters (based on nearest-neighbor energy rules) [5] allow us to calculate the free energy change of RNA:DNA hybrid formation. For example, if the sequence is 5’ TTTTCATTGTTA 3’, the energy change is the initiation energy plus one term for each of the 11 adjacent nucleotide pairs of the RNA strand:
deltaG( rUUUUCAUUGUUA ) = initiation energy + rUU + rUU + rUU + rUC + rCA + rAU + rUU + rUG + rGU + rUU + rUA
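The decomposition into nearest-neighbor terms can be sketched as follows. The function names are hypothetical, and the parameter table and initiation energy are passed in by the caller because the actual values from reference [5] are not reproduced on this page; the usage note uses placeholder numbers only.

```python
def nearest_neighbor_terms(dna):
    """List the RNA dinucleotide terms for a DNA-written sequence (5' to 3')."""
    rna = dna.upper().replace("T", "U")
    return [a + b for a, b in zip(rna, rna[1:])]


def duplex_energy(dna, nn_params, initiation):
    """Sum the initiation energy and one term per adjacent nucleotide pair.

    nn_params maps RNA dinucleotides such as "UU" to an energy value;
    its contents must be supplied from a published parameter table.
    """
    return initiation + sum(nn_params[t] for t in nearest_neighbor_terms(dna))
```

Calling `nearest_neighbor_terms("TTTTCATTGTTA")` yields the 11 terms UU, UU, UU, UC, CA, AU, UU, UG, GU, UU, UA from the example above. With a placeholder table that assigns -1.0 to every dinucleotide and an initiation energy of 3.0, `duplex_energy` returns -8.0.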
3. Calculation of Score [4]
Score = deltaG( stemloop ) – [ deltaG( spacer ) + deltaG( proximal part ) ] – 0.5 * deltaG( distal part ) – 0.01 * deltaG( extra part )
Here the second to fifth terms, deltaG( spacer ) through deltaG( extra part ), are calculated from the RNA/DNA duplex thermodynamic parameters.
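A minimal sketch of this combination in Python is shown below. The function name `score_system_3` and the input values in the usage note are illustrative assumptions; the weights 0.5 and 0.01 are those in the formula.

```python
def score_system_3(dg_stemloop, dg_spacer, dg_proximal, dg_distal, dg_extra):
    """Combine the stemloop energy with the weighted RNA/DNA duplex energies."""
    return (dg_stemloop
            - (dg_spacer + dg_proximal)  # spacer and proximal part, full weight
            - 0.5 * dg_distal            # distal part, half weight
            - 0.01 * dg_extra)           # extra part, small weight
```

With the made-up energies -10.0 (stemloop), -1.0 (spacer), -4.0 (proximal), -2.0 (distal) and -1.0 (extra), the score is -10.0 + 5.0 + 1.0 + 0.01 = -3.99.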