Carafa Scoring System
Scoring System 2 is based on the model created by d'Aubenton Carafa.[1] The score of a terminator consists of two parts: the free energy of the stemloop and the score of the 15-nt poly-T tail. The free energy of the stemloop is calculated using the Loop Dependent Energy Rules[2], and minimizing this free energy also determines the secondary structure of the stemloop. The T tail score is calculated with the formula given by d'Aubenton Carafa.
Detailed Calculation of Score
1. Some definitions[2]
i. Closing Base Pair
Number the nucleotides of an RNA sequence from 5’ to 3’. If i < j and nucleotides r_i and r_j form a base pair, we denote the pair by i.j. A base r_i’ or a base pair i’.j’ is accessible from i.j if i < i’ (< j’) < j and there is no other base pair k.l such that i < k < i’ (< j’) < l < j. The collection of bases and base pairs accessible from i.j is denoted by L(i,j), where “L” stands for loop, and i.j is called the closing base pair of that loop.
ii. n-loop
If L(i,j) contains n - 1 base pairs, the loop is called an n-loop. It is called an n-loop rather than an (n - 1)-loop because the closing base pair i.j is counted as well, even though it is not itself part of L(i,j).
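The two definitions can be illustrated with a small sketch. It assumes a pseudoknot-free structure written in dot-bracket notation and uses 0-based indices; the helper names `pair_table`, `accessible_from` and `loop_order` are hypothetical. Because nested base pairs cannot cross, L(i,j) can be collected by scanning from i + 1 towards j and jumping over the interior of every inner pair that is met.

```python
def pair_table(dot_bracket):
    """Map each position to its partner index, or -1 if unpaired (0-based)."""
    table, stack = [-1] * len(dot_bracket), []
    for p, c in enumerate(dot_bracket):
        if c == "(":
            stack.append(p)
        elif c == ")":
            q = stack.pop()
            table[p], table[q] = q, p
    return table


def accessible_from(table, i, j):
    """Return L(i,j): the bases and base pairs accessible from closing pair i.j."""
    if table[i] != j:
        raise ValueError("i.j is not a base pair")
    bases, pairs = [], []
    p = i + 1
    while p < j:
        if table[p] > p:
            # p opens an inner pair p.table[p]; nothing inside it is accessible.
            pairs.append((p, table[p]))
            p = table[p] + 1
        else:
            bases.append(p)  # unpaired base lying directly in the loop
            p += 1
    return bases, pairs


def loop_order(table, i, j):
    """n of an n-loop: accessible base pairs plus the closing pair i.j."""
    return len(accessible_from(table, i, j)[1]) + 1


t = pair_table("((..((...))..))")
print(accessible_from(t, 1, 13))                   # ([2, 3, 11, 12], [(4, 10)])
print(loop_order(t, 5, 9), loop_order(t, 1, 13))  # 1 2
```

In this example the pair 5.9 closes a 1-loop (three accessible bases, no accessible pairs) and 1.13 closes a 2-loop whose only accessible pair is 4.10.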
The loops that can form in a terminator secondary structure fall into two kinds:
1-loop: Hairpin loop (the loop must contain at least 3 bases)
2-loop: Interior loop (the left and right strands both contain at least one base)
2-loop: Bulge (one strand contains at least one base and the other contains none)
2-loop: Stack (both strands are empty, i.e. the loop size is 0)
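The classification above depends only on the closing pair i.j and, for a 2-loop, its single inner pair k.l. A minimal sketch, using 0-based indices and a hypothetical function name `classify_loop`; the strand sizes are the numbers of unpaired bases between the two pairs on each side.

```python
def classify_loop(i, j, inner=None, min_hairpin=3):
    """Classify the loop closed by i.j; inner is the inner pair (k, l) of a 2-loop."""
    if inner is None:
        # 1-loop: every base between i and j is unpaired.
        if j - i - 1 < min_hairpin:
            raise ValueError("hairpin loop must contain at least 3 bases")
        return "hairpin"
    k, l = inner
    if not (i < k < l < j):
        raise ValueError("inner pair must lie strictly inside i.j")
    left, right = k - i - 1, j - l - 1  # sizes of the two strands
    if left == 0 and right == 0:
        return "stack"
    if left == 0 or right == 0:
        return "bulge"
    return "interior"


print(classify_loop(5, 9))            # hairpin
print(classify_loop(0, 14, (1, 13)))  # stack
print(classify_loop(1, 13, (4, 10)))  # interior
print(classify_loop(0, 10, (1, 6)))   # bulge
```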
2. Calculation of the Minimum Free Energy Change of Stemloop Formation[3]
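Let i.j be the closing base pair of the loop. Then

$$G(i,j) = \min\{\, \mathrm{GH}(i,j),\ \mathrm{GS}(i,j) + G(i+1,\, j-1),\ \mathrm{GBI}(i,j) \,\}$$

$$\mathrm{GBI}(i,j) = \min_{i<k<l<j} \{\, \mathrm{gbi}(i,j,k,l) + G(k,l) \,\} \quad \text{for all } 0 < k - i + j - l - 2 < \text{max\_size}$$

Here k - i + j - l - 2 = (k - i - 1) + (j - l - 1) is the total number of unpaired bases in the 2-loop, so the condition excludes stacks (size 0) and limits bulges and interior loops to fewer than max_size bases.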
G(i,j) is the minimum free energy change of forming a stemloop closed by i.j. GH(i,j) is the free energy change of forming a hairpin loop closed by i.j, and GS(i,j) is the free energy change of forming a stack, i.e. stacking i.j on (i+1).(j-1). GBI(i,j) is the minimum free energy change over structures in which i.j closes a 2-loop (bulge or interior loop) with inner pair k.l, and gbi(i,j,k,l) is the free energy change of forming that 2-loop.
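The recursion can be sketched as a memoized function. This is a minimal sketch under stated assumptions: the energy terms `gh`, `gs` and `gbi` below are hypothetical toy functions, not the Loop Dependent Energy Rules parameters of the Mfold manual[2]; `MAX_SIZE = 30` is a hypothetical value for max_size; only hairpins, stacks, bulges and interior loops are considered; indices are 0-based; and T is read as U so that DNA input can be scored. `stemloop_mfe` (also a hypothetical name) evaluates G(0, n - 1), the stemloop closed by the first and last base of the input.

```python
import math
from functools import lru_cache

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MAX_SIZE = 30  # hypothetical max_size for bulge/interior loops


# Toy energy terms (hypothetical; NOT the mfold loop-dependent parameters).
def gh(seq, i, j):
    """Hairpin loop closed by i.j."""
    size = j - i - 1
    return math.inf if size < 3 else 4.0 + 0.1 * size


def gs(seq, i, j):
    """Stack of i.j on (i+1).(j-1)."""
    return -2.0 if {seq[i], seq[j]} == {"G", "C"} else -1.0


def gbi(seq, i, j, k, l):
    """Bulge or interior loop closed by i.j with inner pair k.l."""
    return 1.0 + 0.5 * ((k - i - 1) + (j - l - 1))


def stemloop_mfe(seq):
    """G(0, n-1) for the stemloop closed by the two end bases of seq."""
    seq = seq.upper().replace("T", "U")

    @lru_cache(maxsize=None)
    def G(i, j):
        # i.j must be able to pair and enclose at least a 3-base hairpin.
        if j - i - 1 < 3 or (seq[i], seq[j]) not in PAIRS:
            return math.inf
        best = gh(seq, i, j)                               # GH(i, j)
        best = min(best, gs(seq, i, j) + G(i + 1, j - 1))  # GS(i, j) + G(i+1, j-1)
        for k in range(i + 1, j):                          # GBI(i, j)
            for l in range(k + 1, j):
                size = (k - i - 1) + (j - l - 1)
                if 0 < size < MAX_SIZE:
                    best = min(best, gbi(seq, i, j, k, l) + G(k, l))
        return best

    return G(0, len(seq) - 1)


print(round(stemloop_mfe("GGGGAAAACCCC"), 2))  # -1.6
```

With these toy values the optimum for GGGGAAAACCCC is three stacks closing a 4-base hairpin; real scores require the actual energy tables.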
3. Calculation of T Tail Score
Here we consider the 15 nucleotides immediately downstream of the stemloop. The T tail score nT is calculated as follows:
In our program, if the T tail is shorter than 15 nucleotides (length n < 15), only those n nucleotides are considered; if it is longer than 15 nucleotides, only the first 15 are considered.
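The T tail formula itself is not reproduced in this section. The sketch below assumes the weighting commonly quoted for d'Aubenton Carafa's model[1]: x_0 = 1, x_n = 0.9 · x_(n-1) if the nth downstream nucleotide is T and x_n = 0.6 · x_(n-1) otherwise, with nT = x_1 + x_2 + ... summed over the considered nucleotides. These constants should be checked against [1]. Accepting lowercase letters and U (for RNA input) is a further assumption, and `t_tail_score` is a hypothetical name. The truncation to 15 nucleotides follows the rule stated above.

```python
# ASSUMPTION: the weights 0.9 (T) and 0.6 (non-T) follow the commonly quoted
# d'Aubenton Carafa T tail formula; they are not stated in this document.
T_WEIGHT = 0.9
NON_T_WEIGHT = 0.6
MAX_TAIL = 15


def t_tail_score(downstream):
    """nT for the nucleotides immediately downstream of the stemloop."""
    tail = downstream.upper()[:MAX_TAIL]  # tails shorter than 15 nt are used whole
    x, n_t = 1.0, 0.0                     # x_0 = 1
    for base in tail:
        x *= T_WEIGHT if base in ("T", "U") else NON_T_WEIGHT
        n_t += x
    return n_t


print(round(t_tail_score("TTTTT"), 5))  # 3.68559
```

Because each weight is below 1, T residues close to the stemloop contribute more to nT than T residues further downstream.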
4. Calculation of Score
Score = nT * 18.16 + deltaG / LH * 96.59 – 116.87
Here nT is the T tail score, deltaG is the minimum free energy change of stemloop formation, and LH is the length of the stemloop (in nucleotides).
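The final score is a direct transcription of the formula above. A minimal sketch with hypothetical argument names; nT and deltaG would come from the T tail and stemloop calculations, and the example inputs are made up for illustration.

```python
def terminator_score(n_t, delta_g, l_h):
    """Score = nT * 18.16 + deltaG / LH * 96.59 - 116.87."""
    if l_h <= 0:
        raise ValueError("LH (stemloop length) must be positive")
    return n_t * 18.16 + delta_g / l_h * 96.59 - 116.87


print(round(terminator_score(1.0, -10.0, 10), 2))  # -195.3
```

Dividing deltaG by LH normalizes the stemloop free energy by its length, so long stemloops are not favored simply for being long.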
[1] J. Mol. Biol. (1990) 216, 835-858 “Prediction of Rho-independent Escherichia coli Transcription Terminators: A Statistical Analysis of their RNA Stem-Loop Structures”
[2] Manual of Mfold Version 3.5
[3] http://unafold.math.rpi.edu/lectures/old_RNAfold/node2.html
[4] Nucl. Acids Res. (2001) 3583-3594, Lesnik et al. “Prediction of Rho-independent Escherichia coli”
[5] Biochemistry (1995) 34, 11211-11216 “Thermodynamic Parameters To Predict Stability of RNA-DNA Hybrid Duplexes”