# Team:SUSTC-Shenzhen-B/algorithm

### From 2012.igem.org

## Algorithm 1

In 2007, Carleton L Kingsford et al. [1] described TransTermHP, a new computational method to rapidly and accurately detect Rho-independent transcription terminators.

They put forward an algorithm to predict Rho-independent terminators.

The first 15 bases of the potential tail sequence are scored using a function:

(1)

where

(2)

for n=1...15 and =1.

The energy of potential hairpin configurations adjacent to a reference position can be found efficiently with a dynamic programming algorithm. The table entry hairpin_score[i,j] gives the cost of the best hairpin structure for which the base of the 5' stem is at nucleotide position i and the base of the 3' stem is at position j. The entry hairpin_score[i,j] can be computed recursively as follows:

(3)

The function energy(i,j) gives the cost of pairing the nucleotide at i with that at j, and loop_pen(n) gives the cost of a hairpin loop of length n. The hairpin's loop is forced to have a length between 3 and 13 nt, inclusive, by setting loop_pen(n) to a large constant for any n outside that range. The constant 'gap' gives the cost of not pairing a base with some base on the opposite stem and thus introducing a gap on one side of the hairpin stem.

Table 1

Parameters used to evaluate hairpins

Pairing Energy

G-C -2.3

A-T -0.9

G-T 1.3

Mismatch 3.5

Gap 6.0

Loop_pen(n) 1•(n - 2)

Parameters used to evaluate the energy of a potential hairpin where n is the length of the hairpin loop.

Being the main algorithm, the algorithm introduced above was used by us to write a program named hairpinscore. Users may use the program to predict various potential structures and efficiencies of a Rho-indepent terminator by simply inputting the sequence of the terminator.

The program wrote in C# was divided into several parts.

When a sequence was input, firstly, all the potential T tail sequences will be found.

Secondly, for each potential T tail sequence, all the potential stem-loop sequences will be found.

Thirdly, for all the potential stem-loop sequences of all the potential T tail sequences, the hairpin scores will be calculated and sorted in ascending order.

Forthly, for the first several probable cut positions, the structures will be determined. The structure is be expressed in an array of "0" and "1", where "0" means the corresponding base isn't paired with any other bases and "1" means the corresponding base is paired with another base.

Fifthly, the d score was calculated by the following method:

d score = tail score * 18.16 - hairpin score * 96.59 / LH - 116.87, where LH stands for the length of the stem-loop sequence. The method of calculating d score was adjusted from the method put forward by Yves d'Aubenton Carafa et al. [2].

Lastly, by analyzing the experiment data, we've got the relationship between termination efficiency and d score. Thus, once the d score is determined, the efficiency will be calculated easily.

In the future, the program may have more useful functions, for example, giving alternative sequences with efficiencies closed to a given number.

Reference

1. Carleton L Kingsford, Kunmi Ayanbule and Steven L Salzberg: Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biology 2007, 8:R22.

2. d'Aubenton Carafa Y, Brody E, Thermes C: Prediction of rho-independent Escherichia coli transcription terminators. A statistical analysis of their RNA stem-loop structures. J Mol Biol 1990, 216:835-858.