Team:USTC-Software/algoritms and methods

From 2012.igem.org


Database


This year our software rebuilds biological systems by suggesting suitable regulators and regulatees to users, so it needs a large database with comprehensive regulatory information. To make the computed results experimentally feasible, we use in vivo data from Regulon Data Bank, a website that provides genome-wide information on E. coli K-12. Our database therefore covers the whole E. coli K-12 genome. This is the first time in the iGEM competition that a team has used such a large database, and in particular such a large body of regulatory information, in its software.

The database consists of two parts: the first stores operon-to-operon and gene-to-gene regulatory information, and the second contains genome information on operons, genes, promoters, terminators, ribosome binding sites (RBS), 5' UTRs and 3' UTRs.

To highlight its unique features, we named the database RegulonLib.

1. Regulatory Information

The regulatory information is stored in two matrices that share some basic properties. Regulation is directed from the regulator in row i to the regulatee in column j, and the element at (i, j) encodes the type of regulatory relation. The following relations are distinguished:

1: positive regulation

-1: negative regulation

0: no regulation

2: both positive and negative regulation

-2: unknown regulation
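A minimal Python sketch of how the relation codes above could be read from such a matrix, assuming it is held in memory as a list of rows together with separate row (regulator) and column (regulatee) name lists. The names `Regulation`, `regulatees_of` and `regulators_of`, and the operons `opA`, `opB` and `opC`, are hypothetical illustrations rather than part of RegulonLib. Treating unknown regulation (-2) as an existing but uncharacterized link is also an assumption.

```python
from enum import IntEnum


class Regulation(IntEnum):
    """Relation codes stored in a regulatory matrix element (i, j)."""
    POSITIVE = 1
    NEGATIVE = -1
    NONE = 0
    DUAL = 2       # both positive and negative regulation
    UNKNOWN = -2   # assumed: a link exists but its sign is unknown


def regulatees_of(matrix, row_names, col_names, regulator):
    """Return (regulatee, relation) pairs for the regulator in row i."""
    i = row_names.index(regulator)
    return [(col_names[j], Regulation(v))
            for j, v in enumerate(matrix[i]) if v != Regulation.NONE]


def regulators_of(matrix, row_names, col_names, regulatee):
    """Return (regulator, relation) pairs for the regulatee in column j."""
    j = col_names.index(regulatee)
    return [(row_names[i], Regulation(row[j]))
            for i, row in enumerate(matrix) if row[j] != Regulation.NONE]


# Hypothetical 3 x 3 operon-operon matrix (rows regulate columns).
operons = ["opA", "opB", "opC"]
matrix = [
    [0, 1, -1],   # opA activates opB and represses opC
    [0, 0, 2],    # opB both activates and represses opC
    [-2, 0, 0],   # opC regulates opA in an unknown way
]

print(regulatees_of(matrix, operons, operons, "opA"))
print(regulators_of(matrix, operons, operons, "opC"))
```

Keeping row and column names separate lets the same helpers serve a square operon-operon matrix and a rectangular matrix whose rows and columns index different entities.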

The operon-operon matrix has dimensions 549 × 549, and the gene-promoter matrix has dimensions 773 × 180. A recursive method is used to find suitable regulators and regulatees.
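The recursive method is not described in detail here. One plausible reading, sketched below in Python as an assumption rather than RegulonLib's actual algorithm, is a depth-limited recursive walk up the operon-operon matrix that collects every direct and indirect regulator of a target, together with the fewest regulatory steps separating it from the target. The function `upstream_regulators`, its `max_depth` parameter and the example operons are hypothetical.

```python
def upstream_regulators(matrix, names, target, max_depth=3):
    """Recursively collect regulators that reach `target` within
    `max_depth` regulatory steps, mapped to the fewest steps found.

    matrix[i][j] != 0 means row i regulates column j (any relation code).
    """
    target_idx = names.index(target)
    depth_of = {}  # regulator index -> fewest steps to the target so far

    def visit(j, depth):
        if depth > max_depth:
            return
        for i, row in enumerate(matrix):
            if row[j] == 0 or i == target_idx:
                continue  # no regulation, or a feedback loop to the target
            if i not in depth_of or depth < depth_of[i]:
                depth_of[i] = depth
                visit(i, depth + 1)  # look one step further upstream

    visit(target_idx, 1)
    return {names[i]: d for i, d in depth_of.items()}


# Hypothetical operon-operon matrix (rows regulate columns).
operons = ["opA", "opB", "opC", "opD"]
matrix = [
    [0, 1, 0, 0],   # opA activates opB
    [0, 0, -1, 0],  # opB represses opC
    [1, 0, 0, 0],   # opC activates opA (feedback loop)
    [0, 0, 2, 0],   # opD both activates and represses opC
]

print(upstream_regulators(matrix, operons, "opC"))
```

Re-entering a regulator only when a shorter path is found keeps the recursion finite on cyclic networks, and skipping the target stops a feedback loop from reporting the target as its own regulator.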

2. Genome Information

Genome information on operons, genes, promoters, RBS, terminators, 5' UTRs and 3' UTRs is stored in more than 10,000 files in SBOL (Synthetic Biology Open Language) format. Their contents are summarized below:

2,652 operons, each with its name and components (genes, promoters, terminators).

4,554 genes, each with name, identifier, sequence, start position, end position, start codon and stop codon.

3,734 promoters, each with name, identifier, sigma factor and sequence.

207 terminators, each with identifier, sequence and PubMed ID.

179 RBS, each with identifier, start position, end position and sequence.

3,731 UTR entries (5' and 3'), each with the 5' UTR number, 3' UTR number, start and end positions of the 5' UTR, start and end positions of the 3' UTR, and the sequences of both UTRs.
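The per-gene fields listed above map naturally onto a simple record type. The Python sketch below is hypothetical: the class name `GeneRecord`, its field names, and the conventions that positions are 1-based and inclusive and that the sequence is given 5' to 3' on the coding strand are assumptions, not the layout of the SBOL files. The `is_consistent` method shows one way such records could be sanity-checked after loading.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    name: str
    identifier: str
    sequence: str      # assumed 5'->3' on the coding strand
    start: int         # assumed 1-based, inclusive genome coordinate
    end: int           # assumed 1-based, inclusive genome coordinate
    start_codon: str
    stop_codon: str

    def length(self) -> int:
        # abs() accepts either coordinate order (e.g. reverse-strand genes)
        return abs(self.end - self.start) + 1

    def is_consistent(self) -> bool:
        """Check that the sequence agrees with the coordinates and codons."""
        seq = self.sequence.upper()
        return (len(seq) == self.length()
                and seq.startswith(self.start_codon.upper())
                and seq.endswith(self.stop_codon.upper()))


# Hypothetical record, not taken from E. coli K-12 data.
gene = GeneRecord("geneX", "HYP0001", "ATGAAACGTTAA", 100, 111, "ATG", "TAA")
print(gene.length(), gene.is_consistent())  # 12 True
```

A check like this would flag records whose stored coordinates, sequence and codons disagree before they are used to assemble a rebuilt system.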

Figure: screenshot of an SBOL file.

Based on this comprehensive database, we also built a Clotho application of the same name, RegulonLib.