Team:CBNU-Korea/Project/GD/Analysis

From 2012.igem.org

(Difference between revisions)
Line 51: Line 51:
border: 1px dotted #DFDFDF;
border: 1px dotted #DFDFDF;
background-size: 100% 100%;
background-size: 100% 100%;
 +
}
 +
 +
#CB_sub_chart_3 {
 +
width: 55%;
 +
height: 300px;
 +
float: right;
 +
border: 1px dotted #DFDFDF;
 +
background-size: 100% 100%;
 +
margin: 0 auto;
}
}
</style>
</style>
Line 223: Line 232:
'https://static.igem.org/mediawiki/igem.org/1/1c/CBK_CP_000837_after.png',
'https://static.igem.org/mediawiki/igem.org/1/1c/CBK_CP_000837_after.png',
'https://static.igem.org/mediawiki/igem.org/8/85/CBK_AP_012053_after.png' ];
'https://static.igem.org/mediawiki/igem.org/8/85/CBK_AP_012053_after.png' ];
 +
 +
var pic_array = ['https://static.igem.org/mediawiki/igem.org/8/8a/CBK_NC_017040.png',
 +
                'https://static.igem.org/mediawiki/igem.org/9/9a/CBK_NC_006449.png',
 +
                'https://static.igem.org/mediawiki/igem.org/f/f2/CBK_NC_006448.png',
 +
                'https://static.igem.org/mediawiki/igem.org/e/e5/CBK_NC_011375.png',
 +
                'https://static.igem.org/mediawiki/igem.org/c/c7/CBK_NC_008021.png',
 +
                'https://static.igem.org/mediawiki/igem.org/f/f9/CBK_NC_007297.png',
 +
                'https://static.igem.org/mediawiki/igem.org/b/b0/CBK_NC_009332.png',
 +
                'https://static.igem.org/mediawiki/igem.org/b/b6/CBK_NC_002737.png',
 +
                'https://static.igem.org/mediawiki/igem.org/c/c4/CBK_NC_008023.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/de/CBK_NC_004606.png',
 +
                'https://static.igem.org/mediawiki/igem.org/c/ce/CBK_NC_003485.png',
 +
                'https://static.igem.org/mediawiki/igem.org/7/73/CBK_NC_007296.png',
 +
                'https://static.igem.org/mediawiki/igem.org/f/f0/CBK_NC_006086.png',
 +
                'https://static.igem.org/mediawiki/igem.org/6/6c/CBK_NC_004070.png',
 +
                'https://static.igem.org/mediawiki/igem.org/7/76/CBK_NC_008022.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/0a/CBK_NC_008024.png',
 +
                'https://static.igem.org/mediawiki/igem.org/a/a9/CBK_NC_016826.png',
 +
                'https://static.igem.org/mediawiki/igem.org/1/14/CBK_NC_012925.png',
 +
                'https://static.igem.org/mediawiki/igem.org/2/20/CBK_NC_013928.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/4e/CBK_NC_011134.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/0b/CBK_NC_004350.png',
 +
                'https://static.igem.org/mediawiki/igem.org/3/39/CBK_NC_003098.png',
 +
                'https://static.igem.org/mediawiki/igem.org/7/7d/CBK_NC_008533.png',
 +
                'https://static.igem.org/mediawiki/igem.org/a/ab/CBK_NC_011072.png',
 +
                'https://static.igem.org/mediawiki/igem.org/5/5c/CBK_NC_014251.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/d3/CBK_NC_012924.png',
 +
                'https://static.igem.org/mediawiki/igem.org/e/e0/CBK_NC_015600.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/46/CBK_NC_012891.png',
 +
                'https://static.igem.org/mediawiki/igem.org/a/ac/CBK_NC_012467.png',
 +
                'https://static.igem.org/mediawiki/igem.org/c/c1/CBK_NC_012469.png',
 +
                'https://static.igem.org/mediawiki/igem.org/9/9b/CBK_NC_012466.png',
 +
                'https://static.igem.org/mediawiki/igem.org/2/2a/CBK_NC_007432.png',
 +
                'https://static.igem.org/mediawiki/igem.org/7/7a/CBK_NC_016749.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/48/CBK_NC_014494.png',
 +
                'https://static.igem.org/mediawiki/igem.org/b/bf/CBK_NC_015558.png',
 +
                'https://static.igem.org/mediawiki/igem.org/a/aa/CBK_NC_013853.png',
 +
                'https://static.igem.org/mediawiki/igem.org/6/63/CBK_NC_004116.png',
 +
                'https://static.igem.org/mediawiki/igem.org/2/29/CBK_NC_003028.png',
 +
                'https://static.igem.org/mediawiki/igem.org/5/58/CBK_NC_012468.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/d8/CBK_NC_009785.png',
 +
                'https://static.igem.org/mediawiki/igem.org/6/6f/CBK_NC_010582.png',
 +
                'https://static.igem.org/mediawiki/igem.org/9/9b/CBK_NC_004368.png',
 +
                'https://static.igem.org/mediawiki/igem.org/8/83/CBK_NC_015760.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/00/CBK_NC_011900.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/08/CBK_NC_014498.png',
 +
                'https://static.igem.org/mediawiki/igem.org/7/74/CBK_NC_010380.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/40/CBK_NC_012471.png',
 +
                'https://static.igem.org/mediawiki/igem.org/2/27/CBK_NC_013798.png',
 +
                'https://static.igem.org/mediawiki/igem.org/a/a2/CBK_NC_009009.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/40/CBK_NC_015875.png',
 +
                'https://static.igem.org/mediawiki/igem.org/6/67/CBK_NC_015433.png',
 +
                'https://static.igem.org/mediawiki/igem.org/1/18/CBK_NC_015291.png',
 +
                'https://static.igem.org/mediawiki/igem.org/b/b4/CBK_NC_015215.png',
 +
                'https://static.igem.org/mediawiki/igem.org/8/81/CBK_NC_012926.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/04/CBK_NC_012470.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/0f/CBK_NC_012004.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/4c/CBK_NC_009443.png',
 +
                'https://static.igem.org/mediawiki/igem.org/b/b5/CBK_NC_009442.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/d4/CBK_NC_008532.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/49/CBK_FR_873482.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/04/CBK_FQ_312045.png',
 +
                'https://static.igem.org/mediawiki/igem.org/a/ac/CBK_FQ_312043.png',
 +
                'https://static.igem.org/mediawiki/igem.org/6/60/CBK_FQ_312042.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/d2/CBK_FQ_312029.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/09/CBK_FQ_312030.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/d2/CBK_FQ_312029.png',
 +
                'https://static.igem.org/mediawiki/igem.org/0/05/CBK_FQ_312027.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/de/CBK_CP_003357.png',
 +
                'https://static.igem.org/mediawiki/igem.org/5/59/CBK_CP_003121.png',
 +
                'https://static.igem.org/mediawiki/igem.org/5/5a/CBK_CP_003068.png',
 +
                'https://static.igem.org/mediawiki/igem.org/e/e6/CBK_CP_002904.png',
 +
                'https://static.igem.org/mediawiki/igem.org/6/68/CBK_CP_002888.png',
 +
                'https://static.igem.org/mediawiki/igem.org/9/92/CBK_CP_002651.png',
 +
                'https://static.igem.org/mediawiki/igem.org/d/dd/CBK_CP_002644.png',
 +
                'https://static.igem.org/mediawiki/igem.org/1/1d/CBK_CP_002641.png',
 +
                'https://static.igem.org/mediawiki/igem.org/8/84/CBK_CP_002640.png',
 +
                'https://static.igem.org/mediawiki/igem.org/7/72/CBK_CP_002570.png',
 +
                'https://static.igem.org/mediawiki/igem.org/3/31/CBK_CP_002465.png',
 +
                'https://static.igem.org/mediawiki/igem.org/3/35/CBK_CP_002340.png',
 +
                'https://static.igem.org/mediawiki/igem.org/4/45/CBK_CP_002215.png',
 +
                'https://static.igem.org/mediawiki/igem.org/9/94/CBK_CP_000837.png',
 +
                'https://static.igem.org/mediawiki/igem.org/1/19/CBK_AP_012053.png'];
$(function() {
$(function() {
Line 234: Line 326:
$('#CB_sub_chart_2').css("background-image",
$('#CB_sub_chart_2').css("background-image",
"url(" + imgSrc2 + ")");
"url(" + imgSrc2 + ")");
 +
}).change();
 +
 +
$('#CB_select2').change(
 +
function(event) {
 +
var i = document.getElementById("CB_select2").selectedIndex;
 +
var imgSrc = pic_array[i];
 +
$('#CB_sub_chart_3').css("background-image",
 +
"url(" + imgSrc + ")");
}).change();
}).change();
});
});
Line 505: Line 605:
genes were distributed in 4 places with different tendency. So we
genes were distributed in 4 places with different tendency. So we
decided the section of the proc transreg as 4, and analyzed.</p>
decided the section of the proc transreg as 4, and analyzed.</p>
-
<img src="">
+
 +
 +
 +
<div id="select_img_box">
 +
<select id="CB_select2">
 +
<option>Streptococcus pyogenes MGAS15252</option>
 +
<option>Streptococcus thermophilus CNRZ1066</option>
 +
<option>Streptococcus thermophilus LMG 18311</option>
 +
<option>Streptococcus pyogenes NZ131</option>
 +
<option>Streptococcus pyogenes MGAS9429</option>
 +
<option>Streptococcus pyogenes MGAS5005</option>
 +
<option>Streptococcus pyogenes str. Manfredo</option>
 +
<option>Streptococcus pyogenes M1 GAS</option>
 +
<option>Streptococcus pyogenes MGAS2096</option>
 +
<option>Streptococcus pyogenes SSI-1</option>
 +
<option>Streptococcus pyogenes MGAS8232</option>
 +
<option>Streptococcus pyogenes MGAS6180</option>
 +
<option>Streptococcus pyogenes MGAS10394</option>
 +
<option>Streptococcus pyogenes MGAS315</option>
 +
<option>Streptococcus pyogenes MGAS10270</option>
 +
<option>Streptococcus pyogenes MGAS10750</option>
 +
<option>Streptococcus infantarius subsp. infantarius
 +
CJ18</option>
 +
<option>Streptococcus suis P1/7</option>
 +
<option>Streptococcus mutans NN2025</option>
 +
<option>Streptococcus equi subsp. zooepidemicus
 +
MGCS10565</option>
 +
<option>Streptococcus mutans UA159</option>
 +
<option>Streptococcus pneumoniae R6</option>
 +
<option>Streptococcus pneumoniae D39</option>
 +
<option>Streptococcus pneumoniae G54</option>
 +
<option>Streptococcus pneumoniae TCH8431/19A</option>
 +
<option>Streptococcus suis SC84</option>
 +
<option>Streptococcus pasteurianus ATCC 43144</option>
 +
<option>Streptococcus dysgalactiae subsp. equisimilis
 +
GGS_124</option>
 +
<option>Streptococcus pneumoniae P1031</option>
 +
<option>Streptococcus pneumoniae Taiwan19F-14</option>
 +
<option>Streptococcus pneumoniae JJA</option>
 +
<option>Streptococcus agalactiae A909</option>
 +
<option>Streptococcus macedonicus ACA-DC 198</option>
 +
<option>Streptococcus pneumoniae AP200</option>
 +
<option>Streptococcus parauberis KCTC 11537</option>
 +
<option>Streptococcus mitis B6</option>
 +
<option>Streptococcus agalactiae 2603V/R</option>
 +
<option>Streptococcus pneumoniae TIGR4</option>
 +
<option>Streptococcus pneumoniae 70585</option>
 +
<option>Streptococcus gordonii str. Challis substr. CH1</option>
 +
<option>Streptococcus pneumoniae CGSP14</option>
 +
<option>Streptococcus agalactiae NEM316</option>
 +
<option>Streptococcus salivarius CCHSS3</option>
 +
<option>Streptococcus pneumoniae ATCC 700669</option>
 +
<option>Streptococcus pneumoniae 670-6B</option>
 +
<option>Streptococcus pneumoniae Hungary19A-6</option>
 +
<option>Streptococcus equi subsp. equi 4047</option>
 +
<option>Streptococcus gallolyticus UCN34</option>
 +
<option>Streptococcus sanguinis SK36</option>
 +
<option>Streptococcus pseudopneumoniae IS7493</option>
 +
<option>Streptococcus suis ST3</option>
 +
<option>Streptococcus oralis Uo5</option>
 +
<option>Streptococcus gallolyticus subsp. gallolyticus
 +
ATCC BAA-2069</option>
 +
<option>Streptococcus suis BM407</option>
 +
<option>Streptococcus equi subsp. zooepidemicus</option>
 +
<option>Streptococcus uberis 0140J</option>
 +
<option>Streptococcus suis 98HAH33</option>
 +
<option>Streptococcus suis 05ZYH33</option>
 +
<option>Streptococcus thermophilus LMD-9</option>
 +
<option>Streptococcus salivarius JIM8777</option>
 +
<option>Streptococcus pneumoniae SPN034156 draft genome</option>
 +
<option>Streptococcus pneumoniae SPN034183 draft genome</option>
 +
<option>Streptococcus pneumoniae SPN033038 draft genome</option>
 +
<option>Streptococcus pneumoniae SPN032672 draft genome</option>
 +
<option>Streptococcus pneumoniae INV104 genome</option>
 +
<option>Streptococcus pneumoniae INV200 genome</option>
 +
<option>Streptococcus pneumoniae OXC141 genome</option>
 +
<option>Streptococcus pneumoniae ST556</option>
 +
<option>Streptococcus pyogenes MGAS1882</option>
 +
<option>Streptococcus pyogenes Alab49</option>
 +
<option>Streptococcus equi subsp. zooepidemicus ATCC
 +
35246</option>
 +
<option>Streptococcus salivarius 57.I</option>
 +
<option>Streptococcus suis ST1</option>
 +
<option>Streptococcus suis D12</option>
 +
<option>Streptococcus suis D9</option>
 +
<option>Streptococcus suis SS12</option>
 +
<option>Streptococcus suis A7</option>
 +
<option>Streptococcus suis JS14</option>
 +
<option>Streptococcus thermophilus ND03</option>
 +
<option>Streptococcus dysgalactiae subsp. equisimilis
 +
ATCC 12394</option>
 +
<option>Streptococcus suis GZ1</option>
 +
<option>Streptococcus gallolyticus subsp. gallolyticus
 +
ATCC 43143 DNA</option>
 +
</select>
 +
 
 +
<div id="CB_sub_chart_3"></div>
 +
</div>
 +
 +
<p>
<p>
Line 516: Line 715:
significance level of 0.01, the null hypothesis is dismissible. In
significance level of 0.01, the null hypothesis is dismissible. In
other words, the regression model is more suitable.</p>
other words, the regression model is more suitable.</p>
-
<img src="">
+
<img src="https://static.igem.org/mediawiki/igem.org/5/5f/CBK_C_003.png">
<p>
<p>
Line 522: Line 721:
</p>
</p>
-
<img src="">
+
<img src="https://static.igem.org/mediawiki/igem.org/0/01/CBK_C_004.png">
<p>As the notable probability gets smaller, it can affect the
<p>As the notable probability gets smaller, it can affect the
Line 549: Line 748:
-
<img src="">
+
<img src="https://static.igem.org/mediawiki/igem.org/d/d2/CBK_C_005.png">
<p>The X axis shows the 100 sections of the genome of the
<p>The X axis shows the 100 sections of the genome of the
Line 563: Line 762:
</p>
</p>
-
<img src="">
+
<img src="https://static.igem.org/mediawiki/igem.org/d/df/CBK_C_006.png">
<p>By conducting the chi-square test with the estimated
<p>By conducting the chi-square test with the estimated
transpose linear regression prediction equation with the 77
transpose linear regression prediction equation with the 77
Line 584: Line 783:
</p>
</p>
-
<img src="">
+
<img src="https://static.igem.org/mediawiki/igem.org/3/33/CBK_C_007.png">
 +
<img src="https://static.igem.org/mediawiki/igem.org/b/ba/CBK_C_008.png">
<p>As a result to guess the distribution of the essential gene,
<p>As a result to guess the distribution of the essential gene,
Line 598: Line 798:
</p>
</p>
-
<img src="">
+
<img src="https://static.igem.org/mediawiki/igem.org/1/15/CBK_C_009.png">
-
 
+
<img src="https://static.igem.org/mediawiki/igem.org/5/57/CBK_C_010.png">
 +
<img src="https://static.igem.org/mediawiki/igem.org/7/74/CBK_C_011.png">
 +
<img src="https://static.igem.org/mediawiki/igem.org/b/b2/CBK_C_012.png">
</div>
</div>

Revision as of 00:36, 27 September 2012

Minimal Genome Designer

- Analysis

1. Introduction

1-1. Suggestion

Since the Genome project started in 2002, genetic information for many species has become easy to obtain. As techniques have developed, it has also become possible to insert and synthesize genomes. If we could design a whole genome, we would be able to build a unique and useful one. To date, however, a minimal genome composed of essential genes has been synthesized successfully, but it did not last.

1-2. Object

To design a genome, we have to analyze the pattern of the genome and the distribution of its genes.

1-3. Method and scope of the study

The study used information on Streptococcus species from the PATRIC database (http://www.patricbrc.org/portal/portal/patric/Home) and SynbUID.
The database was built with MySQL 5.5.27, and the statistical analysis was performed with SAS 9.3.

2. Design

2-1. Preparation

1) Build database

The Genome name entity consists of the attributes ID, Genome_name, COG, Start, End, Strand, and Size.
The Annotation Table_EG entity consists of the attributes ID, locus, and SynbUID. The two entities are paired one-to-one by Locus_tag.

2) Representative sample size

To check how many specimens are needed for a representative sample, we used simple random sampling and assumed that the complete genomes are a random sample. We used the significance level (a=0.05) and the limit of error (b=0.1). Streptococcus has 494 species in total, of which 82 are complete. According to our calculation, a sample of 81 species is sufficient. Therefore, the 82 complete species represent Streptococcus.
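The sample-size calculation can be sketched as follows. The text does not say which formula was used, so this is only an illustration that assumes Cochran's formula with a finite population correction, a z-value of 1.96 for the 0.05 significance level, and the most conservative proportion p = 0.5. The function name `requiredSampleSize` is hypothetical.

```javascript
// Sketch only: Cochran's sample-size formula with finite population
// correction. The z-value (1.96) and p = 0.5 are assumptions; the source
// gives only the significance level (0.05) and the limit of error (0.1).
function requiredSampleSize(populationSize, marginOfError, z = 1.96, p = 0.5) {
  // Sample size for an infinite population.
  const n0 = (z * z * p * (1 - p)) / (marginOfError * marginOfError);
  // Correct for the finite population and round up to a whole specimen.
  return Math.ceil(n0 / (1 + (n0 - 1) / populationSize));
}

const needed = requiredSampleSize(494, 0.1);
console.log(needed); // 81
```

Under these assumptions the result agrees with the 81 species reported above, so the 82 complete genomes exceed the required size.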

3) Standard

3-1) Divided the interval of the genome

The number and size of the genes in a genome differ between species. To compensate for this, we divided each genome into sections so that positions are expressed as a proportion of the genome's size. When we used fewer than a hundred sections, the patterns were hard to see because the data were diluted. When we used more than a hundred sections, the result was not much different from the result with a hundred. So we decided to divide each genome into a hundred sections.
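A minimal sketch of this proportional binning is shown below. It assumes 1-based genome coordinates and that a gene is assigned to a section by its Start position. Neither detail is stated in the text, and the function name `sectionIndex` is hypothetical.

```javascript
// Sketch only: map a genome position to one of `sections` equal-width bins,
// so that genomes of different sizes share the same 0..99 scale.
// Assumes 1-based coordinates (position 1 is the first base).
function sectionIndex(position, genomeSize, sections = 100) {
  const fraction = (position - 1) / genomeSize; // in [0, 1)
  // Clamp so that a position at the very end never overflows the last bin.
  return Math.min(sections - 1, Math.floor(fraction * sections));
}

// The midpoint of a 2 Mb genome and of a 1 Mb genome fall in the same section.
console.log(sectionIndex(1000001, 2000000)); // 50
console.log(sectionIndex(500001, 1000000));  // 50
```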

3-2) Identified the starting point

The first ORF in the sequence analysis data differs from species to species, so we needed a common standard to align the starting point of the data. We checked the strand pattern of each genome and identified the starting point from the strands.

2-2. Analysis

1) Strand

1-1) Method

- We randomly chose 77 of the 82 species, estimated the pattern of the strand ratio in each section, and verified the estimate with the other 5 species.

- We checked the strand ratio of the essential gene.
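The random 77/5 split described in the first item can be sketched as follows. This is an illustration using a Fisher-Yates shuffle. The text does not describe how the selection was actually made, and the names `splitRandomly` and `species_N` are hypothetical placeholders.

```javascript
// Sketch only: randomly hold out `holdOut` items for verification and keep
// the rest for estimation, using a Fisher-Yates shuffle on a copy.
function splitRandomly(items, holdOut, random = Math.random) {
  const shuffled = items.slice();
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  return {
    verification: shuffled.slice(0, holdOut),
    estimation: shuffled.slice(holdOut),
  };
}

// Placeholder labels standing in for the 82 complete genomes.
const species = Array.from({ length: 82 }, (_, i) => `species_${i + 1}`);
const { estimation, verification } = splitRandomly(species, 5);
console.log(estimation.length, verification.length); // 77 5
```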

1-2) Region

- We checked where the genome is distributed.

3. Result

3-1. Strand

When we checked the strand pattern of the 82 species, the genes were distributed across 4 regions with different tendencies. We therefore set the number of sections in PROC TRANSREG to 4 and ran the analysis.

1) Estimated the transpose linear regression

We set up the hypotheses as follows: the null hypothesis is that the regression model does not fit, and the alternative hypothesis is that it does. In SAS, the test of the null hypothesis gave an F-value of 3093.13 and a p-value < .0001. Therefore, at a significance level of 0.01, the null hypothesis is rejected. In other words, the regression model is suitable.

2) Estimated factor β0, β1

The smaller the significance probability of a factor, the stronger the evidence that it affects the dependent variable. For β0, the F-value is 10.13 and Pr > F is 0.0015, so the null hypothesis is rejected, and the estimate of β0 is 1.22744031. For β1, the F-value is 3093.13 and Pr > F is < .0001, so the null hypothesis is again rejected, and the estimate of β1 is 0.97546498.

3) Estimated the transpose regression model

Looking at the strand distribution of each species, we found a similar pattern. We therefore studied the strand distribution after aligning the genomes on the section where the strand sign changes. Using the transpose regression method, we were able to express the strand distribution pattern of the 82 species as a spline regression model.

Identity(spercent) = 1.22744031 + 0.97546498*spline(interval)
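As a small illustration, the prediction equation can be evaluated as below. The coefficients are the estimates reported in this section. The spline transformation of the interval is not given in the text, so the sketch takes the already-transformed value as input, and the function name `predictPlusPercent` is hypothetical.

```javascript
// Sketch only: evaluate the reported prediction equation
//   spercent = beta0 + beta1 * spline(interval)
// The spline-transformed interval must be supplied by the caller, because
// the transformation itself is not specified in the text.
const BETA0 = 1.22744031;
const BETA1 = 0.97546498;

function predictPlusPercent(splineValue) {
  return BETA0 + BETA1 * splineValue;
}

console.log(predictPlusPercent(50).toFixed(2));
```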

The X axis shows the 100 sections of the genome for the 77 randomly selected species, and the Y axis shows the ratio of + patterns in each section. The + and - patterns of each section sum to 100. According to the graph above, with 50 as the reference value, the + pattern appears at 25 on the left, and the - pattern on the right is higher than 80. So in this case, it is a + pattern.
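The Y-axis quantity, the percentage of + strand genes per section, could be computed along these lines. This is a sketch under assumed inputs: each gene is given as a hypothetical `{ section, strand }` record with `strand` equal to "+" or "-", and empty sections are reported as `null`.

```javascript
// Sketch only: percentage of '+' strand genes in each section.
// Input records of the form { section, strand } are an assumed layout.
function plusPercentBySection(genes, sections = 100) {
  const plus = new Array(sections).fill(0);
  const total = new Array(sections).fill(0);
  for (const gene of genes) {
    total[gene.section] += 1;
    if (gene.strand === "+") plus[gene.section] += 1;
  }
  // Sections without genes have no defined ratio.
  return total.map((n, i) => (n === 0 ? null : (100 * plus[i]) / n));
}

const demo = [
  { section: 0, strand: "+" },
  { section: 0, strand: "+" },
  { section: 0, strand: "-" },
  { section: 0, strand: "-" },
  { section: 1, strand: "-" },
];
console.log(plusPercentBySection(demo, 3)); // [ 50, 0, null ]
```

Because every gene is either + or -, the - percentage of a non-empty section is simply 100 minus the + percentage.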

4) Verifying that the estimated prediction equation is adequate

We verified the adequacy of the prediction equation, estimated by transpose linear regression from the 77 randomly selected species, with a chi-square test. The null hypothesis is that the prediction equation and the other 5 species, which were not selected, are independent. The alternative hypothesis is that the 5 species are dependent on the prediction equation. If the p-value for the 5 species does not lead to rejection of the null hypothesis, they are independent of the prediction equation; if the null hypothesis is rejected, they are dependent on it.
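For reference, Pearson's chi-square statistic for a test of independence can be sketched as below. The text does not say how the contingency table was built from the prediction equation and the 5 held-out species, so the table in the example is purely illustrative and the function name `chiSquareIndependence` is hypothetical. Converting the statistic to a p-value requires the chi-square distribution, which this sketch leaves to a statistics package.

```javascript
// Sketch only: Pearson chi-square statistic and degrees of freedom for a
// contingency table given as an array of rows of observed counts.
function chiSquareIndependence(table) {
  const rowTotals = table.map((row) => row.reduce((a, b) => a + b, 0));
  const colTotals = table[0].map((_, j) =>
    table.reduce((sum, row) => sum + row[j], 0)
  );
  const grandTotal = rowTotals.reduce((a, b) => a + b, 0);

  let statistic = 0;
  table.forEach((row, i) => {
    row.forEach((observed, j) => {
      // Expected count under independence of rows and columns.
      const expected = (rowTotals[i] * colTotals[j]) / grandTotal;
      statistic += (observed - expected) ** 2 / expected;
    });
  });

  const df = (table.length - 1) * (table[0].length - 1);
  return { statistic, df };
}

// Illustrative counts only, not data from the study.
const result = chiSquareIndependence([
  [10, 20],
  [20, 10],
]);
console.log(result.statistic.toFixed(4), result.df); // 6.6667 1
```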

3-2. Region

We estimated the origin as the point where the strand pattern changes.

1) The spread of the essential genes is shown below.

Estimating the distribution of the essential genes produced the graph shown. We found that 322 of the 485 essential genes are distributed with bilateral symmetry around the origin. We can distinguish the Synb_ID entries that occur with high frequency at the origin from the Synb_ID entries that occur with high frequency on both sides of it.

2) The spread of the genes with an assigned COG is shown below.