Team:CBNU-Korea/Project/GD/Analysis
From 2012.igem.org
Revision as of 19:36, 26 September 2012
Minimal Genome Designer
- Analysis
1. Introduction
1-1. Suggestion
Since the completion of the Human Genome Project in the early 2000s, genetic information for many species has become easy to obtain. As laboratory techniques have advanced, it has also become possible to insert and synthesize genomes. If we could design a whole genome, we could build a genome tailored to a specific useful purpose. To date, however, a minimal genome composed of essential genes has been synthesized successfully, but the result did not last.
1-2. Objective
To design a genome, we first have to analyze the patterns in the genome and the distribution of its genes.
1-3. Method and scope of the study
This study used data on Streptococcus species from the PATRIC database (http://www.patricbrc.org/portal/portal/patric/Home) and SynbUID.
The database was built with MySQL 5.5.27, and the statistical analysis was performed with SAS 9.3.
2. Design
2-1. Preparation
1) Building the database
The Genome name entity has the attributes ID, Genome_name, COG, Start, End, Strand, and Size.
The Annotation Table_EG entity has the attributes ID, locus, and SynbUID.
The two entities are paired one-to-one by Locus_tag.
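The two-entity layout described above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for the MySQL 5.5.27 database, the table names `genome` and `annotation_eg` are hypothetical, the `locus_tag` column on the genome side is assumed (the attribute list above does not name the join column), and the inserted rows are made-up placeholders, not real data.

```python
import sqlite3

# In-memory SQLite is a stand-in for the MySQL database named in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genome (            -- hypothetical name for the "Genome name" entity
    id          INTEGER PRIMARY KEY,
    genome_name TEXT,
    locus_tag   TEXT UNIQUE,     -- assumed join key, not in the listed attributes
    cog         TEXT,
    start       INTEGER,
    "end"       INTEGER,         -- quoted because END is an SQL keyword
    strand      TEXT,
    size        INTEGER
);
CREATE TABLE annotation_eg (     -- hypothetical name for "Annotation Table_EG"
    id      INTEGER PRIMARY KEY,
    locus   TEXT UNIQUE,         -- UNIQUE on both sides gives a one-to-one pairing
    synbuid TEXT
);
""")

# Made-up placeholder rows, for illustration only.
conn.execute(
    "INSERT INTO genome VALUES (1, 'Example genome', 'EX_0001', 'J', 100, 400, '+', 300)"
)
conn.execute("INSERT INTO annotation_eg VALUES (1, 'EX_0001', 'SYN_0001')")

# Pair each gene record with its annotation through the locus tag.
rows = conn.execute(
    """
    SELECT g.genome_name, g.locus_tag, a.synbuid
    FROM genome AS g
    JOIN annotation_eg AS a ON a.locus = g.locus_tag
    """
).fetchall()
print(rows)
```

Declaring the key column UNIQUE in both tables is one simple way to enforce the one-to-one pairing. The original schema may have done this differently.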
2) Representative sample size
To check how many specimens are needed for a representative sample, we used simple random sampling and assumed that the completed genomes are a random sample of the genus. We used a significance level (α = 0.05) and a limit of error (b = 0.1). Of the 494 Streptococcus species in total, 82 have complete genomes. According to our calculation, a sample of 81 species is sufficient. The 82 complete species can therefore represent Streptococcus.
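The page does not state which formula was used for this calculation. As a hedged illustration, the standard sample-size formula for estimating a proportion under simple random sampling, with a finite population correction, gives the same figure. The worst-case proportion p = 0.5 and the function name `required_sample_size` are assumptions made for this sketch.

```python
import math
from statistics import NormalDist


def required_sample_size(population, alpha=0.05, margin=0.1, p=0.5):
    """Sample size for estimating a proportion under simple random sampling.

    Uses Cochran's formula with a finite population correction.
    p = 0.5 is an assumed worst-case proportion, not a value from the text.
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / margin ** 2
    # Correct for the finite population of size `population`.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


n_required = required_sample_size(494)  # 494 species in total
print(n_required)
```

With these inputs the uncorrected size is about 96 and the corrected size is about 80.5, which rounds up to 81. The 82 available complete genomes therefore meet the requirement.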
3) Standardization
3-1) Dividing the genome into intervals
The number of genes and the genome size differ between species. To compensate for this, we divided each genome into sections so that gene positions are expressed as a proportion of the genome's size. When we divided the genome into fewer than a hundred sections, the patterns were hard to see because the data were diluted. When we divided it into more than a hundred sections, the result was not noticeably different from the result with a hundred sections. We therefore decided to divide each genome into a hundred sections.
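A minimal sketch of this proportional binning is shown below. It assumes that each gene is placed by its 1-based start coordinate and that the genome length is known. The function names and the example coordinates are hypothetical and are not taken from the study data.

```python
def bin_index(position, genome_length, n_bins=100):
    """Map a 1-based genome coordinate to a section index in [0, n_bins - 1]."""
    idx = (position - 1) * n_bins // genome_length
    # Guard against coordinates at or beyond the last base.
    return min(idx, n_bins - 1)


def gene_counts_per_bin(gene_starts, genome_length, n_bins=100):
    """Count genes per section, using each gene's start coordinate (an assumption)."""
    counts = [0] * n_bins
    for start in gene_starts:
        counts[bin_index(start, genome_length, n_bins)] += 1
    return counts


# Made-up coordinates on a hypothetical 2,000,000 bp genome.
counts = gene_counts_per_bin([1, 20_000, 20_001, 1_999_999, 2_000_000], 2_000_000)
print(counts[0], counts[1], counts[99])
```

Because the section index depends only on the position relative to the genome length, genomes of different sizes map onto the same hundred sections and can be compared directly.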
3-2) Identifying the starting point
The first ORF in the gene sequence analysis data differs from species to species. We therefore needed a common standard to align the beginning of the data. We examined the strand pattern of each genome and used the strands to identify the starting point.