Team:CBNU-Korea/Project/GD/Method
From 2012.igem.org
Revision as of 18:49, 26 September 2012
Minimal Genome Designer
- Method
1. Method
1-1. Selection of DB and Reasons
To provide a broad range of genome information, we combined databases from several sources. Our starting point was the complete-genome data provided by NCBI, which is useful because it supplies not only genome sequences but also COG information on gene function. Because we want results that differ as little as possible from experimental results, we also used the essential-gene data compiled by DEG. In addition, to overcome the limitations of COG's functional classification, we used Gene Ontology information to describe gene products. Because most of the selected data are linked to PATRIC, we used PATRIC data as the base.
1-2. Composition of DB
We built our database from two main sites: NMPDR (http://www.patricbrc.org/portal/portal/patric/Home) and DEG (http://tubic.tju.edu.cn/deg/). NMPDR in turn links the NCBI and GO (http://www.geneontology.org/) databases to its own data, so in effect our database is built from four different databases. NMPDR and NCBI provide the genome and gene sequence information, DEG provides information on essential genes determined by experiment, and GO provides information on gene function.
1-3. Selection of the Subject of Analysis and the Reasons
1) Selection of the Subject of Analysis and the Reason
Even with a computer, finding the genes that every species has in common takes a long time. We also found that establishing our own analysis method and standard, one that reproduces experimental results, requires a specific set of species to work with. We therefore narrowed the analysis to a selected subject.
2) The Subject of Analysis
The subject of analysis must be one for which the accuracy of our analysis can be checked. This requires at least two species in the same genus whose essential genes have been determined in vitro. Among the data provided by DEG, which identifies essential genes experimentally, the genera that meet this requirement are Escherichia, Mycoplasma, Salmonella, Staphylococcus, and Streptococcus. We chose Streptococcus, considering the number of specimens and the analysis time.
1-4. Selecting Methods of Analysis and Basis
1) The Method of Analysis
We search for essential genes by following the steps in the flow chart below.
2) The First Analysis(Determination of BLAST standard) and Reliability
As emphasized earlier, we want our results to differ as little as possible from in vitro results, and the first analysis is designed to check this. We used Streptococcus pneumoniae TIGR4 and Streptococcus sanguinis SK36, because both belong to the genus Streptococcus and their essential genes have been identified by in vitro experiments. We verified the reliability of the results and the accuracy of the analysis method by running BLAST between the two data sets under our BLAST standards.
3) BLAST Analysis Results
We label genes that DEG reports as essential (+) and genes that it does not (-). Likewise, we label genes that our analysis identifies as essential (+) and the rest (-), and arrange the counts in a 2x2 cross table. Even when the same data are analyzed with BLAST, the result can differ depending on which data set is used as the query, so we swap the roles of the two data sets and repeat the analysis.
CASE 1) DB : Streptococcus pneumoniae TIGR4 / Query : Streptococcus sanguinis SK36
CASE 2) DB : Streptococcus sanguinis SK36 / Query : Streptococcus pneumoniae TIGR4
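The 2x2 cross table described above can be sketched as follows. This is only an illustration: the function name `cross_table` and the toy gene names are hypothetical, and the counts are not the team's results.

```python
from collections import Counter

def cross_table(deg_essential, predicted_essential, all_genes):
    """Count genes in each cell of the 2x2 table.

    Keys are (DEG label, analysis label), each '+' or '-'.
    """
    counts = Counter()
    for gene in all_genes:
        deg_label = "+" if gene in deg_essential else "-"
        our_label = "+" if gene in predicted_essential else "-"
        counts[(deg_label, our_label)] += 1
    # Return every cell, including empty ones, so the table is always complete.
    return {(d, o): counts[(d, o)] for d in "+-" for o in "+-"}

# Hypothetical toy data: five genes, two essential in DEG, two predicted by us.
genes = ["g1", "g2", "g3", "g4", "g5"]
deg = {"g1", "g2"}
ours = {"g1", "g3"}
table = cross_table(deg, ours, genes)
```

Swapping the DB and query roles, as in CASE 1 and CASE 2, would mean building one such table per direction.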
4) Verification of the credibility of BLAST Analysis
4-1) Verification of the BLAST Analysis Standard
To verify the reliability of our BLAST standards, we use Sensitivity, Specificity, and Accuracy. 'Sensitivity' is the probability that a gene listed as essential in DEG is also identified as essential by our analysis. 'Specificity' is the probability that a gene not listed as essential in DEG is also identified as non-essential by our analysis. 'Accuracy' is the proportion of all genes in the specimen for which the two results agree.
If the Accuracy exceeds 80%, we regard the results produced under the BLAST Analysis Standard as reliable, and therefore the results we infer from them as reliable as well.
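A minimal sketch of the three measures, computed from the cells of the 2x2 table with the DEG labels treated as the reference. The names `tp`, `fn`, `fp`, `tn` and the counts are illustrative assumptions, not the team's data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and accuracy from 2x2 counts.

    tp: DEG (+), ours (+)    fn: DEG (+), ours (-)
    fp: DEG (-), ours (+)    tn: DEG (-), ours (-)
    Assumes both DEG rows are non-empty (no division by zero).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for illustration only.
sens, spec, acc = diagnostic_metrics(tp=40, fn=10, fp=20, tn=130)
reliable = acc > 0.80  # the 80% threshold stated in the text
```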
We then conduct a Likelihood test to check that Accuracy is a valid measure of reliability.
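The text does not say which likelihood-based test was used. One common companion to sensitivity and specificity is the pair of positive and negative likelihood ratios, sketched below purely as an assumption about what such a check might compute; the function name and input values are hypothetical.

```python
import math

def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    Returns math.inf where the denominator would be zero.
    """
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return lr_pos, lr_neg

# Hypothetical values for illustration only.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.8, specificity=0.9)
```

An LR+ well above 1 together with an LR- well below 1 would indicate that a (+) or (-) call from the analysis carries real information about the DEG label.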
4-2) Verification of the reliability of BLAST Analysis Results
To establish the reliability of the analysis results, we conduct the McNemar test. This lets us check that the experimental results do not differ significantly from the results of our analysis.
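A minimal sketch of the McNemar test on the two discordant cells of the 2x2 table. It assumes the chi-square form with continuity correction; the text does not state which variant was used, and the counts are hypothetical.

```python
import math

def mcnemar(b, c):
    """McNemar chi-square test with continuity correction.

    b: genes with DEG (+) but ours (-)
    c: genes with DEG (-) but ours (+)
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical discordant counts for illustration only.
chi2, p = mcnemar(b=10, c=20)
```

A p-value above the chosen significance level (for example 0.05) means the test finds no significant difference between the DEG labels and the analysis labels.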
4-3) The Second Analysis (Annotation)
We applied the BLAST standards confirmed in the first analysis to 82 complete genomes in the genus Streptococcus. Using the BLAST results, we grouped genes with similar sequences and annotated each group with our own ID, so that genes annotated with the same ID are regarded as the same gene. We call this ID a 'Synb UID'.
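The text does not describe how the groups were formed. One simple possibility, shown here only as an assumption, is single-linkage grouping: any two genes connected by a chain of BLAST hits that pass the standard receive the same ID. The function name, the ID format, and the toy gene names are hypothetical.

```python
def assign_group_ids(genes, similar_pairs):
    """Give the same ID to genes linked by similarity (union-find)."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    root_to_id = {}
    uid_of = {}
    for g in genes:
        root = find(g)
        if root not in root_to_id:
            root_to_id[root] = f"UID{len(root_to_id) + 1:04d}"  # made-up format
        uid_of[g] = root_to_id[root]
    return uid_of

# Hypothetical genes from three genomes (A, B, C) and passing BLAST hits.
genes = ["A1", "A2", "B1", "B2", "C1"]
hits = [("A1", "B1"), ("B1", "C1"), ("A2", "B2")]
uid = assign_group_ids(genes, hits)
```

Single linkage can merge distant genes through intermediate hits, so a stricter grouping rule may be preferable in practice.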
If a Synb UID is present in all 82 genomes, we infer that it is an 'Essential Gene'. Conversely, if a Synb UID is found in only one genome, we call it a 'Specific Gene'. The second analysis yields about 478 essential genes across the 82 Streptococcus genomes.
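The classification rule above can be sketched as follows, assuming each genome has already been reduced to the set of IDs it contains. The function name, genome names, and IDs are hypothetical toy data.

```python
def classify_uids(genome_uids):
    """Split IDs into 'essential' (in every genome) and 'specific' (in one)."""
    n_genomes = len(genome_uids)
    presence = {}
    for uids in genome_uids.values():
        for uid in set(uids):  # count each ID at most once per genome
            presence[uid] = presence.get(uid, 0) + 1
    essential = {u for u, k in presence.items() if k == n_genomes}
    specific = {u for u, k in presence.items() if k == 1}
    return essential, specific

# Hypothetical example with three genomes instead of 82.
toy = {
    "genome_A": {"U1", "U2", "U3"},
    "genome_B": {"U1", "U2"},
    "genome_C": {"U1", "U4"},
}
essential, specific = classify_uids(toy)
```

IDs found in more than one genome but not in all of them, such as `U2` here, fall into neither class.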