Revision as of 18:41, 26 September 2012

2012 iGEM

CBNU-Korea

Minimal Genome Designer

- Method

1. Method

1-1. Selection of DB and Reasons

To provide a broad range of genome information, we combined databases from several sources. Our starting point was the complete genome data provided by NCBI, which is useful because it supplies not only genome sequences but also COG information on gene function. Because we want our results to stay close to experimental results, we also used the essential gene data compiled by DEG. In addition, to overcome the limits of the COG functional classification, we used Gene Ontology information to describe gene products. Because most of the selected data are related to PATRIC, we used PATRIC data as the base.

1-2. Composition of DB

We built our database from two major sites: NMPDR (http://www.patricbrc.org/portal/portal/patric/Home) and DEG (http://tubic.tju.edu.cn/deg/). NMPDR in turn links the NCBI and GO (http://www.geneontology.org/) databases with its own, so in effect our database is built from four different databases. NMPDR and NCBI provide genome and gene sequence information, DEG provides information on essential genes identified by experiment, and GO provides information on gene function.

1-3. Selection of the Subject of Analysis and the Reasons

1) Selection of the Subject of Analysis and the Reason

Even with a computer, finding the genes that every species has in common takes a long time. We also found that, to establish our own analysis methods and standards so that our results match experimental results, we need to work with specific species. We therefore narrowed down the subject of analysis.

2) The Subject of Analysis

The subject of analysis has to be one for which the accuracy of our analysis can be checked. There should therefore be two or more species in the same genus whose essential genes have been identified in vitro. Among the data provided by DEG, which catalogs essential genes found by experiment, the genera that meet this requirement are Escherichia, Mycoplasma, Salmonella, Staphylococcus, and Streptococcus. We chose Streptococcus, considering the number of specimens and the analysis time.
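
The selection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `candidate_genera`, the `min_species` parameter, and the toy organism list are hypothetical, and only the two Streptococcus strains are taken from this page.

```python
from collections import defaultdict


def candidate_genera(organisms, min_species=2):
    """Return genera that have at least `min_species` distinct species."""
    species_by_genus = defaultdict(set)
    for name in organisms:
        # Assumes names look like "Genus species strain".
        genus, species = name.split()[:2]
        species_by_genus[genus].add(species)
    return sorted(
        genus
        for genus, species in species_by_genus.items()
        if len(species) >= min_species
    )


# Hypothetical input list; it is not the real DEG organism table.
organisms = [
    "Streptococcus pneumoniae TIGR4",
    "Streptococcus sanguinis SK36",
    "Examplebacter solitarius X1",  # made-up organism with no second species
]
selected = candidate_genera(organisms)
print(selected)
```

With this toy input only Streptococcus is kept, because it is the only genus with two species in the list.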

1-4. Selecting Methods of Analysis and Basis

1) The Method of Analysis

We carry out the analysis to find essential genes in the order shown in the following flow chart.

2) The First Analysis (Determination of BLAST Standard) and Reliability

As emphasized earlier, we want our results to agree with in vitro results, and the first analysis is designed to test this. We used Streptococcus pneumoniae TIGR4 and Streptococcus sanguinis SK36, because both belong to the genus Streptococcus and their essential genes have been identified by in vitro experiments. We verified the reliability of the results and the accuracy of the analysis method by running BLAST between the two datasets under our BLAST standards.
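
As a rough illustration of what applying a BLAST standard could look like, the sketch below filters hits from BLAST tabular output (the 12-column `-outfmt 6` layout) by percent identity and E-value. The actual thresholds used by the team are not given on this page, so `min_identity`, `max_evalue`, the function name `passing_hits`, and the gene identifiers are all assumptions.

```python
def passing_hits(blast_lines, min_identity=30.0, max_evalue=1e-5):
    """Keep (query, subject) pairs that meet both thresholds.

    Expects tab-separated BLAST tabular lines: qseqid, sseqid, pident,
    length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    The default thresholds are placeholders, not the team's standards.
    """
    hits = []
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            hits.append((query, subject))
    return hits


# Hypothetical hits; the gene identifiers are made up for illustration.
sample = [
    "geneA\tgeneX\t85.2\t450\t60\t2\t1\t450\t1\t448\t1e-120\t430",
    "geneB\tgeneY\t25.0\t100\t70\t5\t1\t100\t3\t98\t0.5\t20",
]
kept = passing_hits(sample)
print(kept)
```

Only the first hit passes here, since the second one fails both the identity and the E-value threshold.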

3) BLAST Analysis Results

We label genes that DEG reports as essential as (+) and genes that it does not as (-). Likewise, we label genes that our analysis classifies as essential as (+) and the others as (-), and build a 2x2 cross table from the two sets of labels. Because a BLAST result can differ depending on which dataset is used as the query, we swap the two datasets and repeat the analysis.
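
The 2x2 cross table described above can be sketched as follows. The function name `cross_table` and the toy gene sets are hypothetical; the sketch only shows how the (+)/(-) labels from DEG and from the analysis combine into four counts.

```python
from collections import Counter


def cross_table(genes, deg_essential, predicted_essential):
    """Count genes per cell; keys are (DEG label, our label)."""
    table = Counter()
    for gene in genes:
        deg = "+" if gene in deg_essential else "-"
        ours = "+" if gene in predicted_essential else "-"
        table[(deg, ours)] += 1
    return table


# Hypothetical toy data: five genes from one genome.
genes = ["g1", "g2", "g3", "g4", "g5"]
deg_essential = {"g1", "g2", "g3"}
predicted_essential = {"g1", "g2", "g4"}

table = cross_table(genes, deg_essential, predicted_essential)
for cell in [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-")]:
    print(cell, table[cell])
```

To mirror the query swap, the same function would be called a second time with the other genome's gene list and label sets.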

4) Verification of the credibility of BLAST Analysis

4-1) Verification of the BLAST Analysis Standard

To verify the reliability of our BLAST standards, we use Sensitivity, Specificity, and Accuracy. 'Sensitivity' is the probability that a gene that is essential according to DEG is also classified as essential in our analysis. 'Specificity' is the probability that a gene that is non-essential according to DEG is also classified as non-essential in our analysis. 'Accuracy' is the proportion of all genes for which the two results agree. We treat an Accuracy above 80% as meaning that the results produced under our BLAST analysis standard are reliable, and therefore that the essential genes we infer are reliable as well. We then conduct a likelihood test to check that Accuracy is a valid measure of this reliability.
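
The three measures follow directly from the four cells of the 2x2 table. The sketch below uses the standard definitions; the counts are made up for illustration and are not the team's results.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, accuracy) from 2x2 table counts.

    tp: DEG (+), ours (+)    fn: DEG (+), ours (-)
    fp: DEG (-), ours (+)    tn: DEG (-), ours (-)
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sensitivity, specificity, accuracy = diagnostic_metrics(tp=80, fn=20, fp=30, tn=270)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f} accuracy={accuracy:.3f}")
print("meets the 80% accuracy criterion:", accuracy > 0.80)
```

For these example counts the accuracy is 350/400 = 0.875, which would pass the 80% criterion described above.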

4-2) Verification of the reliability of BLAST Analysis Results

To establish the reliability of the analysis results, we conduct the McNemar test. With it, we can confirm that the results of our analysis do not differ much from the experimental results.
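
For reference, a minimal sketch of the McNemar test using only the Python standard library is shown below. It assumes the chi-square form of the test without continuity correction; the page does not say which variant the team used, and the counts are hypothetical. Only the two discordant cells of the 2x2 table enter the statistic.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square test (1 degree of freedom, no continuity correction).

    b: genes that are DEG (+) but ours (-)
    c: genes that are DEG (-) but ours (+)
    Returns (statistic, p_value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs, so no evidence of a difference
    statistic = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical discordant counts for illustration only.
statistic, p_value = mcnemar(b=20, c=30)
print(f"statistic={statistic:.3f} p={p_value:.4f}")
```

A large p-value means the test finds no significant difference between the two classifications, which is the outcome the paragraph above is looking for.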

4-3) The Second Analysis (Annotation)

We applied the BLAST standards confirmed in the first analysis to 82 complete genomes in the genus Streptococcus. Using the BLAST results, we grouped genes with similar sequences and annotated each group with our own ID, which we call a 'Synb UID'. Genes annotated with the same Synb UID are regarded as the same gene. If a Synb UID is present in all 82 genomes, we infer that it is an 'Essential Gene'. If, on the other hand, a Synb UID is found in only one genome, we call it a 'Specific Gene'. The second analysis yielded about 478 essential genes across the 82 Streptococcus genomes.
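
The classification rule for Synb UIDs can be sketched as below. The function name `classify_uids`, the genome names, and the UID values are hypothetical, and the toy example uses three genomes instead of 82.

```python
from collections import defaultdict


def classify_uids(genome_to_uids):
    """Split Synb UIDs into essential (in every genome) and specific (in one)."""
    total = len(genome_to_uids)
    genomes_with_uid = defaultdict(set)
    for genome, uids in genome_to_uids.items():
        for uid in uids:
            genomes_with_uid[uid].add(genome)
    essential = sorted(u for u, g in genomes_with_uid.items() if len(g) == total)
    specific = sorted(u for u, g in genomes_with_uid.items() if len(g) == 1)
    return essential, specific


# Hypothetical toy input: three genomes and made-up UIDs.
genome_to_uids = {
    "genome_A": {"uid1", "uid2", "uid3"},
    "genome_B": {"uid1", "uid2"},
    "genome_C": {"uid1", "uid4"},
}
essential, specific = classify_uids(genome_to_uids)
print("essential:", essential)
print("specific:", specific)
```

In this toy case `uid1` is essential, `uid3` and `uid4` are specific, and `uid2` falls into neither group because it appears in two of the three genomes.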

@@ Line 1: / Line 1: @@
 {{Team:CBNU-Korea/css_new}}
 <html>
 <head>
@@ Line 9: / Line 8: @@
 }
-.MGD_Overview {
+.MGD_Method {
 	font-weight: 900;
 }
@@ Line 20: / Line 19: @@
 #CB_sub_img {
-	height: 140px;
+	background-image:
-	background:
 		url(https://static.igem.org/mediawiki/2012/3/3b/CBK_sub_img_005.png);
-	background-size: 100% 100%;
 }
 </style>
@@ Line 71: / Line 68: @@
 			<div id="CB_sub_title">
 				<h2>Minimal Genome Designer</h2>
-				<h3>- Overview</h3>
+				<h3>- Method</h3>
 			</div>
 			<div id="CB_sub_menu">
@@ Line 82: / Line 79: @@
 							+ Brick Designer</a></li>
 					<li class="MGD"><a href="#">03 + Minimal Genome Designer</a></li>
-					<li id="li_sub" class="MGD_Overview"><a href="#">-
+					<li id="li_sub" class="MGD_Overview"><a
+						href="https://2012.igem.org/Team:CBNU-Korea/Project/GD/Overview">-
 							Overview</a></li>
-					<li id="li_sub" class="MGD_Method"><a
+					<li id="li_sub" class="MGD_Method"><a href="#">- Method</a></li>
-						href="https://2012.igem.org/Team:CBNU-Korea/Project/GD/Method">-
-							Method</a></li>
 					<li id="li_sub" class="MGD_Analysis"><a
 						href="https://2012.igem.org/Team:CBNU-Korea/Project/GD/Analysis">-
@@ Line 105: / Line 101: @@
 				</div>
 				<div id="CB_sub_cont">
-					<h1>1. Overview</h1>
+					<h1>1. Method</h1>
-					<h2>1-1. Predicting Essential Genes</h2>
+					<h2>1-1. Selection of DB and Reasons</h2>
+					<p>To provide various information on genome, we combined the
+						databases from various sources. Basically we used information
+						about the complete genome provided by NCBI. NCBI is useful because
+						it not only provides information on the genome sequence, but also
+						COG information on the function of the genome. We think that it is
+						important to obtain a result that is not much different from that
+						obtained with the experiment, so we also used the data of
+						essential genes that DEG analyzed. In addition, to overcome the
+						limitation of the classification of function that COG has, we also
+						used the Gene Ontology information to show the information about
+						the products of the gene. Because most of the selected data are
+						related to PATRIC, we used PATRI data as a base.</p>
+					<h2>1-2. Composition of DB</h2>
+					<p>We established a database in two major sites. The two sites
+						are NMPDR(http://www.patricbrc.org/portal/portal/patric/Home) and
+						DEG(http://tubic.tju.edu.cn/deg/). Especially, in NMPDR it
+						connects the database of NCBI and GO(http://www.geneontology.org/)
+						with user's. In effect, it is constructed with 4 different
+						databases. In the database of NMPDR or NCBI, the information on
+						genome and gene sequence is provided. At DEG, the information on
+						essential genes is disclosed with experiments. Lastly, the
+						information on the function of genes is provided by GO.</p>
+					<h2>1-3. Selection of the Subject of Analysis and the Reasons</h2>
 					<p>
-						<strong>1) Need of the Analysis of Essential Genes</strong>
+						<strong>1) Selection of the Subject of Analysis and the
+							Reason</strong>
 					</p>
-					<p>It is thought that today, Synthetic Biology has reached a
-						plateau. Since the success of the experiment that re-synthesizes
+					<p>It is known that though we use a computer, it takes long
-						the genome and insert it into the cell, it seems that Synthetic
+						time to find the gene that every species have in common. Also, we
-						Biology is not developing. But if somebody can design a genome and
+						have found out that to establish our own experimental methods and
-						synthesize it, synthetic biology can take off again. Then what do
+						analysis standard to gain the same result as the results from
-						we need to design a genome? First, we need information about
+						experiments, we need a specific species. So we selected the
-						essential genes. An essential gene is a gene that is critical for
+						subject to be analyzed.</p>
-						survival. If you have the information about the essential gene,
-						you are in a superior state in making human artificial cell in a
-						true sense. Because it means that you already have the ability to
-						make the brick. All you have to do is make a brick with our
-						program developed for you according to the sequence of the
-						essential genes, and conduct experiments.</p>
 					<p>
-						<strong>2) The Present of the Analysis of Essential Genes</strong>
+						<strong>2) The Subject of Analysis </strong>
 					</p>
-					<p>Essential genes are being analyzed in many places, such as
-						in DEG or PATRIC.</p>
+					<p>The subject to be analyze has to be the one whose accuracy
-					<img src="https://static.igem.org/mediawiki/2012/1/11/CBK_B_001.png">
+						of analysis pursued by can be checked. So there should be more
+						than two species in the same genus whose essential genes have been
+						revealed in vitro. Of the data provided by DEG which analyze
+						essential genes with experiments, the genus that meet this
+						requirement are Escherichia, Mycoplasma, Salmonella,
+						Staphylococcus, Streptococcus. And we chose Streptococcus,
+						considering the number of the specimen and the analysis time.</p>
+					<h2>1-4. Selecting Methods of Analysis and Basis</h2>
 					<p>
-						<strong>3) The Problem of the Analysis of Essential Genes</strong>
+						<strong>1) The Method of Analysis</strong>
 					</p>
-					<p>Essential genes known today are mostly discovered with in
-						vitro experiments. As seen in the chart above, the result of the
-						experiment can vary according to conditions and methods. We bring
-						forth a problem about this. We believe that to give meaning to a
-						result, it has to be accurate. Also the analysis method of
-						essential genes used today takes a lot of time and labor. To
-						address this we use a bio-informatics way and find essential
-						genes.</p>
-					<h2>1-2. Analysis of Essential Genes</h2>
+					<p>We conduct analysis to find essential genes following the
+						sequence in the next flow chart.</p>
+					<img
+						src="https://static.igem.org/mediawiki/2012/thumb/b/b9/CBK_B_002.png/354px-CBK_B_002.png">
 					<p>
-						<strong>1) Definition of Essential Genes</strong>
+						<strong>2) The First Analysis(Determination of BLAST
+							standard) and Reliability </strong>
 					</p>
-					<p>As we are using the computer to find essential genes, we
-						can't use the existing method to analogize essential genes with
+					<p>As we emphasized earlier, we hope that there is no
-						experiments. We assume that if essential genes are critical to
+						difference between the results we get and that in vitro results.
-						living, every organism must have it. So we approached the problem
+						So we are going to prove it in our first analysis. In our first
-						with the assumption that an essential gene is a gene that every
+						analysis, we used Streptococcus pneumoniae TIGR4 and Streptococcus
-						living thing will has.</p>
+						sanguinis SK36, as they are in the same Streptococcus genuses and
-					<p>
+						essential genes are found with in vitro experimental methods. We
-						<strong>2) Significance of the Analysis of Essential
+						verified reliability of the result and the accuracy of the
-							Genes</strong>
+						analysis method by blasting the two data with our BLAST standards.
 					</p>
-					<p>We are proud of our analysis methods. You will see that our
-						analysis result is not that different from that proved with
-						experiments. Our analysis results do more than finding essential
-						genes. With this, you can understand the metabolic process and
-						furthermore have a good chance to synthesize an artificial cell.</p>
-					<h2>1-3. Developing of the Minimal Genome Designer</h2>
 					<p>
-						<strong>1) The Purpose of Minimal Genome Designer</strong>
+						<strong>3) BLAST Analysis Results</strong>
 					</p>
-					<p>The purpose of our program is fundamentally to understand
-						the structure and the principle of the genome. However, we will
+					<p>We label the essential gene information produced by DEG as a
-						not stop at understanding the information about genome, but hope
+						(+), and the gene that is not produced as a (-). Also we label the
-						that you can go on to build your own genome by using the brick in
+						genes that are thought to be essential genes according to our
-						your experiment.</p>
+						analysis as a (+), and the ones that don't a (-), and made a 2x2
+						cross diagram. In this, we are aware that though we analyze the
+						same data by BLAST, the result can be different depending on the
+						query. So we switch the two data on the diagram and analyze it
+						repeatedly.</p>
 					<p>
-						<strong>2) Advantages of Minimal Genome Designer</strong>
+						<strong>4) Verification of the credibility of BLAST
+							Analysis</strong>
 					</p>
-					<p>Minimal Genome Designer will make you easily understand the
-						genome, by showing you the structure. Providing various
-						information about the gene and the genome, our program will
-						shorten your time to design the experiment. This also means that
-						you can save money as well. We have tried and verified so that
-						Minimal Genome Designer can procure reliability. The results made
-						by Minimal Genome Designer can be used as a new background
-						information for your experiment.</p>
-					<div id="scrolltotop"></div>
-					<h1>2. Function</h1>
+					<p>
+						<i>4-1) Verification of the BLAST Analysis Standard</i>
-					<h2>2-1. Viewer</h2>
-					<p>1) The program is installed and run in a local computer. The
-						local database isn't included in the program, and to see the
-						database you should have access to the server that we constructed.
 					</p>
-					<p>2) The program consists of the circular viewer on the upper
-						left, and the linear viewer below, and a Genome list on the right.</p>
-					<p>3) First, select a gene in the Genome List on the right. You
-						may select it on the Tree View, or by searching the gene that you
-						want. After selection, the information of the gene will be shown
-						in graphic. You can see more information about the relevant gene
-						and the genome on the Genome Information in the middle.</p>
-					<p>4) If you move the reading glass image of the circular
-						viewer, you can see the relevant location in linear graphic. By
-						using the linear scroll bar, you can change the linear location.
-						If you click the relevant square image in the linear viewer, you
-						can see more information about the gene(Like sequences, length,
-						location, Synb_id, function, product and more).</p>
-					<p>5) With the check box on the lower right-hand corner of the
-						circular viewer, you can check the gene of your choice.</p>
-					<p>6) By checking the By COG check box, you can see each of the
-						COG functions of each gene. Choose the COG function in the scroll
-						box that appears after you check, then you can see the genes
-						including that function.</p>
-					<p>7) If you check the DEG Only check box, you can see the
-						informations that DEG has found in vitro in visual.</p>
-					<p>8) If you check the EG Only check box, you can see the
-						information on essential genes that we gain through analysis in
-						visual.</p>
-					<p>9) Select 'History' in the Genome List menu, then you can
-						see all the genes that have seen all along easily.</p>
-					<h2>2-2. Unique Genome Designer</h2>
+					<p>To verify reliability of the standards in BLAST, we use
+						Sensitivity, Specificity, and Accuracy. 'Sensitivity' is the
+						probability that essential genes of DEG are analyzed as essential
+						gene also in our analysis result. 'Specificity' is the probability
+						that non-essential genes of DEG are actually analyzed as the same
+						in our result. And 'Accuracy' is the probability that shows the
+						degree of correspondence between the results in the whole
+						specimen. If the Accuracy value is over 80%, it means that the
+						results made by BLAST Analysis Standard is reliable. Therefore,
+						the results that we infer are reliable as well. Then we conduct
+						Likelihood test to check the validity of the Accuracy to show that
+						Accuracy can analyze accurate reliability.</p>
-					<p>1) Shows the information of essential genes that can exist
+					<p>
-						in each section.(Function, Product, COG number etc.)</p>
+						<i>4-2) Verification of the reliability of BLAST Analysis
-					<p>2) Shows the frequency of the essential genes in each
+							Results</i>
-						section that are analyzed using 82 different species.</p>
-					<p>3) Shows the frequency of the COG of each section that is
-						analyzed using 82 different species.</p>
-					<p>4) By choosing each section of 20 in the screen, the
-						essential genes that can be inserted in the relevant area will be
-						listed.</p>
-					<p>5) The user can put the wanted essential gene In each
-						section and see the processing situation. When the whole 478
-						essential genes are put in all sections, the design is completed.
 					</p>
-					<p>7) The user can save and import the designing situation as a
-						XML form at their convenience.</p>
-					<p>8) When the user complete the design, the designed minimal
-						genome can be saved as a XML form and it could be viewed at
-						Viewer.</p>
-					<div id="scrolltotop"></div>
-					<h1>3. Future Work</h1>
+					<p>To procure the reliability of analysis result, we conduct
+						the McNemar test.By this, we can be sure that the analysis result
+						with experiments is not much different from that with our
+						analysis.</p>
 					<p>
-						Our ultimate objective is to make a program that can build a
+						<i> 4-3) The Second Analysis (Annotation)</i>
-						complete minimum genome. Well do we know the information that we
-						need to realize our goal. First, we analogize essential genes that
-						every species is supposed to have. And we are aware of the fact
-						that to achieve this goal, our analysis method needs higher
-						reliability. Therefore, we are going to use the same method on a
-						different genus (For example, like Mycoplasma, Salmonella, that
-						the essential gene is already identified experimentally), and
-						secure the reliability of our analysis method. Secondly, we
-						presume specific genes which reflect the characteristic of genome.
-						We think that there are genes that the function of it can
-						represent each genome or each level. In this year we only analyzed
-						every genome in Streptococcus, so we cannot provide information of
-						specific genes. But we expect that the specific genes that we
-						analyzed in this year not only make our software more wealthy but
-						also make accurate result. Thirdly, we presume accurate
-						arrangement of essential gene that we analyzed. Because we mostly
-						concentrated on the analysis of strand pattern, it isn’t enough to
-						provide accurate arrangement. However we know that we need the
-						information of the order of genes to make complete design
-						software. Thus we are going to analyze this and improve the
-						completeness of our Designer.<br> We believe that if this
-						analysis becomes perfect, our analysis method will secure
-						reliability and give you the accurate outcome. Also hoping that
-						you will use our program more actively, we are planning to provide
-						information you will need(For example, restriction enzyme sites,
-						Gene map, the culture conditions of different species). We hope
-						our program Minimal Genome Designer will provide the fundamental
-						information for your experiment in the near future.
 					</p>
+					<p>We applied the BLAST standards confirmed by the first
+						analysis to 82 Complete Genome in the Streptococcus. We grouped
+						the genes with the similar sequence using the BLAST result and
+						annotate with our own ID. That is, the genes annotated with the
+						same ID are regarded as the same genes. At this time, we call the
+						ID that we give in our own way a 'Synb UID'. If all the 82 total
+						genome have a specific Synb UID, then we infer them as an
+						'Essential Gene'. On the other hand, if the Synb UID was found in
+						only one genome, we named it a 'Specific Gene'. From a result of
+						the second analysis, the total essential genes in 82 species of
+						Streptococcus are about 478 .</p>
 					<div id="scrolltotop"></div>
 				</div>