Team:CBNU-Korea/Project/GD/Analysis

From 2012.igem.org

(Difference between revisions)
(Created page with "{{Team:CBNU-Korea/css_new}} <html> <head> <style> .PROJECT { border-bottom-color: white !important; } .MGD_Analysis { font-weight: 900; } .MGD a { font-weight: 900; color...")
Line 105: Line 105:
</div>
</div>
<div id="CB_sub_cont">
<div id="CB_sub_cont">
-
<h1>1. Overview</h1>
+
<h1>1. Introduction</h1>
-
<h2>1-1. Predicting Essential Genes</h2>
+
<h2>1-1. Suggestion</h2>
 +
 
 +
<p>Since the Genome project started in 2002, we can easily get
 +
the genetic information of many species. Also as the scientific
 +
technique developed, we can insert and compose the genome. If we
 +
can design a whole genome, then we will be able to make a one and
 +
only useful genome. But as today, the compose of the minimum
 +
genome made with the essential gene has succeeded, but did not
 +
last.</p>
 +
<h2>1-2. Object</h2>
 +
<p>To design a genome, we have to analyze the pattern of the
 +
genome and the distribution of the gene.</p>
 +
 
 +
<h2>1-3. Method and the range of the study</h2>
<p>
<p>
-
<strong>1) Need of the Analysis of Essential Genes</strong>
+
The study was used information of species in streptococcus by
 +
patric database (<a
 +
href="http://www.patricbrc.org/portal/portal/patric/Home">
 +
http://www.patricbrc.org/portal/portal/patric/Home</a>) and SynbUID.<br>
 +
The Data was built by mysql 5.5.27, and a statistical analysis
 +
program was used by SAS 9.3.
</p>
</p>
-
<p>It is thought that today, Synthetic Biology has reached a
+
<div id="scrolltotop"></div>
-
plateau. Since the success of the experiment that re-synthesizes
+
 
-
the genome and insert it into the cell, it seems that Synthetic
+
<h1>2. Design</h1>
-
Biology is not developing. But if somebody can design a genome and
+
<h2>2-1. Prepare</h2>
-
synthesize it, synthetic biology can take off again. Then what do
+
-
we need to design a genome? First, we need information about
+
-
essential genes. An essential gene is a gene that is critical for
+
-
survival. If you have the information about the essential gene,
+
-
you are in a superior state in making human artificial cell in a
+
-
true sense. Because it means that you already have the ability to
+
-
make the brick. All you have to do is make a brick with our
+
-
program developed for you according to the sequence of the
+
-
essential genes, and conduct experiments.</p>
+
<p>
<p>
-
<strong>2) The Present of the Analysis of Essential Genes</strong>
+
<strong>1) Build database</strong>
</p>
</p>
-
<p>Essential genes are being analyzed in many places, such as
 
-
in DEG or PATRIC.</p>
 
-
<img src="https://static.igem.org/mediawiki/2012/1/11/CBK_B_001.png">
 
<p>
<p>
-
<strong>3) The Problem of the Analysis of Essential Genes</strong>
+
An attribute of Genome name is consisted of ID, Genome_name, COG,
 +
Start, End, Strand, and Size.<br> An attribute of Annotation
 +
Table_EG is consisted of ID, locus, and SynbUID. Two entities are
 +
paired of Locus_tag 1 by 1.
</p>
</p>
-
<p>Essential genes known today are mostly discovered with in
+
<img src="">
-
vitro experiments. As seen in the chart above, the result of the
+
-
experiment can vary according to conditions and methods. We bring
+
-
forth a problem about this. We believe that to give meaning to a
+
-
result, it has to be accurate. Also the analysis method of
+
-
essential genes used today takes a lot of time and labor. To
+
-
address this we use a bio-informatics way and find essential
+
-
genes.</p>
+
-
<h2>1-2. Analysis of Essential Genes</h2>
 
<p>
<p>
-
<strong>1) Definition of Essential Genes</strong>
+
<strong>2) Represented sample number</strong>
</p>
</p>
-
<p>As we are using the computer to find essential genes, we
+
<p>For checking the number of specimen that is representative,
-
can't use the existing method to analogize essential genes with
+
we used a simple random sampling method, and assumed that the
-
experiments. We assume that if essential genes are critical to
+
complete genome is random. We used the significance level (a=0.05)
-
living, every organism must have it. So we approached the problem
+
and the limit of error (b=0.1). The total species of streptococcus
-
with the assumption that an essential gene is a gene that every
+
is 494 species, and between these, 82 species are completed.
-
living thing will has.</p>
+
According to our calculation, when there is 81 species, the result
-
<p>
+
is satisfied. Therefore, as a result, 82 complete species
-
<strong>2) Significance of the Analysis of Essential
+
represent the streptococcus.</p>
-
Genes</strong>
+
<img src="">
-
</p>
+
-
<p>We are proud of our analysis methods. You will see that our
+
-
analysis result is not that different from that proved with
+
-
experiments. Our analysis results do more than finding essential
+
-
genes. With this, you can understand the metabolic process and
+
-
furthermore have a good chance to synthesize an artificial cell.</p>
+
-
<h2>1-3. Developing of the Minimal Genome Designer</h2>
 
<p>
<p>
-
<strong>1) The Purpose of Minimal Genome Designer</strong>
+
<strong>3) Standard</strong>
</p>
</p>
-
<p>The purpose of our program is fundamentally to understand
 
-
the structure and the principle of the genome. However, we will
 
-
not stop at understanding the information about genome, but hope
 
-
that you can go on to build your own genome by using the brick in
 
-
your experiment.</p>
 
-
<p>
 
-
<strong>2) Advantages of Minimal Genome Designer</strong>
 
-
</p>
 
-
<p>Minimal Genome Designer will make you easily understand the
 
-
genome, by showing you the structure. Providing various
 
-
information about the gene and the genome, our program will
 
-
shorten your time to design the experiment. This also means that
 
-
you can save money as well. We have tried and verified so that
 
-
Minimal Genome Designer can procure reliability. The results made
 
-
by Minimal Genome Designer can be used as a new background
 
-
information for your experiment.</p>
 
-
<div id="scrolltotop"></div>
 
-
<h1>2. Function</h1>
+
<p>3-1) Divided the interval of the genome</p>
 +
<p>The number and size of the genome differs between species.
 +
To supplement this problem, we divided the genes in a section to
 +
show the genome’s size as a proportion. As a result, when we
 +
divided the analyzing section less then a hundred, it was hard to
 +
see the patterns because the data has been diluted. And when we
 +
divided it into more then a hundred pieces, it was not that
 +
different from the result that divided it into a hundred pieces.
 +
So we decided to divide it into a hundred pieces.</p>
-
<h2>2-1. Viewer</h2>
+
<p>3-2) Identified the starting point</p>
-
 
+
<p>The number one ORF of each gene sequence analysis data is
-
<p>1) The program is installed and run in a local computer. The
+
different between every species. Thus we had to make a specific
-
local database isn't included in the program, and to see the
+
standard to equalize the beginning of the data. We checked the
-
database you should have access to the server that we constructed.
+
strand pattern of each genome and identified it with the strands.
</p>
</p>
-
<p>2) The program consists of the circular viewer on the upper
 
-
left, and the linear viewer below, and a Genome list on the right.</p>
 
-
<p>3) First, select a gene in the Genome List on the right. You
 
-
may select it on the Tree View, or by searching the gene that you
 
-
want. After selection, the information of the gene will be shown
 
-
in graphic. You can see more information about the relevant gene
 
-
and the genome on the Genome Information in the middle.</p>
 
-
<p>4) If you move the reading glass image of the circular
 
-
viewer, you can see the relevant location in linear graphic. By
 
-
using the linear scroll bar, you can change the linear location.
 
-
If you click the relevant square image in the linear viewer, you
 
-
can see more information about the gene(Like sequences, length,
 
-
location, Synb_id, function, product and more).</p>
 
-
<p>5) With the check box on the lower right-hand corner of the
 
-
circular viewer, you can check the gene of your choice.</p>
 
-
<p>6) By checking the By COG check box, you can see each of the
 
-
COG functions of each gene. Choose the COG function in the scroll
 
-
box that appears after you check, then you can see the genes
 
-
including that function.</p>
 
-
<p>7) If you check the DEG Only check box, you can see the
 
-
informations that DEG has found in vitro in visual.</p>
 
-
<p>8) If you check the EG Only check box, you can see the
 
-
information on essential genes that we gain through analysis in
 
-
visual.</p>
 
-
<p>9) Select 'History' in the Genome List menu, then you can
 
-
see all the genes that have seen all along easily.</p>
 
-
 
-
<h2>2-2. Unique Genome Designer</h2>
 
-
 
-
<p>1) Shows the information of essential genes that can exist
 
-
in each section.(Function, Product, COG number etc.)</p>
 
-
<p>2) Shows the frequency of the essential genes in each
 
-
section that are analyzed using 82 different species.</p>
 
-
<p>3) Shows the frequency of the COG of each section that is
 
-
analyzed using 82 different species.</p>
 
-
<p>4) By choosing each section of 20 in the screen, the
 
-
essential genes that can be inserted in the relevant area will be
 
-
listed.</p>
 
-
<p>5) The user can put the wanted essential gene In each
 
-
section and see the processing situation. When the whole 478
 
-
essential genes are put in all sections, the design is completed.
 
-
</p>
 
-
<p>7) The user can save and import the designing situation as a
 
-
XML form at their convenience.</p>
 
-
<p>8) When the user complete the design, the designed minimal
 
-
genome can be saved as a XML form and it could be viewed at
 
-
Viewer.</p>
 
-
<div id="scrolltotop"></div>
 
-
 
-
<h1>3. Future Work</h1>
 
-
<p>
 
-
Our ultimate objective is to make a program that can build a
 
-
complete minimum genome. Well do we know the information that we
 
-
need to realize our goal. First, we analogize essential genes that
 
-
every species is supposed to have. And we are aware of the fact
 
-
that to achieve this goal, our analysis method needs higher
 
-
reliability. Therefore, we are going to use the same method on a
 
-
different genus (For example, like Mycoplasma, Salmonella, that
 
-
the essential gene is already identified experimentally), and
 
-
secure the reliability of our analysis method. Secondly, we
 
-
presume specific genes which reflect the characteristic of genome.
 
-
We think that there are genes that the function of it can
 
-
represent each genome or each level. In this year we only analyzed
 
-
every genome in Streptococcus, so we cannot provide information of
 
-
specific genes. But we expect that the specific genes that we
 
-
analyzed in this year not only make our software more wealthy but
 
-
also make accurate result. Thirdly, we presume accurate
 
-
arrangement of essential gene that we analyzed. Because we mostly
 
-
concentrated on the analysis of strand pattern, it isn’t enough to
 
-
provide accurate arrangement. However we know that we need the
 
-
information of the order of genes to make complete design
 
-
software. Thus we are going to analyze this and improve the
 
-
completeness of our Designer.<br> We believe that if this
 
-
analysis becomes perfect, our analysis method will secure
 
-
reliability and give you the accurate outcome. Also hoping that
 
-
you will use our program more actively, we are planning to provide
 
-
information you will need(For example, restriction enzyme sites,
 
-
Gene map, the culture conditions of different species). We hope
 
-
our program Minimal Genome Designer will provide the fundamental
 
-
information for your experiment in the near future.
 
-
</p>
 
-
<div id="scrolltotop"></div>
 
</div>
</div>
</div>
</div>

Revision as of 19:36, 26 September 2012

Minimal Genome Designer

- Analysis

1. Introduction

1-1. Suggestion

Since the Genome project started in 2002, genetic information for many species has become readily available. As laboratory techniques have advanced, it has also become possible to synthesize a genome and insert it into a cell. If we could design a whole genome, we could build a unique genome tailored to a useful purpose. To date, however, although a minimal genome composed of essential genes has been synthesized, it did not last.

1-2. Objective

To design a genome, we first have to analyze the patterns in the genome and the distribution of its genes.

1-3. Method and scope of the study

This study used data on Streptococcus species from the PATRIC database (http://www.patricbrc.org/portal/portal/patric/Home) and SynbUID.
The data were stored in MySQL 5.5.27, and the statistical analysis was carried out with SAS 9.3.

2. Design

2-1. Preparation

1) Build database

The Genome name entity has the attributes ID, Genome_name, COG, Start, End, Strand, and Size.
The Annotation Table_EG entity has the attributes ID, locus, and SynbUID. The two entities are linked one-to-one through Locus_tag.
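
The two entities and their one-to-one link can be sketched as follows. This is only an illustration under stated assumptions: the original MySQL schema is not shown, so the table names, column types, the locus_tag column on the genome table, and the sample row are all hypothetical, and SQLite (via Python) stands in for MySQL 5.5.27.

```python
import sqlite3

# Hypothetical schema modelled on the attribute lists in the text.
# Table names, column types and the locus_tag join column are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genome (
    id          INTEGER PRIMARY KEY,
    genome_name TEXT NOT NULL,
    locus_tag   TEXT NOT NULL UNIQUE,   -- assumed key for the 1:1 link
    cog         TEXT,
    start       INTEGER,
    "end"       INTEGER,
    strand      TEXT CHECK (strand IN ('+', '-')),
    size        INTEGER
);
CREATE TABLE annotation_eg (
    id      INTEGER PRIMARY KEY,
    locus   TEXT NOT NULL UNIQUE REFERENCES genome(locus_tag),
    synbuid TEXT
);
""")

# Made-up row, for illustration only (not real PATRIC or SynbUID data).
conn.execute(
    "INSERT INTO genome VALUES (1, 'example genome', 'LOCUS_0001', 'J', 1, 900, '+', 900)"
)
conn.execute("INSERT INTO annotation_eg VALUES (1, 'LOCUS_0001', 'SYNB_EXAMPLE')")

# Join the two entities on the locus tag.
rows = conn.execute("""
    SELECT g.genome_name, g.locus_tag, g.strand, a.synbuid
    FROM genome AS g
    JOIN annotation_eg AS a ON a.locus = g.locus_tag
""").fetchall()
print(rows)
```

The UNIQUE constraints on both sides of the join are what make the pairing one-to-one in this sketch.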

2) Representative sample size

To check how many specimens are needed for a representative sample, we used simple random sampling and assumed that the set of complete genomes is a random sample. We used a significance level (a = 0.05) and a limit of error (b = 0.1). There are 494 Streptococcus species in total, of which 82 have complete genomes. According to our calculation, 81 species are sufficient. The 82 complete species are therefore representative of Streptococcus.
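
The text does not state which formula was used. As a hedged sketch, one standard choice is the sample-size formula for a proportion under simple random sampling with a finite population correction. Assuming the most conservative proportion p = 0.5 (an assumption, not given in the text), it reproduces the reported figure of 81. The function name and parameters below are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population, alpha=0.05, margin=0.1, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    p = 0.5 is an assumed worst-case proportion; the source does not give it.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)         # finite population correction
    return math.ceil(n)


needed = required_sample_size(population=494, alpha=0.05, margin=0.1)
print(needed)  # 81
```

With 82 complete genomes available and 81 required, the sample meets the threshold under these assumptions.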

3) Standard

3-1) Dividing the genome into intervals

The number of genes and the size of the genome differ between species. To compensate for this, we divided each genome into sections so that gene positions are expressed as a proportion of the genome’s size. When we used fewer than a hundred sections, the patterns were hard to see because the data were diluted. When we used more than a hundred sections, the results were not much different from those with a hundred. We therefore divided each genome into a hundred sections.
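
The proportional sectioning can be sketched as below. This is a minimal illustration, assuming 0-based gene start coordinates and that a gene is assigned to the section containing its start; the function names are hypothetical and the original SAS procedure is not shown in the text.

```python
def section_index(position, genome_length, n_sections=100):
    """Map a 0-based position to one of n_sections equal-width sections."""
    if not 0 <= position < genome_length:
        raise ValueError("position must lie inside the genome")
    # Integer arithmetic avoids floating-point edge cases at section borders.
    return position * n_sections // genome_length


def section_counts(gene_starts, genome_length, n_sections=100):
    """Count how many genes start in each section of one genome."""
    counts = [0] * n_sections
    for start in gene_starts:
        counts[section_index(start, genome_length, n_sections)] += 1
    return counts


# Made-up coordinates for a hypothetical 2 Mb genome.
counts = section_counts([0, 15_000, 1_000_000, 1_999_999], genome_length=2_000_000)
print(counts[0], counts[50], counts[99])  # 2 1 1
```

Because every genome is reduced to the same hundred sections, counts from genomes of different sizes can be compared section by section.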

3-2) Identifying the starting point

The first ORF in the sequence analysis data differs from species to species. We therefore needed a specific standard to align the starting point of the data. We examined the strand pattern of each genome and used it to identify the starting point.
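
The text does not describe how the strand pattern was turned into a starting point. The following is only one possible reading, offered as an assumption: treat the gene list as circular, take a running sum of strand signs (+1 for '+', -1 for '-'), and start the data just after the minimum of that sum, where the genes switch from mostly '-' to mostly '+'. The function name is hypothetical.

```python
from itertools import accumulate


def rotate_to_strand_switch(strands):
    """Rotate a circular list of gene strands to start after the skew minimum.

    Assumed heuristic, not the documented method: the minimum of the
    cumulative strand sum marks the switch from '-'-rich to '+'-rich genes.
    """
    if not strands:
        return []
    signs = [1 if s == "+" else -1 for s in strands]
    cumulative = list(accumulate(signs))
    pivot = cumulative.index(min(cumulative))   # last gene of the '-' run
    start = (pivot + 1) % len(strands)          # first gene of the '+' run
    return strands[start:] + strands[:start]


# Made-up strand order for illustration.
aligned = rotate_to_strand_switch(["+", "+", "-", "-", "-", "+", "+"])
print(aligned)  # ['+', '+', '+', '+', '-', '-', '-']
```

Applying the same rotation rule to every genome gives all of them a comparable first section, regardless of which ORF happens to be numbered first in the source data.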