Team:Johns Hopkins-Software/BiobrickAnalysis
From 2012.igem.org
(Difference between revisions)
(3 intermediate revisions not shown) | |||
Line 7: | Line 7: | ||
</div> | </div> | ||
- | + | ||
When creating AutoPlasmid, we recognized the need for a streamlined, automated method of annotating sequences in synthetic biology. Hand-annotated sequences are likely to contain errors, and even with many eyes overseeing the annotation process, those errors can be difficult to spot. | When creating AutoPlasmid, we recognized the need for a streamlined, automated method of annotating sequences in synthetic biology. Hand-annotated sequences are likely to contain errors, and even with many eyes overseeing the annotation process, those errors can be difficult to spot. | ||
The Registry of Standard Parts currently has over 20,000 parts; roughly 7,000 of them are categorized as available, 11,000 as planning, and 1,500 as deleted, and the rest are categorized as unavailable, missing, or informational. New biobrick parts are characterized every year and are hand-annotated, which often leads to errors in characterizing the sequence. To this end, we used AutoPlasmid’s annotation capabilities to check the annotations of all of the <a href="http://partsregistry.org/Registry_API"> biobricks</a> in the Registry of Standard Parts (as of September 1, 2012). | The Registry of Standard Parts currently has over 20,000 parts; roughly 7,000 of them are categorized as available, 11,000 as planning, and 1,500 as deleted, and the rest are categorized as unavailable, missing, or informational. New biobrick parts are characterized every year and are hand-annotated, which often leads to errors in characterizing the sequence. To this end, we used AutoPlasmid’s annotation capabilities to check the annotations of all of the <a href="http://partsregistry.org/Registry_API"> biobricks</a> in the Registry of Standard Parts (as of September 1, 2012). | ||
Line 16: | Line 16: | ||
Methodology | Methodology | ||
</div> | </div> | ||
- | + | ||
<img src="https://static.igem.org/mediawiki/2012/f/f5/Screen_Shot_2012-10-02_at_11.34.50_PM.png" width=500 style="float:right;margin:10px 10px 10px 10px;"/> | <img src="https://static.igem.org/mediawiki/2012/f/f5/Screen_Shot_2012-10-02_at_11.34.50_PM.png" width=500 style="float:right;margin:10px 10px 10px 10px;"/> | ||
We read through the <a href="http://partsregistry.org/Registry_API">xml file</a> that provided the data for each biobrick part, converted each file into a format accepted by AutoPlasmid, and cross-checked the annotations in the xml files against the annotations generated by AutoPlasmid. In this test, we used only perfect alignments, not imperfect ones. Any biobrick with a notable mistake in its annotations was flagged, and the mistake was recorded. Other parameters of the biobricks, such as the status (e.g. available, planning, or deleted), were also taken into account. | We read through the <a href="http://partsregistry.org/Registry_API">xml file</a> that provided the data for each biobrick part, converted each file into a format accepted by AutoPlasmid, and cross-checked the annotations in the xml files against the annotations generated by AutoPlasmid. In this test, we used only perfect alignments, not imperfect ones. Any biobrick with a notable mistake in its annotations was flagged, and the mistake was recorded. Other parameters of the biobricks, such as the status (e.g. available, planning, or deleted), were also taken into account. | ||
Line 23: | Line 23: | ||
Results | Results | ||
</div> | </div> | ||
- | + | ||
We noticed two very common errors in the biobrick annotations from the xml data. The first was an incorrectly defined strand: the annotation stated that the sequence was on the reverse strand when it was on the forward strand, or vice versa (Wrong Strand). The second was an annotation sequence that did not match the correct sequence (Mismatch). Other, more surprising errors included annotations that were not on the biobrick part’s sequence (Out of Bounds) and biobrick parts that were less than 3 base pairs long (Empty). </p><br> | We noticed two very common errors in the biobrick annotations from the xml data. The first was an incorrectly defined strand: the annotation stated that the sequence was on the reverse strand when it was on the forward strand, or vice versa (Wrong Strand). The second was an annotation sequence that did not match the correct sequence (Mismatch). Other, more surprising errors included annotations that were not on the biobrick part’s sequence (Out of Bounds) and biobrick parts that were less than 3 base pairs long (Empty). </p><br> | ||
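The four error categories can be illustrated with a minimal Python sketch. This is not AutoPlasmid's implementation: the `Annotation` record, the `classify` function, the 1-based inclusive coordinates, and the idea of comparing against a known reference feature sequence are all assumptions made for illustration. Only exact (perfect) matches are considered, in keeping with the test described above.

```python
from dataclasses import dataclass

# Translation table for complementing DNA bases (assumes plain a/c/g/t input).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Annotation:
    """Hypothetical annotation record; field names are illustrative only."""
    start: int       # 1-based, inclusive (assumed convention)
    end: int         # 1-based, inclusive (assumed convention)
    forward: bool    # strand claimed by the annotation
    reference: str   # expected feature sequence, read 5'->3' on its own strand


def classify(part_seq: str, ann: Annotation) -> str:
    """Classify one annotation using exact matching only."""
    if len(part_seq) < 3:
        return "Empty"
    if ann.start < 1 or ann.end > len(part_seq) or ann.start > ann.end:
        return "Out of Bounds"

    region = part_seq[ann.start - 1:ann.end].lower()
    ref = ann.reference.lower()

    # Sequence as read on the claimed strand and on the opposite strand.
    claimed = region if ann.forward else reverse_complement(region)
    opposite = reverse_complement(region) if ann.forward else region

    if claimed == ref:
        return "OK"
    if opposite == ref:
        return "Wrong Strand"
    return "Mismatch"


part = "ttatgaaaccctag"
print(classify(part, Annotation(3, 8, True, "atgaaa")))    # claimed strand matches
print(classify(part, Annotation(3, 8, False, "atgaaa")))   # matches the other strand
print(classify(part, Annotation(3, 8, True, "atgccc")))    # matches neither strand
print(classify(part, Annotation(10, 20, True, "ccc")))     # runs past the part
```

Because the comparison is exact, a single-base difference is reported as a Mismatch even when it would not change the feature's function.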
Line 29: | Line 29: | ||
<img src="https://static.igem.org/mediawiki/2012/4/46/ErrorAnalysis.png" width=380 /> <img src="https://static.igem.org/mediawiki/2012/0/0b/Annotation_Errors.png" width=380 /></center> | <img src="https://static.igem.org/mediawiki/2012/4/46/ErrorAnalysis.png" width=380 /> <img src="https://static.igem.org/mediawiki/2012/0/0b/Annotation_Errors.png" width=380 /></center> | ||
<br> | <br> | ||
- | + | <br><br> | |
<div id="title"> | <div id="title"> | ||
Conclusion | Conclusion | ||
</div> | </div> | ||
- | + | ||
This was a very quick scan of the biobrick parts: we checked only perfect alignments and did not take into account potential mutations in annotation sequences that could still produce the same result. Although there may be slight discrepancies in what is truly an incorrect annotation sequence, we have isolated the parts that may have annotation errors and need to be checked, something that would have taken several hours if a single person were doing it. We see this as a reason for synthetic biologists to use software to help annotate their constructed sequences rather than hand-annotating them, since computer-generated annotations from simple alignment algorithms would be much more accurate and would reduce the number of errors currently seen in the Parts Registry. Given that the Parts Registry is constantly increasing in size, and that more and more complicated constructs will be created as synthetic biology advances, we expect that using software to annotate will help to reduce both future errors and the man-hours invested in correcting incorrect annotation sequences. | This was a very quick scan of the biobrick parts: we checked only perfect alignments and did not take into account potential mutations in annotation sequences that could still produce the same result. Although there may be slight discrepancies in what is truly an incorrect annotation sequence, we have isolated the parts that may have annotation errors and need to be checked, something that would have taken several hours if a single person were doing it. We see this as a reason for synthetic biologists to use software to help annotate their constructed sequences rather than hand-annotating them, since computer-generated annotations from simple alignment algorithms would be much more accurate and would reduce the number of errors currently seen in the Parts Registry. 
Given that the Parts Registry is constantly increasing in size, and that more and more complicated constructs will be created as synthetic biology advances, we expect that using software to annotate will help to reduce both future errors and the man-hours invested in correcting incorrect annotation sequences. | ||
<br><br> | <br><br> | ||
Line 39: | Line 39: | ||
A zip file containing all of the biobrick XML files that we used can be found <a href="http://ugrad.cs.jhu.edu/~eisinger/iGEM/Biobrick_XML_files.zip">here</a>. | A zip file containing all of the biobrick XML files that we used can be found <a href="http://ugrad.cs.jhu.edu/~eisinger/iGEM/Biobrick_XML_files.zip">here</a>. | ||
- | |||
- | </div | + | </div><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br> |
</html> | </html> | ||
{{:Team:Johns_Hopkins-Software/header}} | {{:Team:Johns_Hopkins-Software/header}} |