Team:Johns Hopkins-Software/BiobrickAnalysis
From 2012.igem.org
Biobrick Analysis
Introduction
Overview
When creating AutoPlasmid, we recognized the need for a streamlined, automated method of annotating sequences in synthetic biology. Hand-annotated sequences are prone to errors, and even with many eyes on them, annotation mistakes can be difficult to spot. The Registry of Standard Parts currently has over 20,000 parts; roughly 7,000 of them are categorized as available, 11,000 as planning, 1,500 as deleted, and the rest as unavailable, missing, or informational. New biobrick parts are characterized every year and are annotated by hand, which often leads to errors in the sequence annotations. To this end, we used AutoPlasmid’s annotation capabilities to check the annotations of all of the [http://partsregistry.org/Registry_API biobricks] in the Registry of Standard Parts (as of September 1, 2012).
Methodology
We read the XML file that provides the data for each biobrick part, converted it into a format accepted by AutoPlasmid, and cross-checked the annotations given in the XML file against the annotations produced by AutoPlasmid. In this test, we used perfect (exact-match) alignments rather than imperfect ones. Any biobrick with a notable mistake in its annotations was flagged, and the mistake was recorded. Other properties of each biobrick, such as its status (available, planning, deleted, etc.), were also taken into account.
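To make the idea of a perfect alignment concrete, the following is a minimal Python sketch and not AutoPlasmid's actual code. It assumes the part sequence and a feature sequence are already available as plain strings; the names `reverse_complement` and `find_exact` are hypothetical. It reports every exact occurrence of the feature on either strand of the part.

```python
# Hypothetical helper names; a sketch of exact-match ("perfect") alignment only.
_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (lowercased)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def find_exact(part_seq, feature_seq):
    """Return (start, strand) for every exact match of the feature in the part.

    Positions are 0-based offsets into the part's forward strand.
    """
    part = part_seq.lower()
    feature = feature_seq.lower()
    if not feature:
        return []
    hits = []
    queries = (("forward", feature), ("reverse", reverse_complement(feature)))
    for strand, query in queries:
        pos = part.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = part.find(query, pos + 1)  # allow overlapping matches
    return hits
```

A feature that only matches as its reverse complement is reported on the reverse strand, which is the information needed to compare against the strand recorded in a hand-made annotation. Note that a palindromic feature (one equal to its own reverse complement) will be reported on both strands.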
Results
We noticed two very common errors in the biobrick annotations from the XML data. The first was an incorrectly defined strand: the annotation stated that the feature was on the reverse strand when it was actually on the forward strand, or vice versa. The second was an annotation whose sequence did not match the correct sequence. Other, more surprising errors included annotations that did not lie on the biobrick part’s sequence at all, and biobrick parts that were fewer than 3 base pairs long.
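The error categories above could be checked along the following lines. This is a hedged Python sketch under stated assumptions, not the procedure we actually ran: the `Annotation` record, its field names, the flag strings, and the 1-based inclusive coordinates are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (lowercased)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


@dataclass
class Annotation:
    """Hypothetical annotation record; field names are assumptions."""
    start: int         # 1-based, inclusive (assumed convention)
    end: int           # 1-based, inclusive (assumed convention)
    strand: str        # "forward" or "reverse"
    expected_seq: str  # feature sequence read 5'->3' on its own strand


def check_annotation(part_seq: str, ann: Annotation) -> List[str]:
    """Return a list of flags describing problems with one annotation."""
    part = part_seq.lower()
    flags = []
    if len(part) < 3:
        flags.append("part_shorter_than_3_bp")
    # The annotation must lie entirely on the part's sequence.
    if ann.start < 1 or ann.end > len(part) or ann.start > ann.end:
        flags.append("annotation_outside_part")
        return flags
    region = part[ann.start - 1:ann.end]
    expected = ann.expected_seq.lower()
    if ann.strand == "forward":
        as_stated, as_opposite = region, reverse_complement(region)
    else:
        as_stated, as_opposite = reverse_complement(region), region
    if as_stated == expected:
        return flags  # strand and sequence both agree
    if as_opposite == expected:
        flags.append("wrong_strand")
    else:
        flags.append("sequence_mismatch")
    return flags
```

For example, `check_annotation("ttgacaatg", Annotation(3, 6, "reverse", "gaca"))` flags the annotation as `wrong_strand`, because the expected sequence is found at the stated position only when read on the forward strand.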