Team:Johns Hopkins-Software/Cloud

Revision as of 08:07, 3 October 2012

Latest revision as of 04:00, 4 October 2012

We have been working on integrating sequence alignment with the cloud. Sequence alignment is the task of taking two genetic sequences and finding the region where they fit together best. Biologists often use alignments to scan genes from various organisms for particular features, to study the significance of those traits, and to investigate how they may have arisen.

Though there are a number of ways to obtain an alignment, we opted to use dynamic programming with the Smith-Waterman algorithm. This algorithm performs a local alignment: it searches the two sequences for matching regions of all sizes and finds the one with the highest similarity, using a scoring system that assigns points for matching letters, mismatched letters, and skipped letters (a.k.a. gaps). Varying the scoring system can also vary the results. For two sequences of lengths m and n, the algorithm takes on the order of mn operations, so in the worst case, when both sequences have length m, the cost grows as m squared. The running time therefore grows quadratically rather than exponentially, but it still becomes much more time-consuming as our sequences get longer.
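To make the grid-filling idea concrete, here is a minimal Python sketch of the Smith-Waterman recurrence with a linear gap penalty. The scores (match +2, mismatch -1, gap -1) and the function name `smith_waterman` are illustrative assumptions, not the scoring scheme behind our results; EMBOSS Water, for example, uses separate gap-opening and gap-extension penalties. The nested loops visit every one of the m x n cells, which is where the mn cost comes from.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> Tuple[int, str, str]:
    """Return (best score, aligned a, aligned b) for the best local alignment.

    Scores are illustrative assumptions, using a simple linear gap penalty.
    """
    m, n = len(a), len(b)
    # H[i][j] is the best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay at 0, and no cell may drop below 0: this is what
    # makes the alignment local rather than global.
    H: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):            # m * n cells are filled in total
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] + gap,     # gap in b (skip a letter of a)
                          H[i][j - 1] + gap)     # gap in a (skip a letter of b)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a score of 0 is reached.
    i, j = best_pos
    out_a: List[str] = []
    out_b: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("ACGT", "AGT")` returns `(5, "ACGT", "A-GT")`: three matches (+6) and one skipped letter (-1) beat the gapless local match "GT" (+4).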

The manual process of this algorithm involves setting each of the two sequences on an axis of a grid. Each box

</div>

We collaborated with Autodesk and implemented our cloud algorithm through their Project Saturn API and Autodesk Cloud services. Saturn is a new framework designed to provide customers with single- and multi-objective global optimization algorithms. It can be fully integrated into engineering products as a multi-language, multi-platform optimization library that communicates with the framework running on the Autodesk Cloud, making it possible to integrate custom solutions seamlessly and efficiently into scalable systems that can carry out the optimization tasks users request.

In our project, we wrote both frontend and backend components that run Smith-Waterman local alignments on the cloud quickly and on demand, using a two-tier algorithm that splits the work into subjobs and runs them in parallel. Users upload a plasmid sequence and specify the number n of subjobs into which the alignment should be split. The plasmid sequence is then stored temporarily as a resource in the cloud, alongside a table of features that was uploaded permanently in advance to cut down on file transfer time.

In the first tier of the algorithm, the stored feature table is split into n parts, and n subjobs are launched in the second tier, one per part; for the most efficient execution, this activates n machines on the cloud. Each second-tier subjob uses the EMBOSS Water tool to run the Smith-Waterman algorithm on the plasmid sequence against its assigned set of features. Each subjob creates its own result resource, and alignment results are filtered with a 98% identity threshold before being appended to it. The results are then reformatted into a JSON array and passed back to the first tier. As each subjob completes, the first tier sends its partial results to the client, which concatenates the JSON arrays and returns a final result containing all of the completed alignments.
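The two-tier flow can be sketched locally in Python to show the control flow. This is not the Saturn API: a thread pool stands in for the n cloud machines, and `toy_identity` is a hypothetical stand-in for an EMBOSS Water run that simply reports 100% identity when a feature occurs verbatim in the plasmid. The function names, the `(name, sequence)` feature format, and treating the 98% threshold as inclusive are all assumptions made for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

Feature = Tuple[str, str]                 # (feature name, sequence); assumed format
Aligner = Callable[[str, str], float]     # returns percent identity


def split_features(features: List[Feature], n: int) -> List[List[Feature]]:
    """Tier 1: divide the feature table into n nearly equal chunks."""
    size, extra = divmod(len(features), n)
    chunks, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        chunks.append(features[start:end])
        start = end
    return chunks


def run_subjob(plasmid: str, chunk: List[Feature], align: Aligner,
               threshold: float = 98.0) -> str:
    """Tier 2: align the plasmid against one chunk and keep high-identity hits."""
    hits = []
    for name, seq in chunk:
        identity = align(plasmid, seq)
        if identity >= threshold:         # inclusive threshold is an assumption
            hits.append({"feature": name, "identity": identity})
    return json.dumps(hits)               # each subjob reports a JSON array


def run_job(plasmid: str, features: List[Feature], n: int,
            align: Aligner) -> List[Dict[str, object]]:
    """Launch n subjobs in parallel and merge partial results as they finish."""
    merged: List[Dict[str, object]] = []
    with ThreadPoolExecutor(max_workers=n) as pool:   # stands in for n cloud machines
        futures = [pool.submit(run_subjob, plasmid, chunk, align)
                   for chunk in split_features(features, n)]
        for fut in as_completed(futures):             # completion order, not submit order
            merged.extend(json.loads(fut.result()))   # append each partial JSON array
    return merged


def toy_identity(plasmid: str, feature: str) -> float:
    """Hypothetical stand-in for EMBOSS Water: 100% if the feature occurs verbatim."""
    return 100.0 if feature in plasmid else 0.0
```

For example, `run_job("GGGACGTACGTTTT", [("f1", "ACGTACG"), ("f2", "CCCC"), ("f3", "TTTT"), ("f4", "GGGA")], 2, toy_identity)` returns hits for `f1`, `f3` and `f4`, in whatever order the subjobs finish. Merging with `as_completed` mirrors the client appending partial JSON arrays as they arrive rather than waiting for the slowest subjob before showing anything.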

</div>

We tested this by aligning the pUC18 plasmid, a sequence of 2,680 letters, against a library of 17,498 yeast features, each about 400 base pairs long. Running conventionally without the cloud, a local machine took 39 minutes to complete this alignment. Across more than 80 timed executions on the cloud, with small standard errors, running the job with 10 processors cut the time to three minutes, and running it with 30 processors cut it to nearly one minute. pUC18 is a relatively small sequence. Many sequences of interest are much longer and feature libraries can be far larger, so some alignments could take weeks to complete; certain alignment tasks would also require more memory than a local machine can provide, which makes them jobs that can only be done on a cloud server. With this kind of improvement, we are making previously impractical alignments in biology possible.
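As a quick check on the scale of this benchmark, the sketch below estimates the number of dynamic-programming cells and the speedups implied by the reported times. It assumes every feature is exactly 400 bp and treats "nearly one minute" as 1.0 minute; both are approximations. The speedups compare against one local machine, which need not match a single cloud processor, so they should not be read as strict parallel efficiency.

```python
def dp_cells(query_len: int, n_features: int, feature_len: int) -> int:
    """Cells filled by Smith-Waterman: query length times total library length."""
    return query_len * n_features * feature_len


def speedup(local_minutes: float, cloud_minutes: float) -> float:
    """How many times faster the cloud run is than the local run."""
    return local_minutes / cloud_minutes


# Figures from the text; the 400 bp feature length and the 30-processor time
# (about 1 minute) are approximations.
cells = dp_cells(2680, 17498, 400)
print(f"{cells:,} cells")        # 18,757,856,000 cells
print(speedup(39.0, 3.0))        # 13.0 with 10 processors
print(speedup(39.0, 1.0))        # 39.0 with 30 processors, if the run took 1.0 minute
```

Roughly 1.9 x 10^10 cells for a 2,680-letter plasmid illustrates why the mn cost matters even for a small query.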

</div>

</html>

 <a href="https://static.igem.org/mediawiki/2012/f/f6/Waterman_Scoring.png"><img style="display: block; margin-left: auto; margin-right: auto; width: 720px; padding:8px;" src="https://static.igem.org/mediawiki/2012/f/f6/Waterman_Scoring.png"></img></a>
 <br><br>
 </div>
 <br>
 </div>
 <br>
<a href="https://static.igem.org/mediawiki/2012/a/ab/CloudPerformance.png"><img style="display: block; margin-left: auto; margin-right: auto; width: 720px; padding:8px;" src="https://static.igem.org/mediawiki/2012/a/ab/CloudPerformance.png"></img></a><br>
 </div>
 </html>
 {{:Team:Johns_Hopkins-Software/header}}

From 2012.igem.org
