Team:Johns Hopkins-Software/Cloud
From 2012.igem.org
(Difference between revisions)
Line 17: | Line 17: | ||
We have been working on integrating the sequence alignment function with the cloud. This is the task of taking two genetic sequences and finding the area of best fit between the two. Alignments are often conducted by biologists to scan genes from various organisms for certain features, and study the significance of these traits, and how they may have arose. | We have been working on integrating the sequence alignment function with the cloud. This is the task of taking two genetic sequences and finding the area of best fit between the two. Alignments are often conducted by biologists to scan genes from various organisms for certain features, and study the significance of these traits, and how they may have arose. | ||
<br><br>Though there are a number of ways to obtain an alignment, we opted to utilize dynamic programming with the Smith-Waterman algorithm. This algorithm performs a local alignment, meaning it searches through the two sequences for matches of all sizes and finds the highest similarity using a scoring system of assigning points based on matching letters, mismatched letters, or skipped letters (a.k.a. gaps). Varying the scoring system could also vary the results. Such an algorithm takes mn operations given two sequences with lengths m and n, so the worst case scenario would involve a complexity of m squared, implying that this task becomes exponentially more time consuming as our sequences get longer. | <br><br>Though there are a number of ways to obtain an alignment, we opted to utilize dynamic programming with the Smith-Waterman algorithm. This algorithm performs a local alignment, meaning it searches through the two sequences for matches of all sizes and finds the highest similarity using a scoring system of assigning points based on matching letters, mismatched letters, or skipped letters (a.k.a. gaps). Varying the scoring system could also vary the results. Such an algorithm takes mn operations given two sequences with lengths m and n, so the worst case scenario would involve a complexity of m squared, implying that this task becomes exponentially more time consuming as our sequences get longer. | ||
- | <br> | + | <br><br> |
<a href="https://static.igem.org/mediawiki/2012/f/f6/Waterman_Scoring.png"><img style="display: block; margin-left: auto; margin-right: auto; width: 720px; padding:8px;" src="https://static.igem.org/mediawiki/2012/f/f6/Waterman_Scoring.png"></img></a> | <a href="https://static.igem.org/mediawiki/2012/f/f6/Waterman_Scoring.png"><img style="display: block; margin-left: auto; margin-right: auto; width: 720px; padding:8px;" src="https://static.igem.org/mediawiki/2012/f/f6/Waterman_Scoring.png"></img></a> | ||
- | |||
<br><br> | <br><br> | ||
The manual process of this algorithm involves setting each of the two sequences on an axis of a grid. Each box | The manual process of this algorithm involves setting each of the two sequences on an axis of a grid. Each box |