A cloud-based tool for plasmid annotation and design
What is cloud computing? Many of you may already be familiar with the concept of the cloud. It is the use of software and hardware services over a network, often the internet. It takes many forms that we all know: Google apps, web hosting services, Dropbox, and so on. One advantage of using the cloud is that a company does not have to maintain its own hardware, so it can save on the cost of the technology while still getting reliable performance. Another is increased access: essentially anyone with authorized credentials can reach the data or software through the internet, without being limited to any physical location. And of course, the cloud can handle computationally demanding tasks. By using multiple machines to process work in parallel, a job can finish in a small fraction of the time it would otherwise take.
In the case of the AutoGene alignment algorithm, we wrote a client script that communicates with the cloud backend, which runs two tiers of algorithms that split the job into many subjobs running in parallel. We have tested this on an alignment of the pUC18 plasmid, which consists of a sequence of 2,680 letters, against a library of 17,500 yeast features, each about 400 letters long. Running conventionally without the cloud, we found that it would take about 39 minutes to complete this alignment. Running the algorithm tailored for the cloud on a local work manager took around 10 minutes. On the cloud from a cold start, meaning when we are just turning on the machines and they are not yet running at full power, it took 18 minutes. With five processing units on full power, it took less than seven minutes. Then finally, with 10 processors, we cut the time to three minutes, about thirteen times faster than without the cloud. What a difference, right? This isn’t just about cutting the time it takes to run a program. pUC18 is a relatively small sequence. Many sequences of interest run to many thousands of letters in length, and libraries can hold far more features, which could cause alignments to take weeks to complete. Such a job could also require more memory than a local machine can handle, so this is the kind of job that could only be done through a cloud server. So far we have only been testing with 10 worker units. Theoretically, if we were to use more, we would continue to see further gains in speed. With this kind of improvement, we are making the impossible in biology possible.
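To make the idea of splitting one alignment into parallel subjobs concrete, here is a minimal Python sketch. It is not the AutoGene code: the function names (`score`, `chunk`, `align_subjob`, `align_library`), the toy ungapped scoring rule, and the use of local threads as stand-ins for cloud workers are all assumptions made for illustration. The first tier cuts the feature library into slices, and the second tier has each worker score its slice against the query before the results are merged.

```python
from concurrent.futures import ThreadPoolExecutor


def score(query: str, feature: str) -> int:
    """Toy similarity (an assumption, not a real aligner): the best count of
    matching letters over every ungapped placement of feature inside query."""
    best = 0
    for offset in range(len(query) - len(feature) + 1):
        matches = sum(1 for a, b in zip(query[offset:], feature) if a == b)
        best = max(best, matches)
    return best


def chunk(items, n_chunks):
    """Tier 1: split items into up to n_chunks nearly equal contiguous slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are few items
            chunks.append(items[start:end])
        start = end
    return chunks


def align_subjob(query, features):
    """Tier 2: one worker scores its slice of the library against the query."""
    return [(name, score(query, seq)) for name, seq in features]


def align_library(query, library, n_workers=10):
    """Fan the library out to workers, then merge and rank the hits."""
    subjobs = chunk(list(library.items()), n_workers)
    # Threads stand in for cloud worker units in this local sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(lambda part: align_subjob(query, part), subjobs)
        hits = [hit for part in partials for hit in part]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical miniature library; real features are about 400 letters long.
    library = {"f1": "ACGT", "f2": "TTTT", "f3": "GGCA"}
    for name, value in align_library("AACGTTGG", library, n_workers=2):
        print(name, value)
```

Because each feature is scored independently of the others, the slices never need to talk to each other, which is what lets the work spread across 5 or 10 processing units. In a real deployment the threads would be replaced by separate machines and the scoring rule by a proper alignment algorithm.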