Team:Berkeley/Project/Automation

From 2012.igem.org

(Difference between revisions)
 
(178 intermediate revisions not shown)
Line 129: Line 129:
<p>
<p>
-
The cellID program is currently optimized for identifying yeast cells and works best for images in which cell clumping is at a minimum. We avoided cell clumping by experimentally altering the cell density of the samples we imaged. Overall, we have seen a successful identification rate of 96 percent for images we have tested. Images of each individual cell were generated, as shown below, and saved at the end of this process for use with downstream analysis. These saved images have several uses. For example, the cell images can be converted into a binary mask to sort out the organelles of a cell for several fluorescent channels. The number of images generated can also be used to count the number of cells identified from a certain population.
+
The cellID program is currently optimized for identifying yeast cells and works best for images in which cell clumping is at a minimum. We avoided cell clumping by experimentally altering the cell density of the samples we imaged. Overall, we have seen a successful identification rate of 96 percent for images in which there is minimal cell clumping. In addition, we have tested a set of 300 images in which cell clumping was present. Our results showed a 90 percent successful identification rate of cells. Images of each individual cell were generated, as shown below, and saved at the end of this process for use with downstream analysis. These saved images have several uses. For example, the cell images can be converted into a binary mask to sort out the organelles of a cell for several fluorescent channels. The number of images generated can also be used to count the number of cells identified from a certain population.
</p>
</p>
</div>
</div>
Line 154: Line 154:
     <img src="https://static.igem.org/mediawiki/2012/1/14/Project3_subheader4.jpg" width="980">
     <img src="https://static.igem.org/mediawiki/2012/1/14/Project3_subheader4.jpg" width="980">
</div>
</div>
 +
<div class = "row-end"></div>
<div class="row">
<div class="row">
-
<div class="col1" align="justify">
+
<div class="col12-2" align="justify">
<p>
<p>
An essential portion of automation is determining morphological features. In the context of organelle detection, we developed pipelines to collect geometric measurements characteristic of each organelle (actin, plasma membrane, vacuolar membrane, and nucleus), and defined specific measurements and cutoff values as our feature set for each organelle.  
An essential portion of automation is determining morphological features. In the context of organelle detection, we developed pipelines to collect geometric measurements characteristic of each organelle (actin, plasma membrane, vacuolar membrane, and nucleus), and defined specific measurements and cutoff values as our feature set for each organelle.  
Line 162: Line 163:
<br>
<br>
<p>
<p>
-
To achieve this, we imaged a small sample size of about 100 cells with varying phenotypes, and determined feature sets associated with each of the four organelles. We used <a href="http://www.cellprofiler.org/">CellProfiler 2.0</a>, an open-source software designed for quantitative analysis of cellular phenotypes. We used this software to build pipelines that demonstrate the feasibility of automated identification of cellular phenotypes. The nucleus and actin MiCodes identification pipelines can be viewed <a href="https://static.igem.org/mediawiki/2012/8/8a/Organelles.zip">here,</a> in .cp format (for CellProfiler 2.0).
+
To achieve this, we imaged a small sample size of over 250 cells with varying phenotypes, and determined feature sets associated with each of the four organelles. We used <a href="http://www.cellprofiler.org/">CellProfiler 2.0</a>, an open-source software designed for quantitative analysis of cellular phenotypes. We used this software to build pipelines that demonstrate the feasibility of automated identification of cellular phenotypes. The individual organelle identification pipelines can be viewed <a href="https://static.igem.org/mediawiki/2012/8/8a/Organelles.zip">here,</a> in .cp format (for CellProfiler 2.0).
<br>
<br>
<br>
<br>
Line 169: Line 170:
<p>
<p>
<b>Pipeline procedure:</b>  
<b>Pipeline procedure:</b>  
-
<br>
 
-
1. Separate cell image into its constituent fluorescent channels. Convert cells to grayscale.
 
-
<br>
 
<br>
<br>
Line 177: Line 175:
<div class="row">
<div class="row">
<div class="col1" align="center">
<div class="col1" align="center">
-
<img src="https://static.igem.org/mediawiki/2012/3/36/Threshold.jpg" width="450" hspace = "225"> <br><br>
+
<img src="https://static.igem.org/mediawiki/2012/3/36/Threshold.jpg" width="875" hspace = "0"> <br><br>
</div>
</div>
-
<div class="row-end"></div>
 
-
<br> 2. Circle and identify blobs of high intensity pixels. These become unspecific "objects", that are converted to a psuedocolor image. <br><br>
 
 +
<p>
 +
<div class="row" align="center">
 +
<div class="col2-2" style = "margin-left: 150px";>
 +
 +
1) Acquire single-celled images of each channel.
 +
<br>
 +
<br> </div>
 +
 +
<div class="col2-2">
 +
2) Convert cells to grayscale, and  apply a thresholding algorithm, removing background pixels and retaining foreground.
 +
</div>
 +
 +
<div class="col2-2">
 +
3) Determine distinct blobs of high intensity pixels. These established as undefined "objects".</div>
 +
 +
</div>
 +
 +
<div class = "col2-2">
 +
4) Take geometric and intensity measurements for each distinct "object".
 +
</div>
<div class="row">
<div class="row">
-
<div class="col1" align="center">
+
<div class="col4" id="spacer4"></div>
 +
<div class="row-end"> </div>
 +
<div class="col4" id="spacer4"></div>
 +
<div class="row-end"> </div>
 +
</div>
 +
<div class = "row">
 +
<div class = "col8-3" style = "margin-left:150px">
 +
<p = align:"center">
 +
In step (2), the thresholding algorithm we used was a <a href="http://www.cellprofiler.org/CPmanual/ApplyThreshold.html">Robust Background Adaptive </a> operation, which is a threshold technique available in CellProfiler. We found that this method most effective in high background images, which is often the case when analyzing subcellular phenotypes over cytosolic background. <br><br>
-
<img src="https://static.igem.org/mediawiki/2012/5/5a/PixelClusters.jpg" width="450" hspace = "225"><br>
 
-
<div class = "col15">
 
-
<p align: "left">To identify "interesting" pixels, we performed a <a href="http://www.cellprofiler.org/CPmanual/ApplyThreshold.html">Robust Background Adaptive </a> threshold operation, which is a threshold technique available in CellProfiler. We found that this method is most effective in high background images, which is often the case when analyzing subcellular phenotypes over cytosolic background.</p> </div><br>
 
</div>
</div>
-
<div class="row-end"></div>
 
-
3. Determine the features associated with each "object." We took measurements of geometric properties of a sample size of about 100 cells, and selected our geometric properties through inspection of these measurements. We came up with the following feature types:
 
 +
<div class = "row-end"></div><br><br><br>
 +
 +
<b>Establishing geometric features:</b> <br><br>
 +
 +
<div class = "row">
 +
<div class = "col1" align = "justify">
<p>
<p>
-
<table align="center">
+
In order to train the computer to recognize blobs as organelles, we needed to develop a comprehensive set of criteria that would indicate the identity of a blob. We executed our pipeline procedure on a nominal sample of positive control images (only one type of organelle present per cell). We used these measurements to build feature sets associated with each organelle. The measurements we gathered are displayed below: </p><br> <br>
-
  <tr>
+
 
-
    <th>Geometric Property</th>
+
</div>
-
    <th>Description</th>
+
 
-
  </tr>
+
<div class = "row-end"></div></div>
-
  <tr>
+
 
-
    <td>Area</td>
+
<div class = "row">
-
    <td>The area occupied by a blob of pixels.</td>
+
<div class = "col6">
-
  </tr>
+
<img src="https://static.igem.org/mediawiki/2012/4/4f/Compactness.jpg" width="400" style = "margin-left: 20px">
-
  <tr>
+
</div>
-
    <td>Perimeter</td>
+
 
-
    <td>The total number of pixels around the boundary of a blob.</td>
+
<div class = "col4">
-
  </tr>
+
<img src="https://static.igem.org/mediawiki/2012/4/47/FormFactor.jpg" width="400" >
-
  <tr>
+
</div>
-
    <td>Form Factor</td>
+
 
-
    <td>How circular an object is. Calculated as (4*pi*area)/(perimeter)^2.</td>
+
<div class = "row-end"></div>
-
  </tr>
+
</div>
-
</table>
+
 
 +
<div class = "row">
 +
<div class = "col6" id = "spacer4">
 +
</div>
 +
<div class = "row-end"> </div></div>
 +
</div>
 +
 
 +
<div class = "row">
 +
<div class = "col6" id = "spacer4">
 +
<img src="https://static.igem.org/mediawiki/2012/3/32/Area.jpg" width="400" style = "margin-left: 20px">
 +
</div>
 +
 
 +
<div class = "col4">
 +
<img src="https://static.igem.org/mediawiki/2012/1/1d/MajorVsMinor.jpg" width="400">
 +
</div>
 +
 
 +
<div class = "row-end"></div>
 +
 
 +
</div>
 +
 
 +
 
 +
<div class = "row">
 +
 
 +
<p>
 +
These measurements provided us with a quantitative indication of unique features for each organelle. For example, the majority of identified nuclei have a smaller area relative to that of vacuolar membrane and plasma membrane, but larger than that of actin. We used these quantitative differences to establish parameters to separate the identities of organelles in any given image. In the development of our MiCodes automation, we noticed that setting absolute cutoff values provided a reasonably robust characterization of organelles. More information about the methods can be seen in our pipelines. <br><br>
 +
Having established these geometric parameters, we then proceeded to process a small sample size of 250 cells, with varying MiCodes. We performed a Nucleus identification, to see if our pipelines produced results that closely match those of a biologists' manual detection. The results are shown below: <br><br>
 +
 
 +
 
 +
<img src="https://static.igem.org/mediawiki/2012/a/ac/NucleusCheck.jpg" width="400" style = "margin-left:300px">
 +
 
 +
<p> <b>Results:</b> As seen in the green channel, CellProfiler idenfies nuclei to within approximately 90% of a biologist's accuracy, which demonstrates the feasibility of using automated techniques to quantitatively analyze geometric qualities and process MiCodes. In order to minimize our false negative rate, as seen in the green channel, we have chosen to accept a higher false positive rate by establishing less stringent parameters in thresholding.
 +
<div class = "row-end"></div></div>
<p align = "center">
<p align = "center">
<br>
<br>
Line 239: Line 295:
</table>
</table>
-
<br>
 
</div>
</div>
-
<div class="row-end"></div>
+
<br>
-
<br> 4. Use the measurements taken to classify each "object" as a particular organelle. <br> <br>
+
 
 +
 
 +
<br> Once again, by utilizing the geometric differences among organelles, our software allows for automated MiCode identification: <br> <br>
<img src="https://static.igem.org/mediawiki/2012/4/4a/Output.png" width="450" hspace = "225"><br><br>
<img src="https://static.igem.org/mediawiki/2012/4/4a/Output.png" width="450" hspace = "225"><br><br>
-
<b>Results:</b> Our software proved capable of successfully identifying MiCoded nuclei 95% of the time when processing our sample image size (100 cells). At this stage, our pipelines are primarily designed to differentiate between fluorescent membranes and punctate structures, while taking measurements.  
+
<b>Remarks</b>: At this stage, our pipelines are primarily designed to differentiate between fluorescent membranes and punctate structures, while taking measurements.  
-
In the future, with a large, already-constructed dataset (a complete MiCode library), we would likely employ machine learning techniques to improve the automated identification.
+
In the future, with a large, already-constructed dataset (a complete MiCode library), we would likely employ machine learning techniques to improve the automated identification. Our geometric data can potentially be implemented as a training set from which a computer can develop rules for a ranking system of geometric properties.
</p>
</p>
</div>
</div>

Latest revision as of 04:03, 27 October 2012

header
iGEM Berkeley iGEMBerkeley iGEMBerkeley

Mercury

We can accommodate libraries of over a million members through our MiCodes design scheme. With a data set this large, we needed to develop methods to make MiCodes a high-throughput screening technique. Because this library utilizes visual phenotypes, two aspects of microscopy needed to be automatable: image acquisition and image processing.


If a MiCode library size is very large, it is important to find a time-efficient method of taking many pictures of yeast cells. This summer we dealt with libraries on the order of 10^3 members, and our images were taken with a regular fluorescence microscope. Larger libraries would most likely make use of automated stages to speed up image acquisition.


The segmentation of cells from an image allows for cell-by-cell analysis in downstream steps. In order to perform such analyses, we wrote cell segmentation software using MATLAB to recognize an individual cell from its background. We first used edge detection with the Sobel operator and several filtering options to approximate possible cell outlines in the image. Then we performed a series of dilation and erosion steps to clear background pixels. We also added an additional filtering routine to refine the overall cell segmentation algorithm. This refining algorithm uses several geometric criteria, which we collected through testing of approximately 400 sample images. Our cell identification (cellID) file can be downloaded here.

On the left we see the original image that requires cell segmentation. On the right we have the processed image where each cell has
been identified as a separate object.


The cellID program is currently optimized for identifying yeast cells and works best for images in which cell clumping is at a minimum. We avoided cell clumping by experimentally altering the cell density of the samples we imaged. Overall, we have seen a successful identification rate of 96 percent for images in which there is minimal cell clumping. In addition, we have tested a set of 300 images in which cell clumping was present. Our results showed a 90 percent successful identification rate of cells. Images of each individual cell were generated, as shown below, and saved at the end of this process for use with downstream analysis. These saved images have several uses. For example, the cell images can be converted into a binary mask to sort out the organelles of a cell for several fluorescent channels. The number of images generated can also be used to count the number of cells identified from a certain population.


An essential portion of automation is determining morphological features. In the context of organelle detection, we developed pipelines to collect geometric measurements characteristic of each organelle (actin, plasma membrane, vacuolar membrane, and nucleus), and defined specific measurements and cutoff values as our feature set for each organelle.

To achieve this, we imaged a small sample size of over 250 cells with varying phenotypes, and determined feature sets associated with each of the four organelles. We used CellProfiler 2.0, an open-source software designed for quantitative analysis of cellular phenotypes. We used this software to build pipelines that demonstrate the feasibility of automated identification of cellular phenotypes. The individual organelle identification pipelines can be viewed here, in .cp format (for CellProfiler 2.0).

Pipeline procedure:



1) Acquire single-celled images of each channel.

2) Convert cells to grayscale, and apply a thresholding algorithm, removing background pixels and retaining foreground.
3) Determine distinct blobs of high intensity pixels. These established as undefined "objects".
4) Take geometric and intensity measurements for each distinct "object".

In step (2), the thresholding algorithm we used was a Robust Background Adaptive operation, which is a threshold technique available in CellProfiler. We found that this method most effective in high background images, which is often the case when analyzing subcellular phenotypes over cytosolic background.




Establishing geometric features:

In order to train the computer to recognize blobs as organelles, we needed to develop a comprehensive set of criteria that would indicate the identity of a blob. We executed our pipeline procedure on a nominal sample of positive control images (only one type of organelle present per cell). We used these measurements to build feature sets associated with each organelle. The measurements we gathered are displayed below:



These measurements provided us with a quantitative indication of unique features for each organelle. For example, the majority of identified nuclei have a smaller area relative to that of vacuolar membrane and plasma membrane, but larger than that of actin. We used these quantitative differences to establish parameters to separate the identities of organelles in any given image. In the development of our MiCodes automation, we noticed that setting absolute cutoff values provided a reasonably robust characterization of organelles. More information about the methods can be seen in our pipelines.

Having established these geometric parameters, we then proceeded to process a small sample size of 250 cells, with varying MiCodes. We performed a Nucleus identification, to see if our pipelines produced results that closely match those of a biologists' manual detection. The results are shown below:

Results: As seen in the green channel, CellProfiler idenfies nuclei to within approximately 90% of a biologist's accuracy, which demonstrates the feasibility of using automated techniques to quantitatively analyze geometric qualities and process MiCodes. In order to minimize our false negative rate, as seen in the green channel, we have chosen to accept a higher false positive rate by establishing less stringent parameters in thresholding.



To better refine the organelle detection, we also intend to analyze the following additional properties to improve the classification:

Geometric Property Description
Solidity Area/Convex Area, or how "solid" a blob is. 
Eccentricity Ratio of the distance between foci of an ellipse and its major axis length. 


Once again, by utilizing the geometric differences among organelles, our software allows for automated MiCode identification:



Remarks: At this stage, our pipelines are primarily designed to differentiate between fluorescent membranes and punctate structures, while taking measurements. In the future, with a large, already-constructed dataset (a complete MiCode library), we would likely employ machine learning techniques to improve the automated identification. Our geometric data can potentially be implemented as a training set from which a computer can develop rules for a ranking system of geometric properties.