We can accommodate libraries of over a million members through our MiCodes design scheme. With a data set this large, we needed to develop methods to make MiCodes a high-throughput screening technique. Because this library utilizes visual phenotypes, two aspects of microscopy needed to be automatable: image acquisition and image processing.
If a MiCode library size is very large, it is important to find a time-efficient method of taking many pictures of yeast cells. This summer we dealt with libraries on the order of 10^3 members, and our images were taken with a regular fluorescence microscope. Larger libraries would most likely make use of automated stages to speed up image acquisition.
The segmentation of cells from an image allows for cell-by-cell analysis in downstream steps. In order to perform such analyses, we wrote cell segmentation software using MATLAB to recognize an individual cell from its background. We first used edge detection with the Sobel operator and several filtering options to approximate possible cell outlines in the image. Then we performed a series of dilation and erosion steps to clear background pixels. We also added an additional filtering routine to refine the overall cell segmentation algorithm. This refining algorithm uses several geometric criteria, which we collected through testing of approximately 400 sample images. Our cell identification (cellID) file can be downloaded here.
On the left we see the original image that requires cell segmentation. On the right we have the processed image where each cell has
been identified as a separate object.
The cellID program is currently optimized for identifying yeast cells and works best for images in which cell clumping is at a minimum. We avoided cell clumping by experimentally altering the cell density of the samples we imaged. Overall, we have seen a successful identification rate of 96 percent for images in which there is minimal cell clumping. In addition, we have tested a set of 300 images in which cell clumping was present. Our results showed a 90 percent successful identification rate of cells. Images of each individual cell were generated, as shown below, and saved at the end of this process for use with downstream analysis. These saved images have several uses. For example, the cell images can be converted into a binary mask to sort out the organelles of a cell for several fluorescent channels. The number of images generated can also be used to count the number of cells identified from a certain population.
An essential portion of automation is determining morphological features. In the context of organelle detection, we developed pipelines to collect geometric measurements characteristic of each organelle (actin, plasma membrane, vacuolar membrane, and nucleus), and defined specific measurements and cutoff values as our feature set for each organelle.
To achieve this, we imaged a small sample size of over 250 cells with varying phenotypes, and determined feature sets associated with each of the four organelles. We used CellProfiler 2.0, an open-source software designed for quantitative analysis of cellular phenotypes. We used this software to build pipelines that demonstrate the feasibility of automated identification of cellular phenotypes. The individual organelle identification pipelines can be viewed here, in .cp format (for CellProfiler 2.0).
In step (2), the thresholding algorithm we used was a Robust Background Adaptive operation, which is a threshold technique available in CellProfiler. We found that this method most effective in high background images, which is often the case when analyzing subcellular phenotypes over cytosolic background.
Establishing geometric features:
In order to train the computer to recognize blobs as organelles, we needed to develop a comprehensive set of criteria that would indicate the identity of a blob. We executed our pipeline procedure on a nominal sample of positive control images (only one type of organelle present per cell). We used these measurements to build feature sets associated with each organelle. The measurements we gathered are displayed below:
These measurements provided us with a quantitative indication of unique features for each organelle. For example, the majority of identified nuclei have a smaller area relative to that of vacuolar membrane and plasma membrane, but larger than that of actin. We used these quantitative differences to establish parameters to separate the identities of organelles in any given image. In the development of our MiCodes automation, we noticed that setting absolute cutoff values provided a reasonably robust characterization of organelles. More information about the methods can be seen in our pipelines.
Having established these geometric parameters, we then proceeded to process a small sample size of 250 cells, with varying MiCodes. We performed a Nucleus identification, to see if our pipelines produced results that closely match those of a biologists' manual detection. The results are shown below:
Results: As seen in the green channel, CellProfiler idenfies nuclei to within approximately 90% of a biologist's accuracy, which demonstrates the feasibility of using automated techniques to quantitatively analyze geometric qualities and process MiCodes. In order to minimize our false negative rate, as seen in the green channel, we have chosen to accept a higher false positive rate by establishing less stringent parameters in thresholding.
To better refine the organelle detection, we also intend to analyze the following additional properties to improve the classification:
|Solidity||Area/Convex Area, or how "solid" a blob is.|
|Eccentricity||Ratio of the distance between foci of an ellipse and its major axis length.|
Once again, by utilizing the geometric differences among organelles, our software allows for automated MiCode identification:
Remarks: At this stage, our pipelines are primarily designed to differentiate between fluorescent membranes and punctate structures, while taking measurements. In the future, with a large, already-constructed dataset (a complete MiCode library), we would likely employ machine learning techniques to improve the automated identification. Our geometric data can potentially be implemented as a training set from which a computer can develop rules for a ranking system of geometric properties.