Creating a Custom CellProfiler Docker Image

Mar 26, 2019

Overview

If you've ever worked with scientific software, you know that installing it is not necessarily straightforward. I think this is changing quite a bit with tools like conda and Docker, but sometimes we just need to sit down and debug an installation. There is hope here: if you can get it working just once and put it in a Docker container, you don't have to worry about getting it working on another server!

CellProfiler

CellProfiler is free, open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically. More information can be found in the CellProfiler Wiki and on the CellProfiler GitHub.

Cool stuff. CellProfiler is an industry standard, and beyond useful for many scientists. It's totally worth packaging up into a docker container, and there is in fact one available from the wonderful people at CellProfiler. I needed to make just a few tweaks to get it working for a particular use case.

I put this docker image together with help from the CellProfiler Conda Installation Docs and the Official CellProfiler Docker Image.

The Dockerfile

A Dockerfile contains a set of build instructions. Here we inherit from the base Miniconda image (continuumio/miniconda3), as CellProfiler is mostly a Python application.

  FROM continuumio/miniconda3:4.5.11
   
  RUN apt-get update -y; apt-get upgrade -y
  RUN apt-get install -y vim-tiny vim-athena ssh openssh-server mysql-client default-libmysqlclient-dev openjdk-8-jdk build-essential
   
  RUN mkdir -p /home/cellprofiler/cellprofiler
  RUN mkdir -p /home/cellprofiler/.ssh
  WORKDIR /home/cellprofiler
   
  COPY environment.yml environment.yml
   
  RUN conda env create -f environment.yml && conda clean --all -y
  RUN echo "alias l='ls -lah'" >> ~/.bashrc
  RUN echo "source activate cellprofiler" >> ~/.bashrc
   
  ENV CONDA_EXE /opt/conda/bin/conda
  ENV CONDA_PREFIX /opt/conda/envs/cellprofiler
  ENV CONDA_PYTHON_EXE /opt/conda/bin/python
  ENV CONDA_PROMPT_MODIFIER (cellprofiler)
  ENV CONDA_DEFAULT_ENV cellprofiler
  ENV PATH /opt/conda/envs/cellprofiler/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  ENV JAVA_LD_LIBRARY_PATH /opt/conda/envs/cellprofiler/jre/lib/amd64/server
  ENV JAVA_HOME /opt/conda/envs/cellprofiler
   
  #RUN conda install -f -y javabridge numpy=1.11
  RUN pip install --force centrosome

I was able to use this same Dockerfile to create containers for several versions of CellProfiler for testing purposes.

(Yes, you MUST have MySQL installed as a system package, even if you have no intention of using it.)

The Conda Environment Definition

Conda environments are defined in yaml files. They are fairly straightforward. Just give your environment a name, add some channels (groups that distribute specialized conda packages) and list your packages. You can even include pip packages. The strategy we use here is to install all the base packages as conda packages, and then to install a specific version of CellProfiler with pip.

 

CellProfiler v3.1.8

My environment.yml (downloaded in the build step below) is the v3.1.8 version, but I was able to get 2.3.1 built simply by changing the version number in the cellprofiler pip installation.

Build Your CellProfiler Docker Container

mkdir cellprofiler-docker
cd cellprofiler-docker
wget https://gist.github.com/jerowe/64aa26ccb50ffd9dcad2bfc2477c7353/raw/3baf67a3138683766b18172efcad3701e14d9099/Dockerfile
wget https://gist.github.com/jerowe/0ec7628e962838e9b4e6b98328322de9/raw/45edb335cf6015aa6b5d522f82a9cf8a7e209ae4/environment.yml
docker build --rm . -t my_dockerhub_username/cellprofiler:3.1.8
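
Since the same image tag shows up again in the run and push steps, it can help to compose it in one place. What follows is only a minimal sketch and is not part of the original gists: `image_tag` and `build_image` are hypothetical helper names, and the `DOCKER` variable is an assumption added here so the command can be previewed with `DOCKER=echo` before running it for real.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the function names and the DOCKER override are
# illustrative assumptions, not something the original gists provide.
DOCKER="${DOCKER:-docker}"

# Compose "<user>/cellprofiler:<version>" so every step uses the same tag.
image_tag() {
  local user="$1" version="$2"
  printf '%s/cellprofiler:%s\n' "$user" "$version"
}

# Build from the Dockerfile and environment.yml in the current directory.
build_image() {
  local tag
  tag="$(image_tag "$1" "$2")"
  "$DOCKER" build --rm -t "$tag" .
}
```

After sourcing the file from inside cellprofiler-docker, `build_image my_dockerhub_username 3.1.8` runs the same build as above. Setting `DOCKER=echo` first prints the docker arguments instead of executing them.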

Check that it works

Ha! Only the most important part! A next-level Docker build would add a few specific images and a CellProfiler pipeline to ensure it works.

> docker run -it my_dockerhub_username/cellprofiler:3.1.8 bash
(cellprofiler) root@deab0a611a5e:/home/cellprofiler# ipython
Python 2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19)
Type "copyright", "credits" or "license" for more information.

IPython 5.8.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import cellprofiler
In [2]:

I just ran a super quick 'import cellprofiler' test in ipython, but I've also had an actual scientist verify that a pipeline works.
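
The same import check can be run without an interactive session, which makes it easy to repeat after every build. This is a rough sketch under a couple of assumptions: `smoke_test` is a hypothetical function name, and the `DOCKER` variable is only there so the command can be previewed with `DOCKER=echo`. It relies on the `PATH` set in the Dockerfile, which puts the cellprofiler conda environment first, so plain `python` should resolve to that environment's interpreter.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test; set DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

# Start a throwaway container and try to import cellprofiler.
# The function's exit status is whatever the docker command returns,
# so a failed import shows up as a non-zero status.
smoke_test() {
  local image="$1"
  "$DOCKER" run --rm "$image" python -c 'import cellprofiler'
}
```

For example, `smoke_test my_dockerhub_username/cellprofiler:3.1.8 && echo "import OK"`. This only shows that the package imports; it is no substitute for running a real pipeline.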

Push your Docker Container to DockerHub

If you're only running this image locally, you can stop here. If you need it available someplace else, I'd suggest tagging it and pushing it to either Quay.io or DockerHub. If you're unsure how to do this, I found this tutorial to be short, sweet, and to the point. All you need is a DockerHub account to follow the tutorial.

# Make sure you are logged in first (docker login)!
docker push my_dockerhub_username/cellprofiler:3.1.8
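
If the image should go to a registry other than DockerHub, it needs to be retagged with the registry host before pushing. Here is a small sketch of that step. `push_image` is a hypothetical helper, the `DOCKER` override exists only for previewing with `DOCKER=echo`, and the Quay.io example assumes your namespace there matches your DockerHub username, which may not be true for you.

```shell
#!/usr/bin/env bash
# Hypothetical helper; set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# Push an image, optionally retagging it for another registry first.
#   push_image <image> [registry-host]
push_image() {
  local image="$1" registry="${2:-}"
  if [ -n "$registry" ]; then
    # Prefix the registry host, e.g. quay.io/<user>/cellprofiler:<version>
    "$DOCKER" tag "$image" "$registry/$image" || return 1
    image="$registry/$image"
  fi
  "$DOCKER" push "$image"
}
```

Run `docker login` (or `docker login quay.io`) first, then call `push_image my_dockerhub_username/cellprofiler:3.1.8` for DockerHub, or add `quay.io` as a second argument to retag and push there.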

Run CellProfiler

This image is configured a bit differently than the official one. You will need to run:

cellprofiler --run --run-headless (rest of your command)
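
To run that command against files on the host, the pipeline and the image folders have to be mounted into the container. The sketch below shows one way this could look. `run_pipeline` is a hypothetical helper, the `/data/...` container paths are arbitrary choices made for this example, and the `DOCKER` override is only for previewing with `DOCKER=echo`. The `-p`, `-i`, and `-o` flags are CellProfiler's command-line options for the pipeline file, image directory, and output directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper; set DOCKER=echo to preview the command.
DOCKER="${DOCKER:-docker}"

# Run a pipeline headlessly with host paths mounted into the container.
#   run_pipeline <image> <pipeline-file> <input-dir> <output-dir>
# Host paths should be absolute so docker treats them as bind mounts.
run_pipeline() {
  local image="$1" pipeline="$2" input_dir="$3" output_dir="$4"
  "$DOCKER" run --rm \
    -v "$pipeline:/data/pipeline.cppipe" \
    -v "$input_dir:/data/input" \
    -v "$output_dir:/data/output" \
    "$image" \
    cellprofiler --run --run-headless \
      -p /data/pipeline.cppipe -i /data/input -o /data/output
}
```

For example: `run_pipeline my_dockerhub_username/cellprofiler:3.1.8 "$PWD/my.cppipe" "$PWD/images" "$PWD/results"`, where the three host paths are placeholders for your own files.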

Wrap Up

That's it. I hope you see how you can take a set of installation instructions, and maybe even a Dockerfile that is almost what you need, and modify it so that you have exactly what you need!

Once you've decided on your versions, fetch the Dockerfile and the environment.yml file and go ahead and build it.
