Set Up a Bioinformatics Demultiplex Server from Scratch
Nov 07, 2019
Install Demultiplex Software
Installing demultiplexing software such as bcl2fastq, CellRanger, LongRanger, demuxlet, and whatever else pops up holds a special place in the hearts (and potential support groups) of those of us who do bioinformatics and genomics. It has been enough of an issue in my professional life that I thought I would dedicate a series to setting up servers for different analysis types.
Don't install system packages
This is my big chance to go on a total rant about bioinformatics servers!
Don't install all kinds of software as system packages. Ok? Just don't do it. It may not backfire on you today, or tomorrow, but someday it will!
I'll make a few exceptions to that. Things like zlib, openssl, and ssh are fine, and I'll even cheat sometimes and yum install some development tools. What I'm mostly talking about here is bioinformatics software. Don't bother installing bcl2fastq, BLAST, AUGUSTUS, R, Python, Dask, or pretty much anything else as system packages.
There are better solutions that, I promise, aren't that bad to get started with, and they will mostly keep you from killing your servers!
Prepare Your Server
The very first thing I do when I get a shiny new server is to install Lmod, Miniconda and EasyBuild.
Install Lmod
If you need to install Lmod without root permissions, I recommend checking out the EasyBuild guide on installing Lmod without root permissions.
The exact steps depend on your system. If you're using yum, you'll need to enable the EPEL repository, which is where the Lmod package comes from.
#!/usr/bin/env bash
# Run as root (or prefix each command with sudo).
yum install -y epel-release
# If you're on AWS and using one of their newer AMIs, install EPEL
# from the amazon-linux-extras package manager instead:
# amazon-linux-extras install epel
# yum-config-manager is provided by the yum-utils package
yum install -y yum-utils
yum-config-manager --enable epel
# Optional, but nice for servers that have been sitting around forever
# yum update -y; yum upgrade -y
yum install -y Lmod
From there, to make your modules available you'll need to source Lmod's shell init script (the shell awesomeness that is Lmod). Where that script lives depends on whether you installed from source, with yum, or with apt-get.
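A minimal sketch of that step, assuming the init script sits in one of a few common locations (the candidate paths below are assumptions, not guarantees for every distribution). The hypothetical `lmod_init_path` helper prints the first candidate that exists, and the script then sources it. Run the file with `source`, not as a subprocess, so that `module` ends up defined in your current shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first Lmod init script that exists.
# Default candidates are common install locations (assumed; verify on your system).
lmod_init_path() {
    local candidates=("$@")
    if [ "${#candidates[@]}" -eq 0 ]; then
        candidates=(
            /usr/share/lmod/lmod/init/bash
            /etc/profile.d/z00_lmod.sh
            /etc/profile.d/lmod.sh
        )
    fi
    local f
    for f in "${candidates[@]}"; do
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    return 1
}

# Source the first script found so the `module` command is defined in this shell.
if init_script=$(lmod_init_path); then
    source "$init_script"
else
    echo "No Lmod init script found in the usual places." >&2
fi
```

Once it has been sourced, `type module` should report a shell function.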
Try the usual locations first. If the file doesn't exist, source will throw an error. If none of them works, the Lmod init script is somewhere unusual and you will have to do a bit of hunting. The easiest way to do that is with the Linux find command, which is a totally awesome command and, along with grep, runs most of my life.
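For the hunting, a `find` sketch along these lines should turn it up. The default search roots are assumptions; pass `/` to search everything if nothing shows up, at the cost of a slower search. Permission-denied noise is sent to `/dev/null`, and `find_lmod_init` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list Lmod bash init scripts under the given roots.
# Defaults to a few likely roots (an assumption); pass / to search everything.
find_lmod_init() {
    local roots=("$@")
    if [ "${#roots[@]}" -eq 0 ]; then
        roots=(/usr /opt /etc)
    fi
    # -path matches against the full path, and * also matches '/'
    find "${roots[@]}" -type f -path '*lmod*/init/bash' 2>/dev/null
}

# find exits non-zero when some directories are unreadable; matches still print
find_lmod_init || true
```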
Install Miniconda and EasyBuild
We're going to do this in one fell swoop with a bash script. I'm not going to go through the whole thing; just know that you'll need to change the environment variables at the top to suit your setup. Make sure you have a non-root user to do all this with, because EasyBuild will complain loudly and then die if you try to run it as root.
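The original script isn't reproduced here, but a hypothetical sketch of its general shape might look like this: environment variables at the top, a guard against running as root, and functions that install Miniconda in batch mode and then pip-install EasyBuild into it. All paths and function names are assumptions. `EASYBUILD_PREFIX` is a real EasyBuild configuration variable, and `-b`/`-p` are the Miniconda installer's batch-mode and install-prefix flags.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the original script. Adjust these (assumed) paths.
INSTALL_ROOT="${INSTALL_ROOT:-$HOME/software}"
MINICONDA_PREFIX="$INSTALL_ROOT/miniconda3"
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
# EasyBuild reads EASYBUILD_* variables as configuration; this sets its install prefix
export EASYBUILD_PREFIX="$INSTALL_ROOT/easybuild"

# EasyBuild refuses to run as root, so fail early instead of halfway through.
require_non_root() {
    if [ "$(id -u)" -eq 0 ]; then
        echo "Run this as a regular (non-root) user." >&2
        return 1
    fi
}

install_miniconda() {
    local installer status
    installer=$(mktemp)
    curl -fsSL -o "$installer" "$MINICONDA_URL" || return 1
    # -b: batch mode (no prompts), -p: install prefix
    bash "$installer" -b -p "$MINICONDA_PREFIX"
    status=$?
    rm -f "$installer"
    return "$status"
}

install_easybuild() {
    # Install EasyBuild into the base Miniconda's Python with pip
    "$MINICONDA_PREFIX/bin/python" -m pip install easybuild
}
```

A run would then be `require_non_root && install_miniconda && install_easybuild`, so nothing is installed if the root check fails.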
Once you've changed the environment variables to suit your setup, just make the script executable with chmod +x (there's no need for chmod 777), run it, and off you go!
Install your Software
At this point we have our basic setup: a base Miniconda for our own purposes, Lmod, EasyBuild, and a Miniconda3 module that we will use as the base module for our bioinformatics software. Let's go through installing a few common demultiplexing tools.
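A quick sanity check at this point, sketched under two assumptions: Lmod's init script has already been sourced, and EasyBuild wrote its modules under `EASYBUILD_PREFIX`, falling back to EasyBuild's default prefix `$HOME/.local/easybuild` when that variable is unset. `check_base_modules` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Fall back to EasyBuild's default prefix when EASYBUILD_PREFIX is unset.
EASYBUILD_PREFIX="${EASYBUILD_PREFIX:-$HOME/.local/easybuild}"

check_base_modules() {
    # `module` is a shell function that only exists once Lmod's init script is sourced
    if ! type module >/dev/null 2>&1; then
        echo "module not defined; source the Lmod init script first." >&2
        return 1
    fi
    # EasyBuild writes module files under <prefix>/modules/all by default
    module use "$EASYBUILD_PREFIX/modules/all"
    module avail Miniconda3
}
```

Source the file and run `check_base_modules`; a Miniconda3 entry in the output means the base module is visible to Lmod.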
A Quick Note - Installing Conda Software with EasyBuild
EasyBuild does some truly awesome things, and admittedly I don't use even a fraction of its awesomeness. It has a ton of toolchains available, which are perfect for anyone who truly cares about the exact details of how their software is compiled. I've found that for bioinformatics software this just doesn't matter as much: we rarely have vectorized code or code that would benefit from the Intel compiler. If you want to know more, check out the toolchains section of the EasyBuild documentation.
I tend to install everything with conda and bioconda, because they have done such a great job of making my life easier. Then, sometimes, I create additional EasyBuild recipes for software that has a do-not-distribute clause. GATK, bcl2fastq, and CellRanger, I'm looking at you!
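As a sketch of the conda side, assuming `conda` is on your `PATH` (for example from the base Miniconda), here is a hypothetical `conda_tool_env` helper that gives each tool its own environment pulled from bioconda. The environment root is an assumption; `-y`, `-p`, and `-c` are standard `conda create` flags. Listing conda-forge before bioconda gives conda-forge the higher priority, which matches the channel order bioconda recommends.

```bash
#!/usr/bin/env bash
# Hypothetical location for per-tool environments; change to suit your server.
CONDA_ENV_ROOT="${CONDA_ENV_ROOT:-$HOME/software/conda-envs}"

# Usage: conda_tool_env ENV_NAME PACKAGE [PACKAGE...]
conda_tool_env() {
    local name=$1
    shift
    # Channels passed with -c are searched in order, so conda-forge takes priority
    conda create -y -p "$CONDA_ENV_ROOT/$name" -c conda-forge -c bioconda "$@"
}
```

For example, `conda_tool_env samtools samtools` creates the environment, and `conda activate "$CONDA_ENV_ROOT/samtools"` activates it by path.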
Install a base devtools package
A devtools package gives us all of the low-level build tools and libraries we need. Again, if you're interested in doing this from scratch, check out toolchains in EasyBuild. Otherwise, just install them from conda.
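A hypothetical devtools easyconfig built on EasyBuild's `Conda` easyblock might look like the following, written out with a heredoc so everything stays in bash. `channels` and `requirements` are parameters of that easyblock; the package list, the `1.0` version, and the Miniconda3 module version are placeholders to replace with your own.

```bash
#!/usr/bin/env bash
# Placeholder: must match a Miniconda3 module you actually have installed.
MINICONDA_VERSION="${MINICONDA_VERSION:-4.7.10}"
EB_FILE="devtools-1.0.eb"

# Unquoted heredoc so ${MINICONDA_VERSION} is expanded into the file.
cat > "$EB_FILE" <<EOF
easyblock = 'Conda'

name = 'devtools'
version = '1.0'

homepage = 'https://conda-forge.org'
description = "Low-level compilers and libraries installed from conda-forge"

toolchain = SYSTEM

# conda itself comes from the Miniconda3 module
builddependencies = [('Miniconda3', '${MINICONDA_VERSION}')]

channels = ['conda-forge']
# Example package list; trim or extend it for your builds
requirements = "gcc_linux-64 gxx_linux-64 make cmake zlib openssl pkg-config"

moduleclass = 'devel'
EOF

echo "Wrote $EB_FILE"
```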
You could also create eb configs for each of these packages separately.
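If you go the one-easyconfig-per-package route, a small hypothetical generator saves some copy-pasting. It writes one `Conda`-easyblock easyconfig per name and version; the Miniconda3 version and the package versions below are illustrative, so check conda-forge for what actually exists.

```bash
#!/usr/bin/env bash
# Placeholder: must match a Miniconda3 module you actually have installed.
MINICONDA_VERSION="${MINICONDA_VERSION:-4.7.10}"

# Hypothetical helper: write NAME-VERSION.eb pinning NAME=VERSION from conda-forge.
write_conda_eb() {
    local name=$1 version=$2
    cat > "${name}-${version}.eb" <<EOF
easyblock = 'Conda'
name = '${name}'
version = '${version}'
homepage = 'https://anaconda.org/conda-forge/${name}'
description = "${name} installed from conda-forge"
toolchain = SYSTEM
builddependencies = [('Miniconda3', '${MINICONDA_VERSION}')]
channels = ['conda-forge']
requirements = "${name}=${version}"
moduleclass = 'devel'
EOF
}

# Illustrative versions only
write_conda_eb zlib 1.2.11
write_conda_eb make 4.2.1
```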
And boom! You will see your software get installed. I'm planning on putting together a GitHub repo with some of my most commonly used EasyBuild configs, so stay tuned!
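The build itself is a single `eb` call. A minimal sketch, assuming your easyconfig file is in the current directory and that EasyBuild uses `EASYBUILD_PREFIX` or, failing that, its default prefix `$HOME/.local/easybuild`; `build_and_load` is a hypothetical helper name. `--robot` tells EasyBuild to resolve missing dependencies, such as the Miniconda3 module, from its easyconfig search path.

```bash
#!/usr/bin/env bash
# Fall back to EasyBuild's default prefix when EASYBUILD_PREFIX is unset.
EASYBUILD_PREFIX="${EASYBUILD_PREFIX:-$HOME/.local/easybuild}"

# Usage: build_and_load EASYCONFIG MODULE_NAME
build_and_load() {
    local eb_file=$1 module_name=$2
    # --robot builds any missing dependencies before the requested easyconfig
    eb "$eb_file" --robot || return 1
    # EasyBuild writes module files under <prefix>/modules/all by default
    module use "$EASYBUILD_PREFIX/modules/all"
    module load "$module_name"
}
```

For a hypothetical devtools-1.0.eb, that would be `build_and_load devtools-1.0.eb devtools/1.0`, run from a shell where Lmod has been sourced.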