Deploy Bioinformatics Modules on HPC

Sep 23, 2019

Deploying scientific software in an HPC environment can be challenging. Deploying bioinformatics software anywhere, let alone on a High Performance Computing (HPC) cluster, can be especially challenging!

Life, at least in this regard, has become so much better in the last few years. Anaconda, the scientific Python distro, along with Conda, its package manager and builder awesomeness, has made deploying software so much more streamlined. There are amazing groups contributing packages to Conda. It's become a whole ecosystem of people working on infrastructure, software, and packaging. Bioconda and Conda-Forge are two great groups that have added a ton of value to communities that use scientific software. On the HPC side, EasyBuild gives you some pretty great capabilities for deploying your software as modules, which makes it available to everybody!

Disclaimer - I am a core team member of Bioconda, but I'm kind of a slacker member and they are awesome all on their own!

HPC Challenges

Deploying software on HPC comes with its own set of difficulties. You typically can't install anything as root on the system itself. Everything is loaded through modules, and lots of software expects certain libraries to live as system dependencies. I'm looking at you, all software that depends on Boost!

HPC Modules

Deploying modules is a whole skillset on its own. You need to set environment variables, and you need to know how to patch software so that it doesn't go looking for system libraries.
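To make the environment variable part concrete, here is a rough sketch of what loading a module amounts to for a single install prefix. The function name `load_prefix` and the path `/apps/mytool/1.0` are hypothetical. A real modulefile (Lmod or Environment Modules) also keeps track of its changes so that `module unload` can reverse them, which this sketch does not attempt.

```shell
# Hypothetical helper: mimic the environment changes a modulefile makes
# for software installed under a single prefix.
load_prefix() {
    prefix="$1"
    # Executables go first, so this install wins over anything already on PATH
    export PATH="${prefix}/bin${PATH:+:${PATH}}"
    # Shared libraries bundled with the software, instead of system copies
    export LD_LIBRARY_PATH="${prefix}/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
}

# Example with a made-up install location
load_prefix /apps/mytool/1.0
echo "first PATH entry: ${PATH%%:*}"
```

The point of tools like EasyBuild is that you don't write this by hand for every package: the modulefile is generated for you.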

If you're really lucky, your software is already available as a conda package. Then you don't need to worry about patching, or libraries, or anything. You can install it as a conda package and deploy it as an HPC (Lmod, Environment Modules) module using EasyBuild.

I'm going to assume that you already have Lmod or Environment Modules installed. If you don't, both are available as system packages.
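If you're not sure which state a node is in, a quick check like the one below can tell you whether a `module` command is defined in the current shell. This is only a sketch, and the function name `have_module_cmd` is made up for the example.

```shell
# Hypothetical helper: succeed if a `module` command (shell function, alias,
# or executable) is defined in the current shell.
have_module_cmd() {
    command -v module >/dev/null 2>&1
}

if have_module_cmd; then
    echo "module command found"
else
    echo "no module command yet; try sourcing your site's Lmod or Environment Modules init script"
fi
```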

Get Started with EasyBuild

Install Miniconda

My favorite way to install EasyBuild is through the conda package. I always install a small Miniconda for myself, not as a module. You'll see later that I also install Anaconda3 as an HPC module. The justification for having two is that one is for me, the admin, and the other is part of the modules infrastructure and is for users.

#!/usr/bin/env bash

##################################################################
## Make lmod available
##################################################################
source /usr/local/lmod/lmod/init/bash

##################################################################
## Anaconda Setup
## We are using miniconda3/python3 from anaconda
## Be sure to change this if you are running as your own user
##################################################################
export ANACONDA3_BASE="/apps/anaconda3"
curl -s -L https://repo.continuum.io/miniconda/Miniconda3-4.5.12-Linux-x86_64.sh > miniconda.sh

chmod +x miniconda.sh ; sudo bash miniconda.sh -f -b -p $ANACONDA3_BASE
rm miniconda.sh && \
    export PATH=${ANACONDA3_BASE}/bin:$PATH && \
    conda config --set show_channel_urls True && \
    conda config --add channels conda-forge && \
    conda config --add channels defaults && \
    conda update --all --yes && \
    conda clean -tipy

Now that our Miniconda distro is ready, we want to create an environment for EasyBuild to live in. If you're like me, you'll also want to install IPython into the root environment. I love having IPython just hanging around as a shell.

##################################################################
## EasyBuild User Setup
## This is only necessary if you are creating a user for EB
## If running as your own user skip this section
## But you cannot run easybuild as root
##################################################################
#useradd -ms /bin/bash ebuser
#cd /home/ebuser
#export HOME=/home/ebuser

##################################################################
## EasyBuild Housekeeping
## Before we install easybuild let's source some things and set some variables
##################################################################

source /usr/local/lmod/lmod/init/bash
export ANACONDA3_BASE="/apps/anaconda3"
export PATH=${ANACONDA3_BASE}/bin:$PATH
export EB_ENV=eb--3.7.0
export EASYBUILD_PREFIX=/apps/easybuild/2.0
export MODULEPATH=/apps/easybuild/2.0/modules/all:$MODULEPATH

##################################################################
## Add Easybuild Config
## For now the config doesn't have anything except some commented
## out sections for the modules-tool and module-syntax
##################################################################
mkdir -p $HOME/.config/easybuild

##################################################################
## Conda config for ebuser
## Setup the initial conda configuration for the ebuser
##################################################################
conda config --set show_channel_urls True && \
    conda config --add channels defaults && \
    conda config --add channels conda-forge && \
    conda config --add channels bioconda && \
    conda config --set always_yes True && \
    conda config --set allow_softlinks False


##################################################################
## Finally! Install Easybuild
## Create a conda env with easybuild
##################################################################
conda create -n $EB_ENV easybuild=3.7.0
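The housekeeping step above creates `$HOME/.config/easybuild` but leaves the config file itself for later. As a minimal sketch, assuming EasyBuild's default user config location of `$HOME/.config/easybuild/config.cfg`, a placeholder file with the two settings commented out could be written like this. The function name `write_eb_config` is hypothetical, and `Lmod` and `Lua` are just one possible pair of values.

```shell
# Hypothetical helper: write a placeholder EasyBuild config file with the
# modules tool and module syntax settings commented out.
write_eb_config() {
    config_dir="$1"
    mkdir -p "${config_dir}"
    # Leave an existing config alone rather than overwriting it
    if [ -e "${config_dir}/config.cfg" ]; then
        return 0
    fi
    cat > "${config_dir}/config.cfg" <<'EOF'
[config]
# Uncomment to choose the modules tool and the module file syntax
# modules-tool = Lmod
# module-syntax = Lua
EOF
}

write_eb_config "${HOME}/.config/easybuild"
```

Running `eb --show-config` inside the EasyBuild environment is a handy way to confirm which settings are actually being picked up.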

Woooo! Now Easybuild is installed, and we are ready to deploy some modules!

##################################################################
## Install an Easybuild Config
## Anaconda3 is a Core EB config
## --software-name is a way of telling EB to look for software
## from the repos instead of supplying a .eb file
##################################################################
source activate $EB_ENV && \
	eb --software-name Anaconda3

Now, EasyBuild has quite a few modules included in its core distro. If you want to check out all the awesome offerings, just head on over to EasyBuilders EasyConfigs.

As a quick aside, I am only focusing on deploying Conda-based EasyBuild modules. Installing other module types can be more involved, because you need to start thinking about compiler toolchains. For most bioinformatics software you will be just fine sticking with the Conda modules, because you're not going to gain much (and you'll probably break stuff) by using a different toolchain or compilation option.

What about my modules?

Don't worry, you're covered there too! Put your EasyBuild config files somewhere and point EasyBuild at them. If you need an example of a Conda package there are plenty available as core software. 
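For a feel of what such a config file looks like, here is a sketch that writes a Conda-based easyconfig to disk. Everything in it is a placeholder chosen for illustration: the function name `write_example_easyconfig`, the `samtools` package and version, the sanity check path, and especially the `Anaconda3` version, which has to match a module you have actually installed. It uses the `dummy` toolchain that goes with the EasyBuild 3.x release installed above; newer EasyBuild releases use the `SYSTEM` toolchain constant instead.

```shell
# Hypothetical helper: write an example Conda-based easyconfig.
# Names, versions, and paths are placeholders, not taken from a real repo.
write_example_easyconfig() {
    target_dir="$1"
    mkdir -p "${target_dir}"
    cat > "${target_dir}/samtools-1.9.eb" <<'EOF'
easyblock = 'Conda'

name = 'samtools'
version = '1.9'

homepage = 'http://www.htslib.org/'
description = "Tools for manipulating alignments in the SAM/BAM format"

toolchain = {'name': 'dummy', 'version': ''}

# Conda package spec and the channels to pull it from
requirements = "%(name)s=%(version)s"
channels = ['bioconda', 'conda-forge']

# Placeholder version: must match an Anaconda3 module you already installed
builddependencies = [('Anaconda3', '5.3.0')]

sanity_check_paths = {
    'files': ['bin/samtools'],
    'dirs': [],
}

moduleclass = 'bio'
EOF
}

write_example_easyconfig ./example-easyconfigs
```

The file name follows the `software-version.eb` pattern, which is how EasyBuild expects to find easyconfigs when it resolves dependencies.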

Here are some examples:

For the sake of argument, we'll assume that you have some easyconfigs available in a GitHub repo. This is how you go about deploying them.

##################################################################
## Add Custom Easybuild Configs
## These are custom configs outside of the Easybuild Main
##################################################################
mkdir -p $HOME/.eb/custom_repos
cd $HOME/.eb/custom_repos
git clone my-custom-repo
cd $HOME

###################################################################
### Add Easybuild Configs to the Robot Path
### Robot will tell EB to automatically pull in deps
### Robot-path will tell EB where to look for configs
###################################################################
export ROBOT=$HOME/.eb/custom_repos/my-custom-repo

source activate $EB_ENV
# Check out what will happen with --dry-run
#eb --dry-run  --robot --robot-paths=$ROBOT   software-version.eb
### Using --extended-dry-run will give you more information
#eb --extended-dry-run  --robot --robot-paths=$ROBOT  software-version.eb
### Remove the --dry-run in order to actually install the module
eb --robot --robot-paths=$ROBOT   software-version.eb
module avail

Make your modules available to the WORLD!

Or other users. That's fine too. Just make sure they have the Lmod or Environment Modules init script sourced, and that their shell knows where to look for modules. Here's an example setup, but your setup may be slightly different.

source /usr/local/lmod/lmod/init/bash
export MODULEPATH=/apps/easybuild/2.0/modules/all
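One thing to watch for in a user's shell profile is that a plain `export MODULEPATH=...` replaces whatever module directories the site already set. A small sketch of a gentler approach is below: it prepends a directory only if it is missing. The function name `add_modulepath` is hypothetical, and the path is the modules directory under the `EASYBUILD_PREFIX` used in the housekeeping section.

```shell
# Hypothetical helper: prepend a directory to MODULEPATH only if it is not
# already there, so existing site-wide entries are preserved.
add_modulepath() {
    dir="$1"
    case ":${MODULEPATH:-}:" in
        *":${dir}:"*)
            # Already present, nothing to do
            ;;
        *)
            MODULEPATH="${dir}${MODULEPATH:+:${MODULEPATH}}"
            ;;
    esac
    export MODULEPATH
}

add_modulepath /apps/easybuild/2.0/modules/all
echo "${MODULEPATH}"
```

Once the `module` command is loaded, `module use <directory>` does the same job and is usually the simpler choice.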
