Deploy Bioinformatics Modules on HPC
Sep 23, 2019
Deploying scientific software can be challenging anywhere. Deploying bioinformatics software on a High Performance Computing (HPC) cluster can be especially challenging!
Life, at least in this regard, has become so much better in the last few years. Anaconda, the scientific Python distro, along with Conda, its package manager and builder awesomeness, has made deploying software so much more streamlined. There are amazing groups contributing packages to Conda. It's become a whole ecosystem of people working on infrastructure, software and packaging. Bioconda and Conda-Forge are two great groups that have added a ton of value to communities that use scientific software. On top of that, EasyBuild gives you some pretty great capabilities for deploying your software as modules, which makes it available to everybody on the cluster!
Disclaimer - I am a core team member of Bioconda, but I'm kind of a slacker member and they are awesome all on their own!
HPC Challenges
Deploying software on HPC comes with its own set of difficulties. You usually can't install anything as root on the system itself. Everything is loaded through modules, yet lots of software expects certain libraries to live as system dependencies. Think of all the software that depends on Boost!
HPC Modules
Deploying modules is a whole skillset on its own. You need to set environment variables, and you need to know how to patch software so that it doesn't go looking for system libraries.
If you're really lucky, your software is already available as a conda package. Then you don't need to worry about patching, or libraries, or anything. You can install it as a conda package and deploy it as an HPC (Lmod, Environment Modules) module using EasyBuild.
I'm going to assume that you already have Lmod or Environment Modules installed. If you don't, both are available as system packages.
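To make the environment variable part concrete, here is a rough sketch of what loading a module boils down to. The install prefix and the tool name `mytool` are hypothetical, and a real modulefile usually sets more than this.

```shell
# Hypothetical install prefix for one version of one tool.
TOOL_ROOT="${TOOL_ROOT:-/opt/apps/mytool/1.0}"

# Roughly what `module load mytool/1.0` would do to your environment:
# put the tool's binaries, libraries and man pages ahead of the system ones.
export PATH="$TOOL_ROOT/bin:$PATH"
export LD_LIBRARY_PATH="$TOOL_ROOT/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export MANPATH="$TOOL_ROOT/share/man${MANPATH:+:$MANPATH}"
```

The module system also knows how to undo these changes on `module unload`, which is the main reason to use it rather than editing your shell profile by hand.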
Get Started with EasyBuild
Install Miniconda
My favorite way to install EasyBuild is through the conda package. I always install a small Miniconda for myself, not as a module. You'll see further down that I also install Anaconda3 as an HPC module. The justification for having two is that one is for me, the admin, while the other is part of the modules infrastructure and is for users.
Now that we're ready with our Miniconda distro, we want to create an environment for EasyBuild to live in. If you're like me, you'll also want to install ipython into the root (base) environment. I love having ipython just hanging around as a shell.
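A personal Miniconda install could look something like the sketch below. It assumes 64-bit Linux, `wget` on the login node, and `$HOME/miniconda3` as the install prefix; the function name `install_miniconda` is just a helper made up for this example.

```shell
# Assumptions: 64-bit Linux installer, installed under your home directory.
MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh"
MINICONDA_PREFIX="${MINICONDA_PREFIX:-$HOME/miniconda3}"

install_miniconda() {
    # Download the installer to a temp dir, run it unattended, then clean up.
    tmpdir="$(mktemp -d)"
    wget -q "$MINICONDA_URL" -O "$tmpdir/miniconda.sh" &&
        bash "$tmpdir/miniconda.sh" -b -p "$MINICONDA_PREFIX"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}

# Run this once, then make conda available in the current shell:
#   install_miniconda
#   . "$MINICONDA_PREFIX/etc/profile.d/conda.sh"
```

The `-b` flag runs the installer in batch mode without prompts, and `-p` sets the install prefix.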
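Creating that environment might look like the following. This is a sketch that assumes the `easybuild` package is pulled from the conda-forge channel and that the environment is named `easybuild`; adjust both to taste.

```shell
# Assumption: the environment is called "easybuild"; any name works.
EB_ENV="${EB_ENV:-easybuild}"

if command -v conda >/dev/null 2>&1; then
    # Optional: keep ipython around in the base environment.
    conda install -y -n base ipython

    # Dedicated environment for EasyBuild (channel is an assumption).
    conda create -y -n "$EB_ENV" -c conda-forge easybuild

    # Activate it and confirm that the eb command is available.
    . "$(conda info --base)/etc/profile.d/conda.sh"
    conda activate "$EB_ENV"
    eb --version
else
    echo "conda not found on PATH; install Miniconda first" >&2
fi
```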
Woooo! Now EasyBuild is installed, and we are ready to deploy some modules!
Now, EasyBuild has quite a few modules (easyconfigs) included in its core distro. If you want to check out all the awesome offerings, just head on over to the EasyBuilders easyconfigs repository on GitHub.
As a quick aside, I am only focusing on deploying Conda-based EasyBuild modules. Installing other module types can be more involved, because you need to start thinking about compiler toolchains. For most bioinformatics software you will be just fine sticking with the conda modules, because you're not going to gain much (and you'll probably break stuff) by using a different toolchain or compilation option.
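To browse the bundled easyconfigs from the command line and build the Anaconda3 module for your users, something like this should work. The Anaconda3 version here is an assumption, so pick one that the search actually lists for your EasyBuild release.

```shell
# Assumed version; replace with one that `eb --search` lists on your system.
ANACONDA_EC="${ANACONDA_EC:-Anaconda3-5.3.0.eb}"

if command -v eb >/dev/null 2>&1; then
    # List bundled easyconfigs whose names match Anaconda3.
    eb --search Anaconda3

    # Preview what would be built, then build it with dependency resolution.
    eb "$ANACONDA_EC" --robot --dry-run
    eb "$ANACONDA_EC" --robot
else
    echo "eb not found; activate the easybuild environment first" >&2
fi
```

Unless you configure a different prefix, EasyBuild installs under `$HOME/.local/easybuild`, which is fine for trying things out but probably not where you want shared modules to live.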
What about my modules?
Don't worry, you're covered there too! Put your EasyBuild config files (easyconfigs) somewhere and point EasyBuild at them. If you need an example of a Conda package, plenty ship with the core easyconfigs.
Here are some examples:
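As an illustration, a Conda-based easyconfig can be as short as the one written out below. The package (samtools 1.9), the directory and the Anaconda3 version are placeholders chosen for this sketch, not a recipe taken from the core easyconfigs.

```shell
# Hypothetical location for site-specific easyconfigs.
EASYCONFIG_DIR="${EASYCONFIG_DIR:-$HOME/easyconfigs}"
mkdir -p "$EASYCONFIG_DIR"

# Easyconfigs use Python syntax; the quoted heredoc keeps the shell from touching it.
cat > "$EASYCONFIG_DIR/samtools-1.9.eb" <<'EOF'
easyblock = 'Conda'

name = 'samtools'
version = '1.9'

homepage = 'http://www.htslib.org/'
description = "Tools for manipulating SAM/BAM/CRAM alignment files"

# SYSTEM requires EasyBuild 4.x; older releases used the 'dummy' toolchain.
toolchain = SYSTEM

# Conda package spec and the channels to search.
requirements = '%(name)s=%(version)s'
channels = ['conda-forge', 'bioconda']

# The Anaconda3 module provides conda at build time (version is an assumption).
builddependencies = [('Anaconda3', '5.3.0')]

sanity_check_paths = {
    'files': ['bin/samtools'],
    'dirs': [],
}

moduleclass = 'bio'
EOF
```

The `sanity_check_paths` entry is what lets EasyBuild confirm that the install actually produced the binary you expect before it writes the module file.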
For the sake of argument we'll assume that you have some easyconfigs available in a GitHub repo. This is how you go about deploying them.
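A minimal sketch of that workflow follows. The repository URL, the checkout directory and the easyconfig name are all placeholders, and `deploy_module` is a helper invented for this example.

```shell
# Placeholders: point these at your own repository and easyconfig.
EASYCONFIG_REPO="${EASYCONFIG_REPO:-https://github.com/your-org/your-easyconfigs.git}"
EASYCONFIG_CHECKOUT="${EASYCONFIG_CHECKOUT:-$HOME/site-easyconfigs}"
EASYCONFIG="${EASYCONFIG:-samtools-1.9.eb}"

deploy_module() {
    # Fetch the easyconfigs on first use.
    if [ ! -d "$EASYCONFIG_CHECKOUT/.git" ]; then
        git clone "$EASYCONFIG_REPO" "$EASYCONFIG_CHECKOUT" || return 1
    fi
    # Build the module, searching the checkout first and resolving dependencies.
    eb "$EASYCONFIG" --robot="$EASYCONFIG_CHECKOUT"
}

# With the easybuild environment active:
#   deploy_module
```

Passing a path to `--robot` puts your checkout ahead of the bundled easyconfigs in the search path, so your own recipes win when names clash.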
Make your modules available to the WORLD!
Or other users. That's fine too. Just make sure they have the Lmod or Environment Modules package sourced, and that their shell knows where to look for modules. Here's an example setup, but your setup may be slightly different.
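One possible setup for a user's shell profile is sketched here. The Lmod init script location and the module tree path are assumptions: the first varies by distro, and the second is EasyBuild's default prefix, which your site has probably overridden.

```shell
# Assumed location of the Lmod init script; check where your site installs it.
LMOD_INIT="${LMOD_INIT:-/usr/share/lmod/lmod/init/bash}"
if [ -f "$LMOD_INIT" ]; then
    . "$LMOD_INIT"
fi

# EasyBuild's default module tree, unless a different prefix was configured.
EB_MODULES="${EB_MODULES:-$HOME/.local/easybuild/modules/all}"

# Tell the module system where to look.
export MODULEPATH="$EB_MODULES${MODULEPATH:+:$MODULEPATH}"

# List what is visible, if the module command exists in this shell.
if command -v module >/dev/null 2>&1; then
    module avail
fi
```

For a shared install, the admin would typically set the module path once in a system-wide profile script so users don't have to.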