Deploy HPC Modules From Bioconda Packages
Oct 28, 2019
The Struggle is Real
I have been working in Bioinformatics for nearly 10 years, mostly on the computational side of things. I have spent a lot of that time building and installing software. Some of those wounds will never heal! Luckily, along came Anaconda, the scientific distribution of Python, along with the awesome BioConda project, which made installing bioinformatics software relatively easy! I don't know if Anaconda necessarily set out to make life easier for those of us installing software on HPC systems, but in any case they did.
(Disclaimer: I am technically a core team member of BioConda, but I'm really kind of a slacker core member and the real credit goes to the rest of the team!)
Deploy Modules with EasyBuild
One of my main goals in life is to deploy conda packages as HPC Modules. Deploying HPC Modules can be a bit of a pain. There are a lot of naming conventions, environment variables, file permissions, recursive file permissions, and just generally tons of stuff I don't want to deal with.
In fact, I really shouldn't be dealing with it because any system that relies on me actually memorizing anything and having my act together is just doomed.
Anyways, I was introduced to EasyBuild a few years ago, and have since abused it mostly to install BioConda packages. It has so, so much more functionality than what I use it for, and I recommend you check it out!
Generating the Configs
EasyBuild works with templates called EasyConfigs, which get parsed by some awesome Python code and turned into ready-to-consume HPC Modules.
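To make that a bit more concrete, here is a minimal sketch of what an EasyConfig for a conda package might contain. EasyConfigs are themselves written in Python syntax, so the sketch just renders one as text. The helper name render_conda_easyconfig is hypothetical and is not the script from this post. The Conda easyblock and its requirements and channels parameters come from EasyBuild, but the toolchain, channel order, and module class shown here are assumptions you should check against your own site (many sites also add a Miniconda build dependency).

```python
# Hypothetical helper: renders the text of an EasyConfig for one conda package.
# The field choices below are assumptions, not the output of the author's script.

def render_conda_easyconfig(name, version, homepage, description,
                            channels=("bioconda", "conda-forge")):
    """Return EasyConfig text that installs name=version with the Conda easyblock."""
    requirement = f"{name}={version}"  # conda's name=version spec
    lines = [
        "easyblock = 'Conda'",
        "",
        f"name = {name!r}",
        f"version = {version!r}",
        "",
        f"homepage = {homepage!r}",
        f"description = {description!r}",
        "",
        "toolchain = SYSTEM",  # assumes EasyBuild 4.x or later
        "",
        f"requirements = {requirement!r}",
        f"channels = {list(channels)!r}",
        "",
        "moduleclass = 'bio'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_conda_easyconfig(
        "trimmomatic", "0.39",
        "https://example.org/trimmomatic",  # placeholder homepage
        "Read trimming tool (placeholder description)",
    ))
```

Saving that text as trimmomatic-0.39.eb would give EasyBuild something to parse; whether it builds cleanly depends on how conda is made available on your system.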
For YEARS I have been meaning to build a tool that would let me easily spit out these configs, and I finally have! Woooo.
Use the Script
First of all, this script comes with the usual disclaimers. I wrote it mostly for my own use case, and I didn't consider other people's potential use cases. I tested it and I've used it, but it isn't the most robust of tools! 😉 With all that out of the way, there are 3 basic ways you might want to use this script.
Generate an EasyConfig from a Single Conda or BioConda Package
If, for instance, you check the BioConda recipe for Trimmomatic, you will see it has all sorts of fun information, including a name, version, homepage, and summary (most of the time, anyways). This, as it turns out, is the same information we need to build out our EasyConfig, and it's available through the Anaconda Client API!
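As a rough illustration of that lookup, the sketch below pulls a package record and keeps only the fields an EasyConfig needs. It uses a plain HTTP request rather than the anaconda-client library, and the endpoint layout and the JSON key names (latest_version, home, summary) are assumptions to verify against a real response. The function names are hypothetical and this is not the script's actual code.

```python
import json
from urllib.request import urlopen

# Assumed endpoint layout for the public anaconda.org API.
API_URL = "https://api.anaconda.org/package/{channel}/{name}"


def fetch_package(name, channel="bioconda"):
    """Download the package record as a dict (needs network access)."""
    with urlopen(API_URL.format(channel=channel, name=name)) as response:
        return json.load(response)


def easyconfig_fields(record):
    """Pick out the fields an EasyConfig needs; the key names are assumptions."""
    return {
        "name": record["name"],
        "version": record.get("latest_version") or "",
        # Recipes do not always fill these in, so fall back to empty strings.
        "homepage": record.get("home") or "",
        "description": record.get("summary") or "",
    }
```

Calling easyconfig_fields(fetch_package("trimmomatic")) would then hand back a small dict ready to drop into a template, with empty strings wherever the recipe left a field out.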
Generate a Bundle EasyConfig for Modules
If you have existing modules, conda or not, and you would like to load them all with a single module load command, you can use the bundle subcommand. Note that this syntax is different from the package syntax we used above, as we are not querying the Anaconda API.
When using this syntax you must specify both the name and the version of the module.
This would create a qc/1.0 module which, when loaded, would also load the trimmomatic/0.39 and fastqc/0.11.8 modules. Less typing, wooooo!
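For a sense of what a bundle EasyConfig along these lines could look like, here is a small sketch that turns name/version module strings into the dependencies list of a Bundle easyblock. The helper names are hypothetical, the name/version input format is borrowed from how the modules are written above rather than from the script's real syntax, and the homepage, toolchain, and module class values are placeholder assumptions.

```python
# Hypothetical helpers: render a Bundle EasyConfig that depends on existing modules.

def parse_module(spec):
    """Split 'name/version' into a (name, version) tuple."""
    name, sep, version = spec.partition("/")
    if not (name and sep and version):
        raise ValueError(f"expected name/version, got {spec!r}")
    return name, version


def render_bundle_easyconfig(name, version, modules,
                             homepage="https://example.org",
                             description="Bundle of modules"):
    """Return EasyConfig text for a bundle that loads every module in `modules`."""
    deps = [parse_module(m) for m in modules]
    lines = [
        "easyblock = 'Bundle'",
        "",
        f"name = {name!r}",
        f"version = {version!r}",
        "",
        f"homepage = {homepage!r}",  # placeholder value
        f"description = {description!r}",
        "",
        "toolchain = SYSTEM",  # assumes EasyBuild 4.x or later
        "",
        "dependencies = [",
        *[f"    {dep!r}," for dep in deps],
        "]",
        "",
        "moduleclass = 'bio'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_bundle_easyconfig("qc", "1.0",
                                   ["trimmomatic/0.39", "fastqc/0.11.8"]))
```

Requiring both halves of name/version up front mirrors the rule above: without a version there is no way to know which module the bundle should pull in.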
Generate a Bundle EasyConfig from Conda Packages
You can also just jump straight into generating a Bundle from a list of conda packages. This command will create the EasyConfigs for each conda package and the EasyConfig for the bundle.
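To picture what comes out of this step, the sketch below lists the EasyConfig files you would expect for a bundle built from conda packages: one per package, plus one for the bundle itself. The function names are hypothetical, the name=version input follows conda's own spec style rather than the script's real syntax, and the name-version.eb file naming assumes a system toolchain with no version suffix.

```python
# Hypothetical helpers: work out which EasyConfig files a conda bundle produces.

def easyconfig_filename(name, version):
    """EasyConfig file name, assuming a system toolchain and no version suffix."""
    return f"{name}-{version}.eb"


def plan_bundle(bundle_name, bundle_version, conda_specs):
    """Return one file name per conda package, followed by the bundle's own."""
    files = []
    for spec in conda_specs:
        name, sep, version = spec.partition("=")
        if not (name and sep and version):
            raise ValueError(f"expected name=version, got {spec!r}")
        files.append(easyconfig_filename(name, version))
    # The bundle goes last because it depends on every package before it.
    files.append(easyconfig_filename(bundle_name, bundle_version))
    return files


if __name__ == "__main__":
    print(plan_bundle("qc", "1.0", ["trimmomatic=0.39", "fastqc=0.11.8"]))
    # prints ['trimmomatic-0.39.eb', 'fastqc-0.11.8.eb', 'qc-1.0.eb']
```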
Building the Modules
Once you have your EasyConfigs, you just point EasyBuild at them and let them roll!
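If you would rather drive that step from Python, a minimal sketch could look like the following. It only assembles and runs the eb command line; the helper names are hypothetical, and it assumes eb is already on your PATH and configured for your site. The --robot flag asks EasyBuild to resolve dependencies, and --dry-run shows what would be built without building it.

```python
import shutil
import subprocess


def eb_command(easyconfigs, robot=True, dry_run=False):
    """Build the argument list for an eb call on the given EasyConfig files."""
    cmd = ["eb", *easyconfigs]
    if robot:
        cmd.append("--robot")    # let EasyBuild resolve dependencies
    if dry_run:
        cmd.append("--dry-run")  # only report what would be built
    return cmd


def build(easyconfigs, **options):
    """Run eb on the EasyConfigs, failing loudly if eb is missing or errors out."""
    if shutil.which("eb") is None:
        raise RuntimeError("eb not found on PATH; load or install EasyBuild first")
    subprocess.run(eb_command(easyconfigs, **options), check=True)
```

Running build(["qc-1.0.eb"], dry_run=True) first is a cheap way to see what EasyBuild plans to do before committing to a full install.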