Deploy a Celery Job Queue With Docker – Part 2: Deploy with Docker Swarm on AWS

celery distributed computing docker job queue python Feb 13, 2019

Overview

In Part 1 of this series we went over the Celery architecture, how to separate out the components in a docker-compose file, and laid the groundwork for deployment.

Deploy With AWS CloudFormation

This portion of the blog post assumes you have an SSH key set up. If you don't, go to the AWS docs here.

What is CloudFormation?

AWS CloudFormation is an infrastructure design tool that allows users to design their infrastructure by defining file systems, compute requirements, networking, and so on. If you have no interest in designing infrastructure, you probably don't need to worry. CloudFormation configurations are shareable through templates.

Docker AWS CloudFormation

Getting Started

Docker has come to our rescue here with a Docker for AWS CloudFormation template. This will, with the click of a few buttons, deploy a Docker swarm on AWS for us!

Click through to the page and scroll down to Quick Start. Under 'Stable Channel', select 'Deploy Docker Community Edition (CE) for AWS'.

Configuring and Launching Your Swarm Cluster

CloudFormation is some cool stuff. A web UI is generated for you based on the template. In this case the important things to note are the instance name, the number of manager nodes, the number of worker nodes, and the SSH key you want to use. There are a few other properties, such as daily resource cleanup. Be sure to take note of the SSH key, as you will need it later.

This is a really small cluster, and only for demo purposes, so I want 1 manager node and 1 worker node.

It's not shown here, but make sure you take note of the name you assign. You will need it later to check out your logs in CloudWatch.

Finish going through the wizard. I basically just press Next the whole way through. Once you've done that you will get to a final screen, and you want to make sure you actually submit it. If you don't, it will be just like writing an email and forgetting to press send.

From this point on we will follow the deployment instructions.

Wait (and wait) for your swarm to start

Once you complete the CloudFormation wizard you will get to the CloudFormation Management Console. If you don't see anything, don't panic! Just refresh the page, and you should see a CloudFormation stack come up with the same name you assigned it earlier in the wizard. It will go through several status updates, but after a few minutes you should be able to refresh the page and see a CREATE_COMPLETE status.

Select your stack and click on 'Outputs'. You may have to scroll down a bit, but you are looking for the blue URL next to the 'Managers' key. Click on it to get to the EC2 Management Console.

Later, you will want to use the DefaultDNSTarget to access the web portion of your application.

The EC2 details for your Swarm manager node should come right up. You are looking for the Public DNS address.

Now we are going to SSH to our manager instance and bring up our swarm! My SSH key is named jillianerowe-aws-keypair.pem, and I have it in my ~/.ssh directory (with permissions 400). SSH as the user docker at YOUR public DNS address.
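Something along these lines, with the key file and manager address swapped for your own (both values below are placeholders):

# From your localhost
# Use YOUR key file and YOUR manager's public DNS address
chmod 400 ~/.ssh/my-ec2-ssh-key.pem
ssh -i ~/.ssh/my-ec2-ssh-key.pem docker@ec2-XX-XXX-XX-XX.compute-1.amazonaws.com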

If you see 'Welcome to Docker', congratulations! You have your very own compute cluster!

Deploy Your Stack With Docker Swarm

The configuration is mostly the same from docker-compose to docker swarm, but there are a few key changes you need to note.

These are the same changes that you need to make whether you are deploying to AWS or elsewhere. With the exception of taking a look at the CloudWatch interface in AWS, this process is exactly the same no matter where you are.

Docker Compose to Swarm Mapping

Replace build context with an image

During development we wanted an image that updated constantly, but that is not the case for a production deployment. That freewheeling madness simply cannot stand. This is production, remember? In fact, it's not even possible to use a build context with docker swarm. You must have an image available from a registry. I replaced the build context with an image I uploaded to Quay.io. If you aren't clear how to go from a development docker image on your machine to an image on Quay or Docker Hub, check out this post here.
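As a rough before-and-after sketch (the service name matches the ones we deploy later; the Quay.io image path is just a placeholder for wherever you pushed yours):

    # docker-compose.yml (development): build from a local context
    job_queue_flask_app:
      build: .

    # docker-compose-swarm.yml (swarm): pull a prebuilt image instead
    job_queue_flask_app:
      image: quay.io/your-username/job-queue-flask-app:latest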

Move your labels from directly under the service to the deploy key

You will notice that in the docker-compose.yml file we had all of our labels directly under the service name. Now that we are deploying to docker swarm, they need to be moved under the deploy key. (I find this annoying too.)
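For example, a label that used to live directly under the service moves under deploy (the label shown is just an illustration; keep whatever labels you already had):

    # docker-compose.yml (development)
    job_queue_flask_app:
      labels:
        - "traefik.enable=true"

    # docker-compose-swarm.yml (swarm)
    job_queue_flask_app:
      deploy:
        labels:
          - "traefik.enable=true"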

Change around the networks from bridge to overlay

In the docker-compose configuration I used a bridge network, but for swarm I need an overlay network. I kind of think I could have used an overlay network in the compose configuration as well, but I really hate networking (this seems to be a theme with me), and I need to test it out some more.
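A minimal sketch of the change, assuming a single network shared by the services (the network name here is a placeholder; keep whatever name your services already reference):

    # docker-compose-swarm.yml: swarm services talk across nodes over an overlay network
    networks:
      traefik:
        driver: overlay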

Changing localhost to your actual host name

This is kind of aggravating, and I'm really hoping some super smart person has a way around this. Consider this my cry for help! In the traefik labels you need to ensure that Host:localhost is changed to Host:My-Actual-Host-Name. This corresponds to the public DNS we saw earlier.
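Assuming traefik 1.x style routing rules (which is what the Host:localhost syntax suggests), the change looks something like this, using the load balancer DNS from my stack as an example:

    # Under the service's labels in docker-compose-swarm.yml
    # Before (development):
    - "traefik.frontend.rule=Host:localhost"
    # After (swarm) - use your own public DNS here:
    - "traefik.frontend.rule=Host:celery-jo-external-1tux4u1ptkthe-786187927.us-east-1.elb.amazonaws.com"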

Deployment Configurations

Looking at the docker-compose file, you will see that each service has a deploy configuration. The deploy key is ignored by docker-compose, but with docker swarm you can use this configuration to deploy multiple instances of a single service.

    deploy:
      replicas: 1
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
        max_attempts: 3
        window: 120s
      placement:
        constraints: [node.role == manager]


DEPLOY KEY: DESCRIPTION

replicas: How many instances of this service to deploy. The celery worker is the most interesting example here. Say we tell the celery worker to have 12 concurrent tasks, and then we deploy 10 instances of the service. This would mean at any given time we could run 120 (12 * 10) tasks concurrently. Cool! (See the sketch after this table.)

update_config.parallelism: When updating the swarm, how many to update at a time.

update_config.delay: How much of a delay to have between service updates.

restart_policy.condition: This is an enumeration value. I always tell it to restart when it fails.

restart_policy.max_attempts: How many times to attempt to restart before blowing up and saying NO.

placement: Place the service on either a worker or a manager node. Generally, the load balancer (traefik) goes on the manager node. I also put the broker on the manager node, and everything else goes on the worker nodes.
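To make the replicas math concrete, here is a hedged sketch of what the worker service could look like; the image path, Celery app module, and command are placeholders, but the concurrency flag and replica count are the parts that multiply out to 120 concurrent tasks:

    job_queue_celery_worker:
      image: quay.io/your-username/job-queue-celery-worker:latest
      # 12 concurrent tasks per container...
      command: celery worker -A job_queue.tasks -l info --concurrency=12
      deploy:
        # ...times 10 replicas = 120 tasks running at once across the swarm
        replicas: 10
        placement:
          constraints: [node.role == worker]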


Get the stack configuration over to your cluster

Now, we are going to check out that docker-compose-swarm file! Make sure to change the hostname to your actual host! If you don't have the source code, you can use any Celery Job Queue with docker-swarm, or scroll to the bottom to get the source code.

You will have to scp it over to the swarm cluster, because that thing is locked down and doesn't have rsync. I'm sure there are ways to open up internet traffic, but that would require that I learn networking. I really hate networking. Run this scp command from your localhost, or wherever you downloaded the docker-compose-swarm.yml file.

# From your localhost
# You must change this to point to YOUR ssh key and YOUR docker manager IP address!
scp -i ~/.ssh/my-ec2-ssh-key.pem docker-compose-swarm.yml "docker@SOME_IP:/home/docker/"

Bring up your services

Great! Now you should be in a shell on your docker cluster, reached with the public DNS address and the SSH key from earlier.

# From the cluster
docker stack deploy -c docker-compose-swarm.yml celery

And that's it! Your stack will take a little while to spin up. Check it out with the command 'docker service ls'.

 

It takes a little while to download the images, so if you see your REPLICAS as 0/1 or 0/2 don't worry. They will be up soon!

If they aren't, you will need to check out the logs in CloudWatch.

Check out your logs in CloudWatch

Go to the CloudWatch console, click on Logs on the left-hand side, and select the log group that corresponds to the name you gave your swarm cluster earlier.

My services are stuck. How do I restart them?

Stuff happens, particularly when you are scoping out requirements for instance capacity (memory, CPUs, t1.micro vs. something actually functional, etc.).

If this happens, what you want to do is restart all the services. The service names correspond to the name you gave the stack, an underscore, and then the service name defined in the swarm/compose configuration. To restart the services, first remove them, and then redeploy the stack.

docker service rm celery_traefik-manager celery_rabbit celery_job_queue_flask_app celery_job_queue_celery_worker celery_job_queue_celery_flower
docker stack deploy -c docker-compose-swarm.yml celery

Check it out!

Now that all of our services are up and running, let's go and see how they are doing!

Celery Flower

Let's go and check out our flower service! My URL is http://celery-jo-external-1tux4u1ptkthe-786187927.us-east-1.elb.amazonaws.com/flower, but if you have been following along you will have your own. Once you open it you will see there are no jobs, which is what we would expect since it just launched.

Add an AddTogether task through the Flask API at /api

If you use Postman, open it up and add a task. Just change 'localhost' to your public DNS address. You should see a response with the task context.
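If you'd rather use the command line than Postman, here's a rough curl equivalent; the /api path comes from the heading above, but the exact JSON payload depends on how the AddTogether task was defined in Part 1, so treat the fields below as an assumption:

# From your localhost - substitute your own public DNS address
# The x/y payload is a guess at what an AddTogether task expects; adjust to match your API
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"x": 1, "y": 2}' \
  http://celery-jo-external-1tux4u1ptkthe-786187927.us-east-1.elb.amazonaws.com/api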

Now, if you go and check out the flower interface, you should see that a job has been queued and eventually processed.

Head on over to /traefik to get an overview of your services

Traefik has a pretty nifty dashboard interface that lets you know what's happening with your services.

A word to the wise: I did find that my traefik dashboard interface was kind of glitchy on an AWS micro instance. Oddly enough, everything else was fine; it was just viewing the web UI that was troublesome. If I were deploying this for an actual production use case, I would scale up my manager node. Since I'm paying for this myself, I'm not going to do that, but here's a picture of what it should look like.

Wrapping Up

There was a lot going on in the last two posts, but I hope now you have a general idea of how to organize and deploy a job queue using docker and docker swarm!

 
