Configuring Apache Cassandra Cluster with Docker

This tutorial outlines the steps to install and configure Apache Cassandra using Docker. Docker provides an easy way to create an Apache Cassandra cluster, and with it we can get a cluster up and running in minutes. The configuration provided is meant only for development and testing purposes. We will begin with an overview of Docker and Docker Compose, then provide the configuration to set up a three-node Apache Cassandra cluster. The tutorial concludes by outlining different ways of interacting with the created cluster.

Docker Overview and Benefits

Docker is a container technology that has become immensely popular with both developers and system administrators. It simplifies the creation, deployment, shipping and running of applications, and enables you to configure your application once and run it anywhere. Most of Docker's benefits are a result of its ability to isolate applications and their dependencies. Think of Docker as a lightweight Virtual Machine (VM).

High-level difference between virtual machines and containers

Docker is often compared to, and confused with, a VM. A VM's primary benefit is the ability to share hardware resources. VMs also have side benefits, such as the ability to create isolated environments. As VMs grew in popularity they were often used to ship and deploy preconfigured applications. In fact, every cloud provider made available VMs with preconfigured proprietary and open source software (OSS). Although popular, VMs are a heavyweight approach to building and shipping preconfigured software.

Containers provide a lightweight approach to virtualization. To understand the surging popularity of containers we must understand the difference between containers and VMs. Both are virtualization technologies, but while VMs virtualize hardware, containers virtualize the operating system. VMs run on top of a hypervisor, i.e. a piece of software, firmware, or hardware that allows multiple operating systems (OS) to share the same hardware. A hypervisor's main goal is to abstract away the OS from the hardware. Thus each VM runs an entire operating system of its own.

The main goal of a container is to abstract away the application from the operating system. Containers abstract away the “user space”, i.e. the portion of memory where user processes run. Containerization, also known as operating-system-level virtualization, is a method of virtualization in which the kernel of the operating system allows multiple isolated user spaces to exist. As a result, multiple user spaces share the same kernel. Virtualization at the operating system level provides a lightweight approach to application isolation. Containers can start up in approximately 500 ms, whereas a VM typically takes around 20 seconds.

Docker vs Virtual Machines

The image above illustrates the high-level difference between VMs and containers. Note that a type 2 hypervisor (one that runs on top of an OS) is depicted.

Containers are not a new concept. Although they have been around for a while, they remained unpopular for a long time, mainly because they were hard to configure and use. Docker changed all that by providing an API wrapper and tooling around containers, which made them much easier to use. Docker has since grown into a full-blown ecosystem, with a growing number of tools to help build, configure, share and ship containers.

Below is a list of key concepts/tools to get started with Docker:

  • Docker Images - An immutable file that is a snapshot of a container. An instance of an image is a container. Docker images are composed of layers of other images. Layering enables efficient transfer of image data over a network, since layers that already exist locally do not need to be downloaded again.
  • Docker Hub - A public registry that enables users to search and share Docker images. Docker Hub is a great resource for getting hold of popular open source Docker images.
  • Docker Compose - An important tool that enables you to work with multi-container applications. It provides an efficient way of configuring, starting and stopping multi-container Docker applications.

Docker Apache Cassandra Cluster

Let's create a three-node Apache Cassandra cluster. In order to create this cluster, you will need to have Docker and Docker Compose installed. Use the Docker and Docker Compose installation documentation to get them both up and running on your machine.

Resource Configuration in Docker for Mac

If you are on a Mac or Windows machine, you will need to allocate enough memory for the cluster to run. Each node needs at least 2 GB of memory, so I would suggest an 8 GB allocation. On Mac and Windows, Docker runs inside a virtual machine, hence the need to allocate dedicated resources. On Linux, the Docker engine runs natively and will be able to reserve the required resources, provided the underlying hardware has them.
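As a rough check of the numbers above, the following sketch works out the minimum memory for three nodes and, if the docker CLI is available, asks the engine how much memory it can see. The variable names are my own; `docker info --format '{{.MemTotal}}'` reports the engine's total memory in bytes.

```shell
NODES=3
GB_PER_NODE=2
required_gb=$((NODES * GB_PER_NODE))
echo "The Cassandra nodes alone need about ${required_gb} GB"

# Ask the Docker engine how much memory it can see (reported in bytes).
if command -v docker >/dev/null 2>&1; then
  mem_bytes=$(docker info --format '{{.MemTotal}}' 2>/dev/null | head -n 1) || true
  case "$mem_bytes" in
    # Empty or non-numeric output means the engine could not be queried.
    ''|*[!0-9]*) echo "Could not read memory from the Docker engine" ;;
    *) echo "Docker can use $((mem_bytes / 1024 / 1024 / 1024)) GiB" ;;
  esac
fi
```

The gap between the computed minimum and the suggested 8 GB leaves headroom for Portainer and for Docker itself.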

Once you have installed Docker and Docker Compose, create a Docker Compose file. Call the file docker-compose.yml and place it in an empty directory of your choice. The commands in this tutorial assume that file name.

Please copy the contents of the Docker Compose file below into your docker-compose.yml. In order to create an Apache Cassandra container, we need an appropriate image. We will use the official Apache Cassandra image. The compose file is well commented and provides details on every choice made.
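For orientation, here is a minimal sketch of what a compose file for this setup could look like. It is an illustration under assumptions rather than the exact file used in this tutorial: the image tags, the cluster name dev_cluster, and the 60 and 120 second delays are placeholders of mine, while the service names (DC1N1 to DC1N3) and the Portainer port (10001) follow the names used later in the text. The snippet writes the file from the shell with a here-document.

```shell
# Run this from the empty directory you created for the tutorial.
# NOTE: illustrative sketch only; tags, cluster name and delays are assumptions.
cat > docker-compose.yml <<'EOF'
version: "2"
services:
  # First node, also used as the seed; starts immediately.
  DC1N1:
    image: cassandra:3.11
    environment:
      - CASSANDRA_CLUSTER_NAME=dev_cluster
      - CASSANDRA_SEEDS=DC1N1
    ports:
      - "9042:9042"   # CQL port published to the host

  # Second node waits 60 seconds before starting Cassandra.
  DC1N2:
    image: cassandra:3.11
    command: bash -c 'sleep 60 && docker-entrypoint.sh cassandra -f'
    environment:
      - CASSANDRA_CLUSTER_NAME=dev_cluster
      - CASSANDRA_SEEDS=DC1N1
    depends_on:
      - DC1N1

  # Third node waits a further 60 seconds.
  DC1N3:
    image: cassandra:3.11
    command: bash -c 'sleep 120 && docker-entrypoint.sh cassandra -f'
    environment:
      - CASSANDRA_CLUSTER_NAME=dev_cluster
      - CASSANDRA_SEEDS=DC1N1
    depends_on:
      - DC1N1

  # Web UI for managing the containers, served on http://localhost:10001.
  portainer:
    image: portainer/portainer-ce
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "10001:9000"
EOF
```

The delays live in the command option of the second and third nodes, so each node only launches Cassandra after the previous one has had time to start.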

To boot the cluster navigate to the directory where you have created the Docker Compose file and run the following command:
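A sketch of that command, assuming the older hyphenated docker-compose binary. The -d flag is my addition so that the containers run in the background, and the surrounding check simply confirms that the file and the binary are present before starting anything.

```shell
COMPOSE_FILE_NAME="docker-compose.yml"   # default file name Compose looks for

if [ -f "$COMPOSE_FILE_NAME" ] && command -v docker-compose >/dev/null 2>&1; then
  # Create and start every service defined in the file, detached from the terminal.
  docker-compose up -d
else
  echo "Run this from the directory holding $COMPOSE_FILE_NAME, with docker-compose installed" >&2
fi
```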

By default, Compose looks for docker-compose.yml. If you have named your file differently you must use the -f flag. Example command follows:
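For instance, assuming the file had been saved as cassandra-cluster.yml (a hypothetical name), the -f flag goes before the up subcommand:

```shell
CUSTOM_COMPOSE_FILE="cassandra-cluster.yml"   # hypothetical file name

if [ -f "$CUSTOM_COMPOSE_FILE" ] && command -v docker-compose >/dev/null 2>&1; then
  # -f is a global option, so it is placed before the subcommand.
  docker-compose -f "$CUSTOM_COMPOSE_FILE" up -d
else
  echo "Expected $CUSTOM_COMPOSE_FILE in the current directory" >&2
fi
```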

On starting up the containers you should see similar output.

The above compose file will start up four containers: three Apache Cassandra nodes and Portainer. When you do this for the first time it will take a few minutes, as the Apache Cassandra and Portainer images are downloaded from Docker Hub. The image to use is set by the image option in the Docker Compose file. Starting up Apache Cassandra for the first time will be slow, because we need to leave a lag between starting each node. Apache Cassandra recommends the ‘2 minute rule’: when booting a cluster you must wait 2 minutes before starting each new node. It is a mistake to start up all nodes at once. Please note I have allowed 60 seconds, which is sufficient for the current configuration.
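One way to picture the staggered start is a small shell helper that brings the nodes up one at a time with a pause in between. This is a sketch of the idea, not how the compose file in this tutorial does it. The function name start_staggered and the COMPOSE_CMD variable are hypothetical, and the service names DC1N1, DC1N2 and DC1N3 are assumed from the container names shown in Portainer.

```shell
# Start each Cassandra service in turn, pausing between nodes.
# $1 = delay in seconds between nodes (defaults to 120, the "2 minute rule").
start_staggered() {
  delay="${1:-120}"
  first=1
  for node in DC1N1 DC1N2 DC1N3; do
    # Sleep before every node except the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    # Left unquoted on purpose so that "docker compose" splits into two words.
    ${COMPOSE_CMD:-docker-compose} up -d "$node"
  done
}
```

Calling start_staggered 60 from the compose directory would use the shorter 60 second gap mentioned above.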

Once the containers are up and running please navigate to the Portainer UI at http://localhost:10001. Portainer provides a web UI over Docker. I find it an easy way of managing/interacting with Docker containers.

When you log in for the first time you will see the following screen.

Portainer specify admin password screen

Please choose an appropriate password. As you might have already guessed this will only happen the first time you start the containers.

Next Portainer will ask you about the Docker engine instance you want to connect to. Currently, we just want to connect to the local instance. Please choose the “Manage the Docker instance where Portainer is running” option.

Docker engine configuration screen

Once you connect to your local Docker engine you will be redirected to the Portainer home screen.

Portainer Home Screen

You will see the four containers that have been created. Click on the "Containers" menu item to see a list of your containers.

Cassandra Cluster Containers

You can get container details by clicking on any of the containers. Click on the cassandradockercompose_DC1N1_1 link. This will take you to the container details screen.

Portainer Container Details Screen

The container details screen enables you to access basic container stats and logs. You can also open a shell inside the container using the "Console" link. Please click on the Console link and connect to a bash console. You should see a bash console as shown in the screenshot below.

Portainer Bash console

Let's quickly check whether all three nodes in our cluster are up. We will do this by running the nodetool status command.
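If you would rather run the check from the host than from the Portainer console, something along these lines should work. It is a sketch under assumptions: the service name DC1N1 is taken from the container name above, and count_up_nodes is a hypothetical helper that counts the lines of nodetool status output beginning with UN (Up/Normal).

```shell
# Count nodes reported as Up/Normal; reads nodetool status output on stdin.
count_up_nodes() {
  # grep -c exits non-zero when nothing matches, so keep the exit status clean.
  grep -c '^UN' || true
}

# Usage (requires the cluster to be running):
#   docker-compose exec -T DC1N1 nodetool status | count_up_nodes
```

For a healthy three-node cluster the helper should print 3.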

As you can see, all three nodes are up. You can also connect to Apache Cassandra using the cqlsh command. Simply type cqlsh at the command prompt.
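cqlsh can also run a single statement non-interactively through its -e option. The helper below is a sketch, with run_cql a hypothetical name and DC1N1 an assumed service name; it reads the Compose command from a COMPOSE_CMD variable (also my own invention) so that the newer "docker compose" form can be substituted.

```shell
# Run one CQL statement on the first node and print the result.
run_cql() {
  # Left unquoted on purpose so that "docker compose" splits into two words.
  ${COMPOSE_CMD:-docker-compose} exec -T DC1N1 cqlsh -e "$1"
}

# Usage (requires the cluster to be running):
#   run_cql "DESCRIBE KEYSPACES"
```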

I hope this has given you a good overview of how to create an Apache Cassandra cluster using Docker. A good way to explore your cluster would be via a CQL tutorial.

I would love to hear your thoughts.

13 Responses to Configuring Apache Cassandra Cluster with Docker

  1. Pedro September 27, 2017 at 10:25 pm #

    Thanks, excellent tutorial.

    • Akhil October 22, 2017 at 7:49 pm #

      Thanks for the feedback, much appreciated.

  2. Steve August 2, 2018 at 7:39 pm #

    Good tutorial, but what is http://templates/templates.json all about? That doesn’t resolve to anything useful on my system… so my portainer image doesn’t run.

    • Akhil August 4, 2018 at 9:10 pm #

      Thanks. For the purpose of this tutorial that line can be removed. I have updated the compose file. Templates are a neat feature that enable you to configure what shows up under the “App Templates” menu item. App templates enable you to launch docker containers with a single click.

  3. srini October 20, 2018 at 5:38 am #

    All the cassandra dockers are going down after 60 seconds

  4. srinivas October 20, 2018 at 5:39 am #

    All the cassandra dockers are going down after 60 seconds

  5. Akhil November 30, 2018 at 11:43 am #

    I am guessing you are running out of memory on each node.

  6. Yong January 23, 2019 at 5:04 am #

    Excellent guide.

    Is it possible to use sstable tools like sstableloader or sstabledump in a cassandra container?

  7. Akhil January 25, 2019 at 11:40 pm #

    Thanks. Yes, it is. Just like running nodetool status, you can also run sstableloader and sstabledump. Log into the container via Portainer and run these commands.

  8. Yang Ninn March 2, 2019 at 3:39 pm #

    Holy, this is awesome, will you have any upcoming tutorial for Cassandra cluster with Elasticsearch ? That could get lots of attention if you make one since everyone always talk about scalability

  9. GANESH SREEKUMAR September 27, 2019 at 10:58 am #

    could you please share the docker compose configuration for running nodes in different hosts(virtual machines)

  10. GANESH SREEKUMAR September 27, 2019 at 11:00 am #

    could you please share the config for running the seed node in one vm and the other nodes in another vm ie different data center same cluster
