Intro

Software drives innovation in every aspect of our lives; as System Engineers we must not only support that software, but also scale that support by writing our own software. Oh sure, it’s got fancy names like “configuration management” or “desired-state convergence-based system administration”, but it’s code, pure and simple. In order to achieve this scale and move faster, I’ve had to acquire knowledge from the programmers around me: things like revision control, modularized programming, and automated testing.

Now I document my code. Puppet-lint keeps my commits from making the parser cry. I create feature branches and ask for code reviews. But how do you test something that’s meant to change machines irrevocably, without either taking forever to stand up new machines (whether bare-metal or virtual), or skipping the whole thing and putting your infrastructure at risk?

For the longest time, “testing” meant “roll out changes, but keep one finger over the big red button.” Now, there are two tools that improve the situation:

  • Vagrant, for frictionless creation, provisioning and destruction of virtual machines
  • ServerSpec, for testing services and functionality

Assumptions and code

I’m going to assume:

  • You have Vagrant installed already, and have worked your way through the awesome docs; you know “vagrant up”, “vagrant ssh” and “vagrant destroy”, and have at least looked at a Vagrantfile before.
  • You have Ruby set up already.
  • You have a passing familiarity with Puppet.

The examples in this post, along with some setup scripts and background information, have been shared on GitHub; you can find them here. The repo is organized into three examples, and as we go along I’ll mention which one we’re at.

One important note: I’m testing Puppet — but there is nothing in this approach that requires Puppet.  You could swap it out with CFEngine, Ansible or just about any other configuration management tool.

Problem statement

As we continue to deploy systems and the complexity of the infrastructure grows, we have an increased need to test the software and solutions being introduced. In this post, let’s use the deployment of RabbitMQ as an example.

Let’s get this party started

Where do we start?  Let’s initialize Vagrant with the usual three-step:

 mkdir testing
 cd testing
 vagrant init hashicorp/precise64

This gave me a 64-bit version of Ubuntu 12.04, with Puppet 2.7 pre-installed, that is ready to boot.  I wrote a simple Puppet manifest to install RabbitMQ and saved it as “manifests/init.pp”:

package { 'rabbitmq-server':
  ensure => installed,
}

(In the code repo for this post, you’ll find the Vagrantfile and the Puppet manifest in the “example_1” directory.) Next, I set up a directory for Puppet modules we’ll use later on in this tutorial:

testing $ mkdir -p manifests ~/puppet/modules

Finally, I edited that Vagrantfile and added the bits I want. When I was done, it looked like this:

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # FORESHADOWING: This will let me add another VM later.
  config.vm.define "rabbit" do |rabbit|
    rabbit.vm.box = "hashicorp/precise64"
    rabbit.vm.hostname = "rabbit.example.com"
    rabbit.vm.network "private_network", ip: "192.168.50.100"
    rabbit.vm.provision :puppet do |puppet|
      puppet.manifest_file  = "init.pp"
      puppet.module_path = "~/puppet/modules"
      # Need to set the fqdn here as well; see
      # http://www.benjaminoakes.com/2013/04/25/making-puppets-fqdn_rand-play-nice-with-vagrant/
      puppet.facter = { 'fqdn'  => rabbit.vm.hostname }
    end
  end
end

I booted the VM by running “vagrant up”.  This:

  • booted the VM
  • set the hostname to rabbit.example.com
  • gave the VM a second network interface with the IP address 192.168.50.100
  • recursively copied the manifest_file, and everything under module_path, into the VM, and ran puppet apply with those copied files in its path

Let me emphasize that last part: I gave it the path to my laptop’s local copy of our Puppet codebase (~/puppet/modules), and Vagrant was smart enough to copy it to the VM, point Puppet at those files, and run the code in init.pp.  I didn’t have to set paths, or run rsync myself, or anything.

This is good, but the fine folks at PuppetLabs have a module for managing RabbitMQ on PuppetForge.  (We need to use version 4.0.0 due to an incompatibility with Puppet 2.7; see the README in the code repo for details.)  I added it to the modules directory, and changed manifests/init.pp to make use of it:

class { 'rabbitmq':
  environment_variables    => {
    'RABBITMQ_NODENAME'    => '${hostname}',
    'RABBITMQ_SERVICENAME' => 'RabbitMQ',
  },
  port                     => '5672',
  config_cluster           => true,
  # We'll need a second machine later on...
  cluster_nodes            => ['rabbit.example.com', 'coney.example.com'],
  cluster_node_type        => 'ram',
  erlang_cookie            => 'SECRETCOOKIE',
  wipe_db_on_cookie_change => true,
  require                  => File["/usr/local/bin/puppet"],
  config_variables         => { 'loopback_users' => '[]', },
}

# The PuppetLabs module invokes Puppet itself to restart the
# RabbitMQ service when setting the erlang_cookie -- but it sets the
# PATH environment variable manually, and it doesn't include the *real*
# location of puppet in this machine.  The simplest is to symlink it
# to /usr/local/bin, which *is* in PATH.

file { '/usr/local/bin/puppet':
  ensure => "link",
  target => "/opt/vagrant_ruby/bin/puppet",
}

host { 'rabbit.example.com':
  ip           => "192.168.50.100",
  host_aliases => "rabbit",
}

package { "python-pika":
  ensure => installed,
}

(This manifest is in the “example_2” directory in the code repo.)  This ensures a few things:

1. The “class rabbitmq” stanza sets parameters for the RabbitMQ module.  In particular, it starts up a RAM-based RabbitMQ server — one that keeps queued messages in memory, rather than on disk.  (This is not what we’d want in production, but it simplifies this example.)

2. It links the Vagrant box’s copy of puppet to /usr/local/bin so that the RabbitMQ module can find it.

3. It adds an entry to /etc/hosts for this machine, and it specifies erlang_cookie.  (Once we get to two nodes, this is what will allow them to find and trust each other.)

4. It installs Pika, a Python module that can talk to RabbitMQ. (We’ll use this for testing up ahead.)

And sure enough, look at the output when I run vagrant up:

    ==> rabbit: notice: /Stage[main]//Host[rabbit.example.com]/ip: ip changed '127.0.1.1' to '192.168.50.100'
    ==> rabbit: notice: /Stage[main]//Package[python-pika]/ensure: ensure changed 'purged' to 'present'
    ==> rabbit: notice: /Stage[main]//File[/usr/local/bin/puppet]/ensure: created
    ==> rabbit: notice: /Stage[main]/Staging/File[/opt/staging]/ensure: created
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Repo::Apt/Apt::Source[rabbitmq]/Apt::Key[Add key: 056E8E56 from Apt::Source rabbitmq]/Apt_key[Add key: 056E8E56 from Apt::Source rabbitmq]/ensure: created
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Repo::Apt/Apt::Source[rabbitmq]/File[rabbitmq.list]/ensure: created
    ==> rabbit: notice: /Stage[main]/Apt::Update/Exec[apt_update]: Triggered 'refresh' from 1 events
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Install/Package[rabbitmq-server]/ensure: ensure changed 'purged' to 'present'
    ==> rabbit: notice: /Stage[main]/Rabbitmq/Rabbitmq_plugin[rabbitmq_management]/ensure: created
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Config/Exec[wipe_db]/returns: executed successfully
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Config/File[erlang_cookie]/content: content changed '{md5}e61788fe3ab925930d62d117867cac63' to '{md5}0ca06bb8047f9c6114f69740cfa30798'
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Config/File[rabbitmq.config]/ensure: defined content as '{md5}55484d623e0779fcc83a04688134363f'
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Config/File[/etc/rabbitmq/ssl]/ensure: created
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Config/File[rabbitmq-env.config]/ensure: defined content as '{md5}892faaf5991c6693f12590688fbf12b9'
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Service/Service[rabbitmq-server]/ensure: ensure changed 'stopped' to 'running'
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Service/Service[rabbitmq-server]: Triggered 'refresh' from 1 events
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Install::Rabbitmqadmin/Staging::File[rabbitmqadmin]/Exec[/var/lib/rabbitmq/rabbitmqadmin]/returns: executed successfully
    ==> rabbit: notice: /Stage[main]/Rabbitmq::Install::Rabbitmqadmin/File[/usr/local/bin/rabbitmqadmin]/ensure: defined content as '{md5}63d7331e825c865a97b7a8d1299841ff'
    ==> rabbit: notice: Finished catalog run in 79.90 seconds

I’ll emphasize once more: this process will work with any configuration management system you apply.

But how do I know this works?  I can SSH into the machine and check processes, but I’d like to make sure it’s doing useful work — and shouldn’t we automate this checking?

Only one solution:  it’s time for testing!

ServerSpec

ServerSpec is a test framework for machines (whether virtual or bare metal).  It’s built on Ruby’s RSpec and it lets you write tests for files, processes, services and packages.  The tests are pretty easy to understand and write, even if you’ve never used Ruby before, and it’s smart enough to know how to run tests on remote machines or Vagrant instances.

You can set up ServerSpec and Rake (a Make-like utility for Ruby) like so:

$ gem install serverspec rake

(From here on, everything — the ServerSpec files, the Vagrantfile and the Puppet code — can be found in the “example_3” directory of the code repo.)

First tests

First I want to know whether we’ve actually installed the RabbitMQ package.  I start by setting up ServerSpec like so on the host, and answering the questions it asks me: it’s a Unix machine, it runs SSH, it’s a Vagrant instance, and we want ServerSpec to auto-configure from the Vagrantfile:

testing $ serverspec-init
Select OS type:
  1) UN*X
  2) Windows
Select number: 1
Select a backend type:
  1) SSH
  2) Exec (local)
Select number: 1
Vagrant instance y/n: y
Auto-configure Vagrant from Vagrantfile? y/n: y

It creates a skeleton test file in spec/rabbit/sample_spec.rb, which tests for a web server — the usual “Hello, world!” of configuration management. I renamed that to rabbitmq_spec.rb for clarity, and started out with my own test:

# Boilerplate required by ServerSpec
require 'spec_helper'

# And now a test!
describe package('rabbitmq-server') do
  it { should be_installed }
end

Pretty easy to understand — there’s a package, we want it installed.

At last, we can run the test with the “rake” command:

testing $ rake
  Package "rabbitmq-server"
    should be installed
Finished in 4.44 seconds (files took 0.27236 seconds to load)
1 example, 0 failures

This test passed!  Now let’s ensure the service is running, will start at boot time and is listening on the right ports.  I add this to rabbitmq_spec.rb:

describe service('rabbitmq-server') do
  it { should be_enabled   }
  it { should be_running   }
end

describe port(15672) do
  it { should be_listening }
end

describe port(5672) do
  it { should be_listening }
end

And here’s the output:

  Package "rabbitmq-server"
    should be installed

  Service "rabbitmq-server"
    should be enabled
    should be running

  Port "15672"
    should be listening

  Port "5672"
    should be listening

Finished in 4.81 seconds (files took 0.26969 seconds to load)
5 examples, 0 failures
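
The port checks above run inside the guest. As a complementary check from the host side, here is a minimal plain-Ruby sketch that simply tries to open a TCP connection. The helper name port_open? and the two-second timeout are my own choices for illustration, not anything ServerSpec provides, and this is not how ServerSpec implements be_listening.

```ruby
require 'socket'

# Hypothetical helper: returns true if something accepts TCP connections
# on host:port within the timeout, false otherwise.
def port_open?(host, port, timeout_seconds = 2)
  Socket.tcp(host, port, connect_timeout: timeout_seconds) { true }
rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH, Errno::ETIMEDOUT, SocketError
  false
end

# Example call from the host, using the private-network IP set in the Vagrantfile:
#   port_open?('192.168.50.100', 5672)
```

A probe like this only tells you that the port accepts connections; it says nothing about whether RabbitMQ behind it is healthy.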

Looks good so far — but wouldn’t it be nice to test RabbitMQ?  Let’s exercise things a bit with a couple of Python scripts from RabbitMQ’s fine documentation:

describe command('/vagrant/send.py localhost') do
  its(:stdout) { should match /\[x\] Sent 'Hello World!'/ }
end

describe command('/vagrant/receive_once.py localhost') do
  its(:stdout) { should match /Hello World!/ }
end
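
One thing to watch in patterns like these: in a Ruby regex, square brackets denote a character class, so a literal "[x]" has to be escaped. The sketch below shows the difference in plain Ruby. The sample line is an assumption based on what the RabbitMQ tutorial's send.py prints; check it against your own script's output.

```ruby
# Assumed output of send.py (taken from the RabbitMQ tutorial script).
LINE = " [x] Sent 'Hello World!'"

# Unescaped: [x] is a character class that matches a single "x".
UNESCAPED = /[x] Sent 'Hello World!'/

# Escaped: matches the literal text "[x]".
ESCAPED = /\[x\] Sent 'Hello World!'/

# Regexp.escape builds the same kind of pattern from a plain string.
FROM_STRING = Regexp.new(Regexp.escape("[x] Sent 'Hello World!'"))

puts LINE.match?(UNESCAPED)   # false: the "x" is followed by "]", not a space
puts LINE.match?(ESCAPED)     # true
puts LINE.match?(FROM_STRING) # true
```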

Run the tests again:

  Package "rabbitmq-server"
    should be installed

  Service "rabbitmq-server"
    should be enabled
    should be running

  Port "15672"
    should be listening

  Port "5672"
    should be listening

  Command "/vagrant/send.py localhost"
    should return stdout /[x] Sent 'Hello World!'/

  Command "/vagrant/receive_once.py localhost"
    should return stdout /Hello World!/

Finished in 4.79 seconds (files took 0.26903 seconds to load)
7 examples, 0 failures

This was meant to be a slaved setup — and this is just one machine.  Can we test a distributed setup?

Version two: Multiple Vagrant machines (or, Vagrant and ServerSpec go to 11)

Let’s start by having two machines created by Vagrant.  Here’s our new Vagrantfile:

VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # Arbitrarily designate coney as slave; bring it up first.
  config.vm.define "coney" do |coney|
    coney.vm.box = "hashicorp/precise64"
    coney.vm.hostname = "coney.example.com"
    coney.vm.network "private_network", ip: "192.168.50.101"
    coney.vm.provision :puppet do |puppet|
      puppet.manifest_file = "init.pp"
      puppet.module_path = "~/puppet/modules"
      puppet.facter = { 'fqdn'  => coney.vm.hostname,
                        'flags' => 'rabbitmq_slave'
                      }
    end
  end

  # rabbit is master; bring it up last, once slaves are up.
  config.vm.define "rabbit" do |rabbit|
    rabbit.vm.box = "hashicorp/precise64"
    rabbit.vm.hostname = "rabbit.example.com"
    rabbit.vm.network :forwarded_port, guest: 5672, host: 5672
    rabbit.vm.network :forwarded_port, guest: 15672, host: 15672
    rabbit.vm.network "private_network", ip: "192.168.50.100"
    rabbit.vm.provision :puppet do |puppet|
      puppet.manifest_file = "init.pp"
      puppet.module_path = "~/puppet/modules"
      puppet.facter = { 'fqdn'  => rabbit.vm.hostname,
                        'flags' => 'rabbitmq_master'
                      }
    end
  end
end

There are a few things to note about this new file:

1. We’re configuring two machines here.

2. Each has its own interface on a shared private network (192.168.50.x), so the two machines can reach each other; that’s what we’ll use for testing Rabbit.

3. We have to redo our ServerSpec setup.
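
The two machine definitions differ only in name, IP address, role flag and forwarded ports, so one way to cut the repetition is to drive the Vagrantfile from a small table. This is a sketch, not the version in the code repo: NODES and puppet_facts are names I made up for illustration, and the values are copied from the Vagrantfile above.

```ruby
# Illustrative table of machines. Ruby hashes keep insertion order, so
# coney (the slave) is still defined first and rabbit (the master) last.
NODES = {
  'coney'  => { ip: '192.168.50.101', flags: 'rabbitmq_slave',  forwarded_ports: [] },
  'rabbit' => { ip: '192.168.50.100', flags: 'rabbitmq_master', forwarded_ports: [5672, 15672] }
}.freeze

# Hypothetical helper: builds the Facter facts handed to the Puppet provisioner.
def puppet_facts(name, settings)
  { 'fqdn' => "#{name}.example.com", 'flags' => settings[:flags] }
end

# The guard lets plain Ruby load this file; under Vagrant the constant exists.
if defined?(Vagrant)
  Vagrant.configure('2') do |config|
    NODES.each do |name, settings|
      config.vm.define name do |node|
        node.vm.box      = 'hashicorp/precise64'
        node.vm.hostname = "#{name}.example.com"
        settings[:forwarded_ports].each do |port|
          node.vm.network :forwarded_port, guest: port, host: port
        end
        node.vm.network 'private_network', ip: settings[:ip]
        node.vm.provision :puppet do |puppet|
          puppet.manifest_file = 'init.pp'
          puppet.module_path   = '~/puppet/modules'
          puppet.facter        = puppet_facts(name, settings)
        end
      end
    end
  end
end
```

Adding a third node then becomes a one-line change to the table, at the cost of a Vagrantfile that is slightly less obvious to read at a glance.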

Tests!

In order to test two machines, we’ll put our tests in two directories, one named after each machine: spec/coney/rabbitmq_spec.rb, and spec/rabbit/rabbitmq_spec.rb.  Each spec file will have the tests we showed above to make sure that its own, local copy of RabbitMQ is working.  We’ll also add a test on each node to make sure RabbitMQ has been clustered correctly with the other node:

describe command('/usr/sbin/rabbitmqctl cluster_status | grep running_nodes') do
  it { should return_stdout /running_nodes/ }
  it { should return_stdout /rabbit@coney/ }
  it { should return_stdout /rabbit@rabbit/ }
end
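
If you would rather assert on the exact set of running nodes than on three separate patterns, the cluster_status output can be parsed in plain Ruby. The sketch below is illustrative only: the sample text is an assumption about what rabbitmqctl prints on this RabbitMQ version (the format differs between releases), and running_nodes is a hypothetical helper, not part of ServerSpec.

```ruby
# Illustrative sample; real output varies by RabbitMQ version.
SAMPLE_STATUS = <<~OUTPUT
  Cluster status of node rabbit@rabbit ...
  [{nodes,[{disc,[rabbit@rabbit]},{ram,[rabbit@coney]}]},
   {running_nodes,[rabbit@coney,rabbit@rabbit]},
   {partitions,[]}]
OUTPUT

# Hypothetical helper: pulls the node names out of the
# {running_nodes,[...]} term, or returns [] if the term is absent.
def running_nodes(status_text)
  match = status_text.match(/\{running_nodes,\[([^\]]*)\]\}/)
  return [] unless match

  match[1].split(',').map(&:strip).reject(&:empty?)
end

puts running_nodes(SAMPLE_STATUS).inspect
```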

Finally, let’s send a message on one and see if it shows up on the other.  We’ll add this test to rabbit’s spec file:

describe command('/vagrant/send-mirrored.py') do
  it { should return_stdout /\[x\] Sent 'Hello Mirror World!'/ }
end

…and we’ll test reception by adding this on coney’s spec file:

describe command('/vagrant/receive_once-mirrored.py') do
  it { should return_stdout /Hello Mirror World!/ }
end

Okay, time to run!

coney
  Non-mirrored queue
    Rabbit
      Command "/vagrant/send.py -s rabbit"
        should return stdout /[x] Sent 'Hello World!'/
      Command "/vagrant/receive_once.py -s rabbit"
        should return stdout /Hello World!/
  Mirrored queue
    Command "/vagrant/send-mirrored.py"
      should return stdout /[x] Sent 'Hello Mirror World!'/

rabbit
  Non-mirrored queue
    Coney
      Command "/vagrant/send.py -s coney"
        should return stdout /[x] Sent 'Hello World!'/
      Command "/vagrant/receive_once.py -s coney"
        should return stdout /Hello World!/
  Mirrored queue
    should return stdout /Hello Mirror World'/

All the tests have passed, and now I can hand this to my coworkers.

What went well, what could be improved, and where do we want to take this?

First off:  this builds on the smart work of a lot of different projects: Vagrant, ServerSpec, and of course Puppet.  And if you haven’t read “Test-Driven Infrastructure with Chef”, you should; I’ve taken a lot of inspiration from that, and I can’t recommend it enough.

Second, this is a promising approach to a question that can seem intractable: how do you increase the velocity of an engineering team when you’re dealing with real machines?  Testing code lets you be sure that you’re doing what you expect, and you’re not doing anything unexpected (like breaking services).

Third, I’ve already started using this approach when dealing with tickets.  For example, recently I had to deal with a Varnish configuration that served blank pages when the back end 503’d, instead of serving from the cache as intended.  I was able to duplicate the problem, come up with a failing test, then pound on the configuration until the test passed.

However, it’s not perfect.  One reason is that we have a large Puppet code base. Bringing in random modules for testing is hard to do without, for example, changing sudoers (which breaks Vagrant in really fun ways). These modules were written under the (quite valid!)  assumption that they were for production use, and that they had a free hand in changing the machine as they saw fit.  Bolting testing on afterward is a tough job.

Finally, there is a lot of boilerplate in the RSpec tests; even something as simple as “these two machines should be able to talk over port 1234 via TCP” works out to a lot of tests, each of which needs to be configured correctly.

Still…this is good, and a needed improvement in our work.  Regression tests, unit tests, integration tests, acceptance tests: they’re within reach.  We can do things better; on our best days we can do things not just correctly, but Right.
