Continuous Delivery/Release – a Basic Howto with Examples

When we first went to build a setup with Continuous Delivery (CD) as the goal, we found plenty of excellent theoretical fodder but little in the way of specifics. As with all the hubbub about the cloud, we wondered how to actually get there, let alone how best to get there. Here is a rundown of the basics.

Let’s be clear about our goal: checking in code and having it automatically update your production service in a very short timeframe.

Continuous Integration
The first step is automation. While automation is always nice, here it's all or nothing: if any part of the pipeline is prone to breaking or requires manual intervention, delivery will not be continuous. If you're not already here, you should be. Your systems should be like an iPod when you're through with them, with just a few buttons for everything you do more than twice. See our post on Chef and CloudFormation for more details. For the sake of this guide, let's assume you have Opscode Chef up and running, with a Chef server directing your nodes. At this point, machines should be able to come up on their own, grab everything they need and start serving within minutes. Chef 0.10+ supports "environments," which is what we'll use to tell machines what version of the codebase they should run. Here's a sample chef-repo/environments/qa.json:

{
    "chef_type": "environment",
    "json_class": "Chef::Environment",
    "name": "qa",
    "description": "",
    "default_attributes": {
      "myfrontendapp_revision" : "0fe30e04e8aa610c2e5a34a75b924c2462f87d4e"
    },
    "cookbook_versions": {
      "mycookbook": "0.1.9"
    }
}

When we upload this config to the Chef server (knife environment from file environments/qa.json), myfrontendapp_revision becomes an "attribute" on every machine in the qa environment, meaning chef-client has access to it on each node. For simplicity, we will also assume for now that we're releasing straight from GitHub. The simplicity derives from Chef's built-in support for this by way of its "deploy_revision" resource, the recipe for which should contain something like this in the case of Amazon Linux:

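# note: git-over-ssh authenticates with the *private* half of the deploy key, so whatever
# file the ssh wrapper below hands to ssh needs to contain the private key, not just the .pub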
cookbook_file "/home/ec2-user/.ssh/deploy-id_rsa.pub" do
  source "ssh/id_rsa.pub"
  mode 0600
  owner "ec2-user"
  group "ec2-user"
end

cookbook_file "/home/ec2-user/bin/wrap-ssh4git.sh" do
  source "bin/wrap-ssh4git.sh"
  owner "root"
  group "root"
  mode 0755
end

deploy_revision "myfrontendapp" do
  repo "git@github.com:mygitaccount/myfrontendapp.git"
  user "ec2-user"
  revision node['myfrontendapp_revision']
  deploy_to "/var/my/dir/for/release"
  ssh_wrapper "/home/ec2-user/bin/wrap-ssh4git.sh"
  action :deploy
end

Chef’s idempotency ensures that if the revision doesn’t change, nothing happens, and that if it does, that revision gets released. This way Chef can safely run continuously. Change the revision on the chef server to the name of a branch in git (e.g. master), and you’re done.

    "default_attributes": {
      "myfrontendapp_revision" : "master"
    },

Every time chef-client runs on the machine, it will check to see if the branch has been updated. If changes have been pushed, it releases the branch. Congratulations. You now have continuous integration. If you don’t mind your code being released every few minutes, broken or not, then change qa to prod and you have continuous delivery, straight to the user. Clearly, there are some steps in between continuous integration and continuous delivery.
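
One prerequisite worth calling out: all of this assumes chef-client actually runs on a regular interval. If you aren't running it as a daemon, a minimal sketch (the interval is your call) is to let Chef manage its own cron entry:

cron "chef-client" do
  minute "*/10"
  command "chef-client > /dev/null 2>&1"
end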

Continuous Delivery/Release
There's no way to release production code in an automated fashion unless the testing is automated as well. There are any number of solutions out there. To keep this very basic in favor of focusing on the big picture, let's say there's a script that runs every night and tests the qa environment. At the end, it either passes or fails. If it fails, a notification is triggered and it tries again in a bit. If it passes, a second script is invoked which modifies the attribute myfrontendapp_revision for environment prod, uploads it to the Chef server, then kicks chef-client on all nodes running myfrontendapp. A very basic proof-of-concept script (update_att.rb) looks like this, assuming knife.rb is configured for the user running it and that user has sudo access to all the machines:

require 'rubygems'
require 'bundler/setup'
require 'chef'

require 'net/ssh'
require 'net/ssh/multi'
require 'readline'
require 'chef/search/query'
require 'chef/mixin/shell_out'
require 'chef/knife/ssh'

environment = ARGV[0]
default_attribute_key = ARGV[1]
default_attribute_value = ARGV[2]

# get env config in json from the chef server
Chef::Config.from_file('/home/me/.chef/knife.rb')
rest = Chef::REST.new(Chef::Config[:chef_server_url])
env_chef = rest.get_rest("/environments/" + environment)
env_json = env_chef.to_json
env = JSON.parse(env_json, :create_additions => false)

# change the revision
env['default_attributes'][default_attribute_key] = default_attribute_value
env_json_new = env.to_json

# write a new json file for source control
File.open(Chef::Config[:cookbook_path].to_s + '/../environments/' + environment + '.json', 'w') {|f| f.write(env_json_new) }

# upload the new environment to the chef server
env_chef = rest.put_rest("/environments/" + environment, env)

# kick chef-client for all machines in environment
Chef::Config.from_file('/home/me/.chef/knife.rb')
ssh = Chef::Knife::Ssh.new
ssh.config[:attribute] = 'ec2.public_hostname'
#ssh.config[:ssh_user] = username
ssh.name_args << 'chef_environment:' + environment
ssh.name_args << 'sudo su - -c chef-client'
ssh.run

% ruby ./update_att.rb prod myfrontendapp_revision SOME_NEW_REVISION
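
The nightly glue can be equally small. Here is a hypothetical sketch of the driver described above; the test harness (run_qa_suite.sh) and the notification command are placeholders for whatever you already use:

require 'rubygems'
require 'bundler/setup'
require 'chef'

# read the revision currently deployed to qa from the Chef server
Chef::Config.from_file('/home/me/.chef/knife.rb')
rest = Chef::REST.new(Chef::Config[:chef_server_url])
qa_rev = rest.get_rest('/environments/qa').default_attributes['myfrontendapp_revision']

if system('./run_qa_suite.sh qa')       # hypothetical test harness; non-zero exit means failure
  # promote the revision that just passed QA to prod
  system("ruby ./update_att.rb prod myfrontendapp_revision #{qa_rev}")
else
  # notify and let the next scheduled run try again
  system("echo 'qa failed' | mail -s 'QA failed' team@example.com")
end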

Congratulations for real. This is the most basic form of Continuous Delivery. All you do is check in code. Your QA script will test and release it however often you choose with no intervention.

Beyond the Basics
This example is far too basic for most production environments. If you're starting from scratch, it's a fine first step, but parts of your code that aren't front-end scripts will likely require compilation (also, relying on GitHub over, say, S3, isn't the most robust solution). The structure remains the same, though. In the case of Java, for example, you can easily have Jenkins build binaries, upload them to S3 and tag them with, say, their MD5 hashes. Those hashes can replace the git revision in the example above, and you can write your own recipes to handle retrieval of the Jenkins-built binaries from S3. For example, using the s3_file resource for Chef:

s3_file "#{node['root_myfrontendapp']}/releases/myfrontendapp.war-#{node[:myfrontendapp_revision]}" do
  remote_path "/myfrontendapp/myfrontendapp.war-#{node[:myfrontendapp_revision]}"
  bucket "#{node['deploy_bucket']}"
  aws_access_key_id "#{node['deploy_user']}"
  aws_secret_access_key "#{node['deploy_pass']}"
  action :create
end

directory "root-webapp" do
  recursive true
  path "#{node['tomcat_webapp_base']}/ROOT"
  action :nothing
end

# because s3_file doesn't support notifications, among other reasons
link "#{node['tomcat_webapp_base']}/ROOT.war" do
  to "#{node['root_myfrontendapp']}/releases/myfrontendapp.war-#{node[:myfrontendapp_revision]}"
  # just to be sure that the webapp dir is wiped
  notifies :stop, 'service[tomcat6]', :immediately
  notifies :delete, 'directory[root-webapp]', :immediately
  notifies :restart, 'service[tomcat6]', :delayed
end
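
These notifications assume a tomcat6 service resource is declared somewhere in the run list; a minimal declaration looks like this:

service "tomcat6" do
  supports :restart => true, :status => true
  action [:enable, :start]
end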

If you have Jenkins build an RPM instead, just make myfrontendapp_revision the RPM version number and the Chef resources reduce to one: package.
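
A minimal sketch, assuming the node can already reach the yum repository Jenkins publishes to:

package "myfrontendapp" do
  version node[:myfrontendapp_revision]
  action :install
  notifies :restart, 'service[tomcat6]', :delayed
end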

Your QA apparatus should be robust and cover, as fully as possible, anything you yourself would test before a push to production. Our rule of automation: if you've done it twice and expect to do it again, automate it. The most comprehensive way to get there is to practice Test-Driven Development (TDD) from the beginning. Like most awesome things these days, it requires no small amount of overhead, but you will get results that no amount of after-the-fact QA guesswork can achieve. Whether or not you start with TDD is a judgment call.

Forecast: Cloudy
The qa environment does not properly emulate your real production environment, you say? That's what staging is for, you say? This is where the cloud shines. Not only can your nightly release happen automatically, but you can leave the build machines off until it's time to build, saving money. You can also autoscale them out, saving time. You can even have them, on successful QA test of the qa environment, launch a staging stack and hammer away at it before releasing to production. If your automation was done right, one API call to something like AWS's CloudFormation should do the trick. Once the test passes in stage, prod can be changed and updated while you sleep.
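
To give a sense of how small that glue is, here is a sketch using the aws-sdk Ruby gem; the stack name, template file and credentials are placeholders:

require 'aws-sdk'

cfn = AWS::CloudFormation.new(
  :access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY'])

# launch a throwaway staging stack from the same template production uses
stack = cfn.stacks.create('staging', File.read('staging-stack.template'))
sleep 30 while stack.status == 'CREATE_IN_PROGRESS'
raise 'staging stack failed to come up' unless stack.status == 'CREATE_COMPLETE'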

To avoid downtime during release, front and middle tiers should have at least two machines each. Have your script release to each machine sequentially, reverting on failure (a sketch follows below). Your back-end jobs should be thoroughly decoupled using some measure such as queues. If they are, they should be able to stop at any time and pick up where they left off when they're ready, autoscaling to accommodate backed-up queues.
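
A rough sketch of that sequential kick, reusing the knife.rb and SSH assumptions from update_att.rb; the revision to fall back to is a placeholder:

require 'rubygems'
require 'bundler/setup'
require 'chef'
require 'chef/search/query'

Chef::Config.from_file('/home/me/.chef/knife.rb')

# all prod nodes, one at a time
nodes = Chef::Search::Query.new.search(:node, 'chef_environment:prod').first

nodes.each do |n|
  host = n['ec2']['public_hostname']
  puts "releasing on #{host}"
  next if system("ssh ec2-user@#{host} 'sudo su - -c chef-client'")
  # on failure, re-point prod at the last known-good revision and stop
  system('ruby ./update_att.rb prod myfrontendapp_revision LAST_GOOD_REVISION')
  abort "release failed on #{host}, reverted prod"
end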

What about dastardly things like updates to a SQL schema? You had better be damn sure such a change won’t cause problems. Begin by automating in anticipation of one day having it fully automated, and reduce the manual portion to a single button. Iron out your process. When things stop going wrong, have a program push the button.

Good luck!