Engineering teams like Etsy’s have popularized the idea of continuous deployment: infrastructure that automatically rolls-out newly minted code to production in a safe and gradual manner. In Lean Startup and Web Operations, Eric Ries explains the rationale behind continuous deployment: making a safe habit out of shipping product. I loved the motive, but it was clear that the practice as described required heavy operations infrastructure:
- A continuous deployment server for automatic deploys after successful continuous integration runs
- Live monitoring to discern if a code change negatively affected business metrics
- Automatic rollback if business metrics are negatively affected
- Incremental rollouts to production servers so as not to deploy to all servers at once
- Code architecture that allows for both old and new code to run in production simultaneously while code rollouts are in-progress
- Feature switches
While leading a team at LivingSocial, I set out to achieve the goal of safe code shipping as a habit but without the complicated and time-costly infrastructure. We were successful by incorporating good software engineering and deployment practices–all of which were generally good for us and didn’t require as much dedicated tooling or time. Later we discovered others outside the company were starting to do the same under the label “continuous delivery.” We have been even more successful with continuous delivery at LearnZillion, where I am today.
Unfortunately, the cost of continuous deployment infrastructure can discourage engineering teams from investing time in their development and deployment process because they don’t realize the lower-cost alternative, continuous delivery, is also a viable option. I want to share how we do continuous delivery at LearnZillion, so that others can achieve similar results without the overhead of extra infrastructure.
I am going to assume the year is 2015, or even 2010 or 2006, and that you have a deployment script or tool like Capistrano to automate the basic deployment steps for your application. As well, I’m going to assume your team or organization wants to do continuous delivery. If neither of these are in-place, start with them.
1. Individual responsibility
Although we work as a team, individuals are responsible for carrying work forward to completion and doing that well. Staff are responsible for taking features from initial definition to production shipment. Along the way, they collaborate with and incorporate input from the broader team and company. (See Multipliers and Drive for reasons to give employees meaningful responsibility in the workplace.)
With these responsibilities come expectations:
Do not break the site. Do not break features. Do not break the test suite. Do not commit code you did not write (this is a smell of a bad development database on your machine, failed merge, etc.). Run the tests regularly–especially before merging into the master branch. If the master branch code changes in-between your test suite run and your re-attempt at commit, run the tests again after cleanup, as appropriate.
Unfortunately, I have found that in many organizations, lack of trust is the default. A tech lead or manager is responsible for scrutinizing code from all team members, merging, deploying, and ensuring the application won’t break. This may make sense for new team members until they understand and are comfortable with the team conventions and have demonstrated that they are capable engineers. Otherwise, it should not be the norm.
2. Smallest overlap of responsibilities
3. The master branch is sacred
We deploy to production from our master branch. Developers can depend on master as a reliable foundation to fork, merge, and rebase from. Features are developed, reviewed, and QA-ed in separate branches. If you have test failures, it’s most likely your code. Feature branches are only merged into master immediately before deployment. It is the responsibility of the feature owner to make sure the branch is reasonably current with master before it is merged itself. There are loads of articles on “simple git workflow,” which you can find online, like this one. git and GitHub make this paradigm easy to follow.
4. Follow “The Twelve-Factor App” methodology
I will let the methodology speak for itself. See part X in particular. The biggest continuous delivery benefit is no surprises during deployment.
At LivingSocial, my team ensured the application development environment behaved like production, except where Rails intentionally separates the two. Truth be told, we didn’t have a reliable staging environment at our disposal, so we went straight from development to production. Believe it or not, because of our practices, this still worked quite well.
At LearnZillion, we take this further by using similar SaltStack configurations for production, staging, and a Vagrant-powered development environment. In development, the Ruby process and gems for the app are still installed on the host operating system but everything else runs inside VirtualBox. It has the side benefit of speeding-up the on-boarding process for new engineers.
5. A test suite
At both LivingSocial and LearnZillion, we used Ruby on Rails, which strongly encourages use of a unit testing framework. Engineers make certain to have a passing test suite before merging a branch into master, must have a passing test suite on master post merge, and a failure on the master branch takes top priority–second only to a live site outage.
At LearnZillion we took this farther by integrating CircleCI with GitHub to minimize the execution burden on engineers.
6. An automated QA test suite
At LearnZillion, we have a QA team. They naturally have the potential to be a bottleneck for getting features out. Since quality is their main objective, you want them to be gatekeepers. What you don’t want is for their review and gatekeeping processes to be cumbersome or inefficient. The most powerful lever you can maneuver within your QA team for continuous delivery is to automate their testing. Our team has an extensive QA test suite, which QA engineers can run against any branch, at any time, on a staging server. Automated tests are usually written soon after deployment to production, but sometimes are completed before then. Manual QA of emerging features still takes place, of course.
7. Look at your dashboards
It doesn’t take much effort to have a short list of links to Google Analytics, Mixpanel, or your error reporting service like Bugsnag or Honeybadger. An engineer can inspect them after deploy to see if something broke. Engineers and product designers should be doing this anyway to see how users are responding to changes or new features.
Bonus 1: Manual QA in a different time zone
When an engineer’s code has passed peer review and the automated QA test suite, it is sent along to QA for manual inspection. Test results are back by the next business morning because some of our QA team members are located in India. They test our work while we sleep.
Bonus 2: Continuous QA
At LearnZillion, we’ve integrated a GitHub pull request web hook that deploys a branch to a staging server and runs the QA test suite against it. This means that a branch has been regression tested before it gets to the QA team and usually before it gets to peer review. If you want to read more about our automated QA process, see Kevin Bell’s article about us over at CircleCI.
With the good engineering and deployment practices of continuous delivery, you can achieve the same benefit of continuous deployment: safe, consistent delivery of product as a habit. You don’t have to build-out a dedicated infrastructure, and you can build a better engineering team and environment in the process.
Looking for your next gig?
If this sort of engineering environment is appealing to you, and you are interested in being a Senior Software Engineer or Senior Product Designer at LearnZillion, please apply. We would love to meet you.
[Thanks to my team for reviewing this post and recommending improvements to it.]