Continuous Delivery, not Continuous Deployment

Engineering teams like Etsy’s have popularized the idea of continuous deployment: infrastructure that automatically rolls out newly minted code to production in a safe and gradual manner. In The Lean Startup and Web Operations, Eric Ries explains the rationale behind continuous deployment: making a safe habit out of shipping product. I loved the motive, but it was clear that the practice as described required heavy operations infrastructure:

  • A continuous deployment server for automatic deploys after successful continuous integration runs
  • Live monitoring to discern if a code change negatively affected business metrics
  • Automatic rollback if business metrics are negatively affected
  • Incremental rollouts to production servers so as not to deploy to all servers at once
  • Code architecture that allows for both old and new code to run in production simultaneously while code rollouts are in progress
  • Feature switches

While leading a team at LivingSocial, I set out to achieve the goal of safe code shipping as a habit, but without the complicated and time-costly infrastructure. We succeeded by adopting good software engineering and deployment practices–all of which served us well in general and didn’t require as much dedicated tooling or time. Later we discovered others outside the company were starting to do the same under the label “continuous delivery.” We have been even more successful with continuous delivery at LearnZillion, where I am today.

Unfortunately, the cost of continuous deployment infrastructure can discourage engineering teams from investing time in their development and deployment process because they don’t realize the lower-cost alternative, continuous delivery, is also a viable option. I want to share how we do continuous delivery at LearnZillion, so that others can achieve similar results without the overhead of extra infrastructure.

0. Assumptions

I am going to assume the year is 2015, or even 2010 or 2006, and that you have a deployment script or tool like Capistrano to automate the basic deployment steps for your application. I’m also going to assume your team or organization wants to do continuous delivery. If either of these is missing, start there.
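For concreteness, here is a minimal sketch of what such a deployment setup might look like with Capistrano 3. The application name, repository URL, and server hostname are placeholders, not a real configuration:

    # config/deploy.rb -- a minimal Capistrano 3 sketch; all names are placeholders
    lock "~> 3.0"

    set :application, "myapp"
    set :repo_url,    "git@github.com:myorg/myapp.git"
    set :branch,      "master"
    set :deploy_to,   "/var/www/myapp"

    # Persist shared files and directories across releases.
    append :linked_files, "config/database.yml"
    append :linked_dirs,  "log", "tmp/pids"

    # config/deploy/production.rb
    server "app1.example.com", user: "deploy", roles: %w[app web db]

With that in place, “bundle exec cap production deploy” performs the basic deployment steps.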

1. Individual responsibility

Although we work as a team, individuals are responsible for carrying work forward to completion and doing that well. Staff are responsible for taking features from initial definition to production shipment. Along the way, they collaborate with and incorporate input from the broader team and company. (See Multipliers and Drive for reasons to give employees meaningful responsibility in the workplace.)

With these responsibilities come expectations:

  • Do not break the site.
  • Do not break features.
  • Do not break the test suite.
  • Do not commit code you did not write–stray changes are a smell of a bad development database on your machine, a failed merge, etc.
  • Run the tests regularly–especially before merging into the master branch. If master changes between your test run and your merge, clean up and run the tests again as appropriate.

Unfortunately, I have found that in many organizations, lack of trust is the default. A tech lead or manager is responsible for scrutinizing code from all team members, merging, deploying, and ensuring the application won’t break. This may make sense for new team members until they understand and are comfortable with the team conventions and have demonstrated that they are capable engineers. Otherwise, it should not be the norm.

2. Smallest overlap of responsibilities

We often pair a product designer (design, UX, HTML/CSS) with a full-stack engineer (SQL, Rails, Ruby, JavaScript, HTML/CSS) to work on a feature. However, we avoid assigning multiple engineers the same feature. We try to keep engineers working on “orthogonal capabilities.” (See “The Three Musketeers” and The Mythical Man Month for the rationale behind this approach.)

3. The master branch is sacred

We deploy to production from our master branch. Developers can depend on master as a reliable foundation to fork, merge, and rebase from; if your branch has test failures, it’s most likely your code. Features are developed, reviewed, and QA-ed in separate branches, and a feature branch is merged into master only immediately before deployment. It is the responsibility of the feature owner to make sure the branch is reasonably current with master before merging it. There are loads of “simple git workflow” articles online, and git and GitHub make this paradigm easy to follow.
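For reference, the basic flow looks something like this in git commands–the remote and branch names are illustrative:

    git checkout -b my-feature       # develop on a feature branch
    # ...commit, push, peer review, and QA happen on the branch...

    git fetch origin
    git rebase origin/master         # bring the branch current with master
    bundle exec rake test            # re-run the suite after the rebase

    git checkout master
    git pull --ff-only origin master
    git merge --no-ff my-feature     # merge immediately before deployment
    git push origin master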

4. Follow “The Twelve-Factor App” methodology

I will let the methodology speak for itself. See factor X, Dev/Prod Parity, in particular. The biggest continuous delivery benefit is no surprises during deployment.
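To pick one concrete factor: factor III (Config) keeps configuration in the environment rather than in code, so the same code runs unchanged in development, staging, and production. A minimal Ruby illustration, with hypothetical variable names:

    # Read settings from the environment rather than hard-coding them,
    # so each environment supplies its own values.
    database_url = ENV.fetch("DATABASE_URL")
    smtp_host    = ENV.fetch("SMTP_HOST", "localhost")  # safe development default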

At LivingSocial, my team ensured the application development environment behaved like production, except where Rails intentionally separates the two. Truth be told, we didn’t have a reliable staging environment at our disposal, so we went straight from development to production. Believe it or not, because of our practices, this still worked quite well.

At LearnZillion, we take this further by using similar SaltStack configurations for production, staging, and a Vagrant-powered development environment. In development, the Ruby process and gems for the app are still installed on the host operating system, but everything else runs inside VirtualBox. This has the side benefit of speeding up the onboarding process for new engineers.
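A simplified Vagrantfile sketch along these lines–the box name, ports, and Salt state layout are illustrative, not our actual configuration:

    # Vagrantfile -- an illustrative Salt-provisioned development VM
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/trusty64"

      # Forward service ports so the Ruby process on the host can reach
      # Postgres and Redis running inside the VM.
      config.vm.network "forwarded_port", guest: 5432, host: 5432
      config.vm.network "forwarded_port", guest: 6379, host: 6379

      # Reuse the same Salt states that provision staging and production.
      config.vm.synced_folder "salt/roots", "/srv/salt"
      config.vm.provision :salt do |salt|
        salt.masterless    = true
        salt.run_highstate = true
      end
    end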

5. A test suite

At both LivingSocial and LearnZillion, we used Ruby on Rails, which strongly encourages use of a unit testing framework. Engineers must have a passing test suite before merging a branch into master and a passing test suite on master after the merge. A failure on the master branch takes top priority–second only to a live site outage.
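For illustration, a typical Rails unit test looks like the following–the Lesson model and its validation are hypothetical:

    # test/models/lesson_test.rb
    require "test_helper"

    class LessonTest < ActiveSupport::TestCase
      test "requires a title" do
        lesson = Lesson.new(title: nil)
        assert_not lesson.valid?
        assert_includes lesson.errors[:title], "can't be blank"
      end
    end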

At LearnZillion, we took this further by integrating CircleCI with GitHub to minimize the execution burden on engineers.

6. An automated QA test suite

At LearnZillion, we have a QA team, and they naturally have the potential to become a bottleneck for getting features out. Since quality is their main objective, you want them to be gatekeepers. What you don’t want is for their review and gatekeeping process to be cumbersome or inefficient. The most powerful continuous delivery lever within your QA team is automating their testing. Our team has an extensive QA test suite, which QA engineers can run against any branch, at any time, on a staging server. Automated tests are usually written soon after a feature deploys to production, but sometimes they are completed before then. Manual QA of emerging features still takes place, of course.
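As an illustration of the style–assuming RSpec and Capybara, with a hypothetical page and credentials–an automated QA test might look like this:

    # spec/features/sign_in_spec.rb
    require "rails_helper"

    RSpec.feature "Signing in", js: true do
      scenario "a teacher signs in and lands on the dashboard" do
        visit "/sign_in"
        fill_in "Email",    with: "teacher@example.com"
        fill_in "Password", with: "password123"
        click_button "Sign in"

        expect(page).to have_content("Dashboard")
      end
    end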

7. Look at your dashboards

It doesn’t take much effort to keep a short list of links to Google Analytics, Mixpanel, or an error-reporting service like Bugsnag or Honeybadger. An engineer can inspect them after a deploy to see if something broke. Engineers and product designers should be doing this anyway to see how users are responding to changes and new features.
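One low-cost way to build the habit is to print those links at the end of every deploy. A sketch, assuming Capistrano 3 and placeholder URLs:

    # lib/capistrano/tasks/dashboards.rake
    namespace :deploy do
      task :dashboard_reminder do
        puts "Deploy finished. Check the dashboards:"
        puts "  Errors:    https://app.bugsnag.com/myorg/myapp"
        puts "  Analytics: https://analytics.google.com/"
      end
    end

    after "deploy:finished", "deploy:dashboard_reminder"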

Bonus 1: Manual QA in a different time zone

When an engineer’s code has passed peer review and the automated QA test suite, it is sent along to QA for manual inspection. Because some of our QA team members are located in India, test results are back by the next business morning–they test our work while we sleep.

Bonus 2: Continuous QA

At LearnZillion, we’ve integrated a GitHub pull request webhook that deploys a branch to a staging server and runs the QA test suite against it. This means a branch has been regression tested before it reaches the QA team–and usually before it reaches peer review. If you want to read more about our automated QA process, see Kevin Bell’s article about us over at CircleCI.
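For flavor, here is a minimal sketch of such a webhook receiver–Sinatra is assumed, the deploy and test commands are placeholders, and our actual implementation differs:

    # A minimal pull request webhook sketch (Sinatra assumed)
    require "sinatra"
    require "json"

    post "/webhooks/pull_request" do
      event = JSON.parse(request.body.read)
      halt 200 unless %w[opened synchronize].include?(event["action"])

      branch = event["pull_request"]["head"]["ref"]

      # In a real system this work would be queued, not run inline.
      system({ "BRANCH" => branch }, "bundle", "exec", "cap", "staging", "deploy")
      system("bundle", "exec", "rspec", "spec/qa")

      200
    end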

In Summary

With the good engineering and deployment practices of continuous delivery, you can achieve the same benefit as continuous deployment: safe, consistent delivery of product as a habit. You don’t have to build out dedicated infrastructure, and you can build a better engineering team and environment in the process.

Looking for your next gig?

If this sort of engineering environment is appealing to you, and you are interested in being a Senior Software Engineer or Senior Product Designer at LearnZillion, please apply. We would love to meet you.

[Thanks to my team for reviewing this post and recommending improvements to it.]

A Solution to the Stay-at-Home Mom Work History Gap?

My mom and I just came up with a novel way to address the work history “gap” that seemingly endangers stay-at-home moms’ résumés when they return to the workforce–especially those who have been at home for a while.

Traditional options for addressing the gap, although well-intended and somewhat helpful, either dodge the at-home period by suggesting a functional résumé format over a chronological one or, at best, imply that the time is not worth citing and should be glossed over. These approaches treat the time at home as a true gap in one’s work history. For many women, that is simply not true. It’s work!

Former stay-at-home moms, or those in transition, here is an idea for you: instead of leaving a gap in your chronological résumé, list the mom position as you would any other job. Give a high-level summary of your role and non-obvious responsibilities, and list your children with their professional accomplishments–you most certainly helped them achieve them.

Here is an example of what my mom could put on her résumé:


Stay-at-Home Mom, Homeschool Teacher, Life Coach (1981-2015)
Raised five children full-time, homeschooling each child from kindergarten through 8th grade. Helped my children discover their interests and gifts, and provided counsel, advice, and support as needed.

  1. Ian Lotinsky: VP of Engineering at LearnZillion
  2. Adam Lotinsky: Project Manager at JFW
  3. Lauren Pucciarelli: Commercial CRM Auditor at Architectural Ceramics
  4. Aaron Lotinsky: Project Manager and Fulfillment at Decorative Films
  5. Nate Lotinsky: Junior, Electrical Engineering at Montgomery College

Now, “Life Coach” is intended to be slightly humorous, but it is, in fact, entirely true. Five out of five kids are on solid professional trajectories. That’s a parenting accomplishment if you ask me.

What do you think of this idea? I want to hear from moms and hiring managers.

How Well Do You Treat Your Sysadmin/DevOps/Ops Engineer?

Let’s be honest: systems administration, whether on bare metal or in the cloud, is often worse than a thankless job. If the site is up and running, you’ll get no thanks. If it goes down, you’d better get it back up quickly…and then explain what just broke. If you need to schedule downtime, well, you have to schedule it for 4 AM on a Saturday and still show up chipper on Monday.

I’ve seen too many ops engineers work themselves to the bone fire-fighting, scaling, and migrating the foundation on which entire businesses stand–running a full-on marathon at a sprint. They get no chance to breathe, no normalcy, and no path to the autonomy and purpose we all seek in our work.

Not on my watch. Here are the practices we employ at LearnZillion to make sure our environment is a livable, enjoyable, and rewarding place to be an ops engineer.

We maintain a sane software-engineer-to-ops-engineer ratio. I recently talked with an ops engineer who was single-handedly responsible for the systems behind the company’s 60-person software engineering team. I wish that were an extreme case, or at least sustainable, but it’s neither–and it isn’t the first time I’ve heard of such a ratio. Whether the software engineers are great or sucky, you’re in for a rough ride when the ratio is stacked against you. Don’t let this happen. Systems take serious work to build and maintain. Don’t ever let an employee drown in work.

We deploy during working hours whenever possible. Our engineering team practices no-downtime, continuous delivery within a time window that allows issues to shake out before staff go home for the day or weekend. We typically ship Monday through Thursday, 8 AM to 3 PM. If a completely shippable deliverable misses that window, we often wait until the next reasonable workday to deploy. We don’t want anyone, in software or in ops, to be paged while out of the office. It’s a terrible way to live. Strive to keep work at work.

We have reasonable maintenance windows. It took a bit of Google Analytics investigation and some convincing inside the company, but our maintenance window starts at 8 PM EST when we need one. Will this affect users? Yes. Is this the ideal time for users? No. Do we want to save our ops engineers from burnout, sleep deprivation, and insanity, and allow them to live life? Yes! Since we practice continuous delivery, maintenance that requires our site to be offline is rare, so it’s a reasonable trade-off.

We assume it’s a software issue until ops is proven guilty. Too many people outside an engineering department–and even insufficiently experienced software engineers–assume the computers are to blame when things go down (guilty!). Operations issues happen, but a software change or an engineering flub is usually at fault. We make sure our issue escalation process assumes this reality. Our ops engineer is our last line of defense, not our first.

We make space for proactive ops engineering. Imagine you’re in a sinking ship and you’re told to keep bailing water, even though there’s a plug and a hammer at your feet that would stop a source of the leaking. That’s what it’s like to be deprived of the space to make your work life better. Nowadays, software engineers are given space to pay off tech debt. Not only does this make it easier for them to ship features in the long run, it also makes their working environment less toxic. Help your ops engineers make time for proactive work, too. Tell your software engineers to endure that less important but still painful issue they’re complaining about just a little longer, so that ops gets the space it needs to address the top items on its list.

We check in regularly. Ops engineers are part of our standard kick-off meetings and stand-ups. They have an equal voice at the table. They serve the needs of the business like the rest of us, but they are not subservient. We also connect out of band to see how things are going.

We pay them competitively. We send them to meetups, conferences, and training just like software engineers. We let them go to the dentist when they need to. We praise them for their work. We treat them well. Do you?