Let’s be honest: systems administration, whether on bare metal or in the cloud, is often worse than a thankless job. If the site is up and running, you’ll get no thanks. If it goes down, you’d better get it back up quickly…and then explain what just broke. If you need to schedule downtime, well, you have to schedule that for 4 AM on a Saturday and still show up chipper on Monday.
I’ve seen too many ops engineers work themselves to the bone fire-fighting, scaling, and migrating the foundation on which entire businesses stand, as if running a full-on marathon…at a sprint. They get no chance to breathe, no sense of normalcy, and no path to the autonomy or purpose we all seek in our work.
Not on my watch. Here are the practices we employ at LearnZillion to make sure our environment is a livable, enjoyable, and rewarding place to be an ops engineer.
We maintain a sane software engineer to ops engineer ratio. I recently talked with an ops engineer who was responsible for the systems behind the company’s 60-person software engineering team. I wish this were an extreme case, or at the very least a sustainable one, but it’s neither. This isn’t the first time I’ve heard it either. Whether the software engineers are great or sucky, you’re in for a rough ride when the ratio is stacked against you. Don’t let this happen. Systems take serious work to build and maintain. Don’t ever let an employee drown in work.
We deploy during working hours whenever possible. Our engineering team practices no-downtime, continuous delivery within a time window that allows for issues to shake out before staff go home for the day or weekend. We typically ship Monday through Thursday 8 AM to 3 PM. If a completely shippable deliverable misses that window, we often wait until the next reasonable workday to deploy. We don’t want anyone, in software or in ops, paged while out of the office. It’s a terrible way to live. Strive to keep work at work.
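A window like this is easy to enforce in tooling rather than by memory. Here is a minimal sketch, assuming the Monday through Thursday, 8 AM to 3 PM schedule above and a timestamp already expressed in office-local time. The function and constant names are hypothetical, not taken from our actual deploy scripts.

```python
from datetime import datetime, time

# Hypothetical policy values, mirroring the schedule described above.
DEPLOY_DAYS = {0, 1, 2, 3}     # Monday=0 through Thursday=3
WINDOW_START = time(8, 0)      # 8 AM, inclusive
WINDOW_END = time(15, 0)       # 3 PM, exclusive


def in_deploy_window(now: datetime) -> bool:
    """Return True if `now` (assumed office-local time) falls inside the window."""
    return now.weekday() in DEPLOY_DAYS and WINDOW_START <= now.time() < WINDOW_END
```

A deploy script could call `in_deploy_window(datetime.now())` and refuse to ship, or ask for explicit confirmation, when it returns `False`.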
We have reasonable maintenance windows. It took a bit of Google Analytics investigation and some convincing inside the company, but our maintenance window starts at 8 PM EST when we need one. Will this affect users? Yes. Is this the ideal time for users? No. Do we want to save our ops engineers from burnout, sleep deprivation, and insanity, and allow them to live life? Yes! Since we practice continuous delivery, maintenance that requires our site to be offline is rare, so it’s a reasonable trade-off.
We assume it’s a software issue until ops is proven guilty. Too many people outside an engineering department, and even insufficiently experienced software engineers, assume the computers are to blame when things go down (guilty!). Operations issues happen, but software changes or software engineering flubs are usually at fault. We make sure our issue escalation process assumes this reality. Our ops engineer is our last line of defense, not our first.
We make space for proactive ops engineering. Imagine you’re in a sinking ship and you’re told to keep bailing water, even though there’s a plug and hammer at your feet that would stop one source of the leak. That’s what it’s like to be deprived of space to make your work life better. Nowadays, software engineers are given space to pay off tech debt. Not only does this make it easier for them to ship features in the long run, it also makes their working environment less toxic. Help your ops engineers make time for proactive work. Tell your software engineers to endure that less important but painful problem they’re complaining about just a little longer so that ops gets the space it needs to address the top issues on its list too.
We check in regularly. Ops engineers are a part of our standard kick-off meetings and stand-ups. They have an equal voice at the table. They serve the needs of the business like the rest of us, but they are not subservient. We connect out-of-band to see how things are going too.
We pay them competitively. We send them to meetups, conferences, and training just like software engineers. We let them go to the dentist when they need to. We praise them for their work. We treat them well. Do you?