DevOps has been a major cultural force in IT for the past ten years. But a gap remains between what companies expect to get out of DevOps and the day-to-day realities of working as a systems engineer on an IT team.
Systems engineers often spend too much time putting out fires and manually building, configuring, and maintaining cloud infrastructure. A recent survey found that 33 percent of companies take more than a month to deliver new infrastructure, and more than half have no access to self-service infrastructure. The result is that systems engineers burn out quickly, developers are frustrated, and new projects are delayed. Add to the mix a constantly shifting regulatory landscape and dozens of new platforms and tools to support, and chances are that your operations team is pretty overwhelmed.
So how should we build and manage new cloud systems? As systems engineers, which principles should we live by?
Below, the engineering team at Logicworks has tried to encapsulate all our cloud management best practices into a Cloud Operations Manifesto: a guide for any systems engineer on how to operate in the cloud. Obviously, our manifesto is strongly influenced by the Agile Manifesto and core DevOps principles. You can download a PDF of the Manifesto here. Think we’ve left something out? We’d love to hear about it in the comments.
A Cloud Operations Manifesto
1. End-user experience is our responsibility.
Uptime and high performance should be constantly measured and optimized.
2. Infrastructure is ephemeral, immutable, and utilitarian.
Don’t fix it, replace it.
3. Infrastructure is software.
It’s programmable, modular, and ideally deployed as part of the application. Everything can change quickly, is tested frequently, and is ready to scale.
4. Make data useful.
Log everything centrally, ingest logs into a BI tool, and get data that benefits the business.
5. Let automation do the boring stuff.
Patching, backups, key rotation, and similar routine tasks should be automated.
6. Security first.
Think about security at the beginning of the architecture process, not as a last-minute add-on.
7. Understand the difference between security and compliance.
Security reduces operational risk. Compliance reduces business risk. You need both.
8. Anticipate failure.
Expect everything to fail, plan for failure, and deliberately break things to test failover.
9. Have a Runbook.
No matter how sophisticated your technology, you still need a response plan for issues, especially after hours.
10. Don’t be afraid to fail quickly.
Not every cloud technology is going to be right for you, but you won’t know unless you stay current and test.
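To make principle 5 a little more concrete, here is a minimal sketch of the kind of check an automated key-rotation job might start from. Everything in it is an assumption for illustration: the 90-day policy, the `keys_due_for_rotation` function name, and the idea that your key inventory is available as a mapping of key IDs to creation times. A real job would pull that inventory from your cloud provider and trigger the rotation itself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: any key older than 90 days should be rotated.
MAX_KEY_AGE = timedelta(days=90)


def keys_due_for_rotation(keys, now=None, max_age=MAX_KEY_AGE):
    """Return the IDs of keys older than max_age, oldest first.

    `keys` maps a key ID to its creation time (a timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    overdue = [
        (created, key_id)
        for key_id, created in keys.items()
        if now - created > max_age
    ]
    # Sorting by creation time puts the oldest (most urgent) keys first.
    return [key_id for _, key_id in sorted(overdue)]


if __name__ == "__main__":
    # Illustrative inventory; in practice this would come from your provider.
    today = datetime.now(timezone.utc)
    inventory = {
        "deploy-key": today - timedelta(days=120),
        "backup-key": today - timedelta(days=15),
    }
    for key_id in keys_due_for_rotation(inventory, now=today):
        print(f"rotate: {key_id}")
```

Run on a schedule, a check like this turns key rotation from something an engineer has to remember into something the system reports on its own.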