An independent company that provides backup, business continuity, and data analytics software and services to a large number of the Fortune 500 has migrated a new project to Amazon Web Services (AWS). The project continues the company’s staged migration to AWS, but is also the first internal test case of their preferred infrastructure model: immutable architecture.
Due to the complexity and frequency of the company’s development process, their development team wanted to terminate instances running outdated code rather than update them, so that every code push rebuilds the entire stack. This is the true meaning of immutable infrastructure: AWS instances are disposable, and once the infrastructure and code are instantiated, an instance is never changed. This way the infrastructure never strays from its initial “known-good” state, operations are simplified, and instance failure is handled as a routine part of doing business.
Immutable infrastructure design requires full automation of the environment, from AWS resource provisioning to bootstrapping, package installation, and code deployment, which is why true immutable infrastructure is so rare, particularly in enterprise IT environments. To achieve this, the company hired Logicworks, an AWS Premier and Managed Service Partner with DevOps Competency, to automate AWS resource provisioning and manage ongoing operational tasks so that their team could focus on software development. Their goal was to release their team from the “ball and chain” of maintaining infrastructure, both through automation and by outsourcing any traditional “ops” function to Logicworks.
The Logicworks Sr. Engineering team first built a custom CloudFormation template that performs standard tasks like building and configuring a VPC and access controls. The Puppet agent is then installed on the instance and connects to the Puppet master, which configures the instance’s OS. The Logicworks team created an AMI from this fully configured instance, so that the only items installed at creation are host names and other minor details. The Logicworks team continually improves and tests this image (a recent improvement bakes LDAP into the image rather than adding it on boot), and because of the nature of the system, they know that any new version of the AMI will be on every single instance within hours. The final step in the Puppet process is kicking off the company’s deploy script, and this “hand off” required the most delicate work. The deploy script then pulls down the most recent version of code from an on-premises box. An identical process works in the production environment, though instances are replaced less frequently.
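As a rough illustration of the "standard tasks" such a template might cover, the sketch below builds a minimal CloudFormation template as a Python dictionary. It is not the Logicworks template: the logical names `AppVpc` and `AdminAccess`, the CIDR ranges, and the SSH-only rule are all assumptions chosen for the example.

```python
import json


def build_network_template(vpc_cidr="10.0.0.0/16", admin_cidr="10.0.0.0/8"):
    """Return a minimal CloudFormation template as a Python dict.

    The CIDR defaults and logical resource names are hypothetical.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative VPC and access-control baseline",
        "Resources": {
            # The VPC that every instance in the stack is launched into.
            "AppVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": vpc_cidr,
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                },
            },
            # A security group that only admits SSH from the admin range.
            "AdminAccess": {
                "Type": "AWS::EC2::SecurityGroup",
                "Properties": {
                    "GroupDescription": "Allow SSH from the admin range only",
                    "VpcId": {"Ref": "AppVpc"},
                    "SecurityGroupIngress": [
                        {
                            "IpProtocol": "tcp",
                            "FromPort": 22,
                            "ToPort": 22,
                            "CidrIp": admin_cidr,
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    # Print the template as JSON, the form CloudFormation accepts.
    print(json.dumps(build_network_template(), indent=2))
```

Generating the template from code rather than editing it by hand keeps every stack rebuild identical, which is the property the immutable model depends on.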
The project has a custom deployment pipeline with 20+ parallel automated and manual tests, each in a separate development environment. When a new deploy to the development environment occurs, a new set of 20+ dev instances running the new version of code must be created; when a test finishes, its instance must be terminated. This means that no instance in the project’s environment lasts longer than 24 hours, and over the course of a single week, hundreds of instances are terminated and rebuilt.
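One way to audit the 24-hour property is to list any instance that has outlived the limit. The source does not say how this is enforced, so the following is only a sketch: the function name is hypothetical, and the records are assumed to carry `InstanceId` and a timezone-aware `LaunchTime`, similar to what the EC2 API reports.

```python
from datetime import datetime, timedelta, timezone

# Upper bound on instance lifetime, taken from the 24-hour figure in the text.
MAX_INSTANCE_AGE = timedelta(hours=24)


def instances_to_replace(instances, now, max_age=MAX_INSTANCE_AGE):
    """Return the IDs of instances that have been running for max_age or longer."""
    return [
        inst["InstanceId"]
        for inst in instances
        if now - inst["LaunchTime"] >= max_age
    ]


if __name__ == "__main__":
    now = datetime(2016, 1, 2, 12, 0, tzinfo=timezone.utc)
    fleet = [
        # Launched 30 hours before `now`: past the limit.
        {"InstanceId": "i-old", "LaunchTime": now - timedelta(hours=30)},
        # Launched 2 hours before `now`: still within the limit.
        {"InstanceId": "i-new", "LaunchTime": now - timedelta(hours=2)},
    ]
    print(instances_to_replace(fleet, now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.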
This degree of reliability requires significantly more effort than simply placing the instances in Auto Scaling Groups (ASGs); it requires custom tooling with AWS CloudFormation and Puppet that is closely integrated with the company’s internal configuration management tooling and custom deployment script. The Logicworks and internal teams collaborated over the course of several months to achieve this integration, spinning up thousands of instances per week to test the process.
This system has a significant impact on the company’s overall security profile. Their risk of configuration error is drastically reduced, as the system is rebuilt from an “ideal” state every few hours. To mitigate even the small remaining risk of configuration error or malicious attack, the company’s instances are connected to a central Logicworks hub with custom “scanners” that alert on or proactively change instances whose configuration does not match the current standard. Examples include scripts that ensure AWS Config and CloudTrail are enabled, enforce MFA on the root account, check for upcoming EC2 restart events, and configure snapshots of EBS volumes based on tags with customizable retention. These scanners are currently written in Python, but the Logicworks team is in the process of converting them into AWS Lambda functions so that the underlying instances supporting this hub no longer need to be managed.
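The core logic of such a scanner can be sketched in plain Python. This is not the Logicworks code: the state dictionary keys (`config_recording`, `cloudtrail_logging`, `root_mfa_enabled`), the `RetentionDays` tag, and the 7-day default are hypothetical, and a real scanner would populate these inputs from AWS API calls.

```python
from datetime import datetime, timedelta, timezone


def find_violations(account):
    """Compare a snapshot of account state against the expected baseline.

    `account` is a hypothetical dict of booleans collected elsewhere.
    """
    findings = []
    if not account.get("config_recording"):
        findings.append("AWS Config recorder is not recording")
    if not account.get("cloudtrail_logging"):
        findings.append("CloudTrail is not logging")
    if not account.get("root_mfa_enabled"):
        findings.append("Root account has no MFA device")
    return findings


def expired_snapshots(snapshots, now, default_days=7):
    """Return IDs of EBS snapshots older than their tagged retention period.

    Tags are modeled as a simple dict; the `RetentionDays` tag name is an
    assumption, and snapshots without it fall back to `default_days`.
    """
    expired = []
    for snap in snapshots:
        days = int(snap.get("Tags", {}).get("RetentionDays", default_days))
        if now - snap["StartTime"] > timedelta(days=days):
            expired.append(snap["SnapshotId"])
    return expired


if __name__ == "__main__":
    state = {"config_recording": True, "cloudtrail_logging": False,
             "root_mfa_enabled": True}
    for finding in find_violations(state):
        print("ALERT:", finding)

    now = datetime(2016, 1, 31, tzinfo=timezone.utc)
    snaps = [
        {"SnapshotId": "snap-a", "StartTime": now - timedelta(days=10),
         "Tags": {"RetentionDays": "30"}},
        {"SnapshotId": "snap-b", "StartTime": now - timedelta(days=10)},
    ]
    print("Expired:", expired_snapshots(snaps, now))
```

Keeping the checks as pure functions over collected state separates the decision from the data gathering, which should also make a later move to AWS Lambda mostly a matter of changing how the inputs are fetched.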
As a result of AWS resources and Logicworks DevOps expertise, the company has created an immutable architecture with a 0.001% instance failure rate and 100% uptime for their production application, even during very rare AWS outages. Their developers deploy to their development environment with a single click and without any instance configuration tasks, knowing that every testing and production environment is configured to their standard and carries no residual impact from previous failed or passed tests. This represents 60% higher deployment efficiency than similar projects on AWS.
The success of this project demonstrated the tremendous power of disposable AWS resources, and as a result, AWS is now the company’s standard infrastructure platform. The company plans to move an additional 6-12 projects into AWS in the next 6 months, with a conservatively estimated monthly AWS spend of $250K.
Tags: SaaS, Security