While it is possible to deploy a highly available Amazon Web Services (AWS) environment without configuration management tools like Puppet, doing so significantly increases the risk of error and downtime.
AWS provides many tools for managing the configuration and deployment of your AWS resources, such as CloudFormation and Elastic Beanstalk, but these tools only cover the AWS objects you create; they don’t manage the software and other configuration data present on your EC2 instances.
A DevOps team relies on configuration management to maintain a single source of consistent, documented system configuration. As enterprise infrastructure becomes code and instances can be spun up or down with a few clicks, the protection this affords is essential for complex deployments. An EC2-resident configuration tool like Puppet is also the key configuration engine during Auto Scaling events, supports version control in a variety of ways, and offers monitoring and reporting capabilities, among other benefits.
It is often tempting to skip Puppet or configuration management in an initial AWS setup, especially if the team is new to Puppet. But every consumer-facing application changes, and the hard work done upfront pays off each time it does. The more that is automated, the more time engineers can spend on new projects and the quicker the team can adapt to change.
The fundamental purpose of Puppet, of course, is to deploy environments prescriptively. By defining everything on the servers in a single location, there is a single source of truth about the state of the entire infrastructure. On any kind of infrastructure, automated software installs can save your team a great deal of time, but in a dynamic cloud environment, configuration management tools can actually create the AWS resources your nodes require to operate, such as Elastic IPs, network interfaces, or block storage. When using Puppet on nodes in AWS autoscaling groups, however, a few different issues can arise.
One of the most critical aspects of any Puppet implementation is having stable and unique hostnames for each instance. The hostname, in conjunction with the client SSL certificate, is the primary means for associating configuration classes with a node, and is also used to uniquely identify each node in your infrastructure.
Equally important are the SSL certificates used to encrypt communication with the Puppetmaster. During an autoscale event, such as the replacement of a failed instance, it’s not enough for the new instance to have the same hostname as before. An instance with the correct hostname but a different or missing certificate will be denied updates by the Puppetmaster, so it’s important to plan for this as well.
Finally, if your autoscaling group is actively scaling in response to load or some other metric, you’ll need to ensure that new nodes have their certificates signed by the Puppetmaster as soon as they connect, without waiting for manual intervention. If you adopt a conventional hostname scheme, your autosign.conf file can use wildcard expressions, ensuring new nodes are signed and receive their configuration on their first run.
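As a rough illustration of why conventional hostnames make autosigning straightforward, the following Python sketch approximates how a leading-wildcard entry in autosign.conf might be compared with a node's certname. The function names, hostnames, and domain are hypothetical, and the matching rule is a simplification (an exact name, or a `*.` prefix standing for one or more leading labels); Puppet's own matcher is the authority on real behavior.

```python
def matches_autosign(certname: str, pattern: str) -> bool:
    """Approximate autosign.conf matching (illustrative, not Puppet's code)."""
    certname = certname.lower()
    pattern = pattern.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # "*.web.example.com" -> ".web.example.com"
        # Require at least one label in front of the suffix.
        return certname.endswith(suffix) and len(certname) > len(suffix)
    return certname == pattern


# Hypothetical autosign.conf entries, one per server role.
AUTOSIGN_PATTERNS = ["*.web.prod.example.com", "*.worker.prod.example.com"]


def would_autosign(certname: str) -> bool:
    """Return True if any hypothetical pattern covers the certname."""
    return any(matches_autosign(certname, p) for p in AUTOSIGN_PATTERNS)


if __name__ == "__main__":
    for name in ["node-01.web.prod.example.com", "node-01.db.prod.example.com"]:
        print(name, would_autosign(name))
```

With a scheme like this, a new web node is covered by an existing pattern the moment it boots, while a hostname outside the convention still waits for manual signing.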
Puppet and Auto Scaling
Puppet manifests can remove the need for a perfectly baked AMI. Instead of maintaining an individual Golden Master for each server role, you can create one vanilla template with the minimum possible configuration. In this scenario, your instance user data or boot script needs only to do what is necessary to connect to the Puppetmaster.
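To make the idea of a minimal boot script concrete, here is a hedged Python sketch that renders one as EC2 user data. It assumes the Puppet agent is already installed on the vanilla AMI and that the image uses systemd (for `hostnamectl`); the function name, hostname, and server name are hypothetical.

```python
import shlex


def render_user_data(hostname: str, puppet_server: str) -> str:
    """Render a minimal boot script for a vanilla AMI (illustrative only)."""
    host = shlex.quote(hostname)
    server = shlex.quote(puppet_server)
    lines = [
        "#!/bin/bash",
        "set -e",
        # Give the node its conventional, role-based hostname.
        f"hostnamectl set-hostname {host}",
        # One foreground agent run; keep polling until the cert is signed.
        f"puppet agent --onetime --no-daemonize --verbose "
        f"--server {server} --certname {host} --waitforcert 30",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_user_data("node-01.web.prod.example.com", "puppet.example.com"))
```

Everything else about the instance's role is then left to the catalog the Puppetmaster compiles for that certname.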
One challenge is that it may take several minutes or more for Puppet to apply its configuration and for the instance to take on its assigned roles. This can be an issue if you are under heavy load and need to scale up your infrastructure quickly. The solution is to create snapshot AMIs of already configured instances, and update your autoscaling groups to launch from those AMIs.
This process, too, can be automated by using the autoami Puppet module. This module will watch your EBS-backed instances to detect changes made by the Puppet agent, and automatically create AMIs that can be used to launch new or replacement instances during autoscale events.
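The snapshot step itself can be sketched in Python. This is not the autoami module, whose interface is not shown here; it is a minimal illustration that only builds the parameters for an EC2 CreateImage request. The function name, naming scheme, and instance ID are assumptions.

```python
from datetime import datetime, timezone


def build_create_image_request(instance_id: str, role: str, now: datetime) -> dict:
    """Build CreateImage parameters for an already configured instance."""
    # AMI names may not contain colons, so use a compact timestamp.
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return {
        "InstanceId": instance_id,
        "Name": f"{role}-{stamp}",
        "Description": f"Puppet-configured {role} image",
        # Skipping the reboot avoids downtime, but file system
        # consistency of the resulting image is then not guaranteed.
        "NoReboot": True,
    }


if __name__ == "__main__":
    request = build_create_image_request(
        "i-0123456789abcdef0", "web", datetime.now(timezone.utc)
    )
    print(request)
    # With boto3 installed and credentials configured, the request
    # could be submitted like this:
    #   import boto3
    #   boto3.client("ec2").create_image(**request)
```

The new AMI ID would still need to be written into the launch configuration or template used by the autoscaling group, which is left out of this sketch.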
Puppet and Version Control
Version control is well-supported by the Puppet ecosystem. It is possible to use a mature Software Development Lifecycle to manage the development and maintenance of your Puppet manifests, tightly integrated with a branch-based workflow that truly realizes the ideals of “infrastructure-as-code.”
In the latest versions of Puppet, environments are simply different directories of modules on your Puppetmaster. Each environment will only see the version of a module that’s in its directory, so you can simply check out the corresponding branch of your custom module in each directory. Combined with an automatic update of the Puppetmaster’s modules in response to Git hooks, this lets engineers try out the latest version of a manifest in a testing environment first. Once a change is ready to be promoted, the branch behind the testing or staging environment is merged into the branch behind the next one, ending with your production or master branch.
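The branch-to-directory mapping can be illustrated with a short Python sketch, the kind of logic a Git hook might use to decide where to check out a branch. The base directory, branch names, and function names are hypothetical; the sanitizing step reflects Puppet's restriction of environment names to letters, digits, and underscores.

```python
import re
from pathlib import PurePosixPath


def branch_to_environment(branch: str) -> str:
    """Turn a Git branch name into a valid environment directory name."""
    return re.sub(r"[^a-z0-9_]", "_", branch.lower())


def environment_module_dir(base: str, branch: str, module: str) -> PurePosixPath:
    """Path where a module checked out from `branch` would live."""
    return PurePosixPath(base) / branch_to_environment(branch) / "modules" / module


if __name__ == "__main__":
    # The base directory is an assumption; use your Puppetmaster's setting.
    base = "/etc/puppet/environments"
    for branch in ["testing", "staging", "feature/new-vhost"]:
        print(branch, "->", environment_module_dir(base, branch, "apache"))
```

Because each environment sees only its own directory, a broken change on a feature branch cannot affect nodes assigned to the production environment.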
Puppet also helps maintain the correct configuration, even if that means reverting to an earlier version. You describe what the environment should look like (this directory has to have these permissions, these package versions, this Apache virtual host for a website, this alias, and so on), and if the actual configuration differs, Puppet changes it back to the declared state. For example, one could write a custom fact to report whether each instance Puppet knows about is running the correct version of a database or is in the correct security group. You can then programmatically discover every system that is not registered with a certain security group and register it automatically.
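The security group check described above might look like the following Python sketch, which works on fact data that has already been collected from the nodes. The fact name `security_groups`, the node names, and the group name are all hypothetical, and registering the missing instances is left out.

```python
def nodes_missing_group(node_facts: dict, required_group: str) -> list:
    """Return the names of nodes whose facts lack the required group."""
    missing = []
    for node, facts in sorted(node_facts.items()):
        # Treat a node with no security group fact as non-compliant.
        if required_group not in facts.get("security_groups", []):
            missing.append(node)
    return missing


if __name__ == "__main__":
    # Hypothetical fact data, keyed by certname.
    reported = {
        "node-01.web.prod.example.com": {"security_groups": ["web", "monitoring"]},
        "node-02.web.prod.example.com": {"security_groups": ["web"]},
        "node-01.db.prod.example.com": {},
    }
    print(nodes_missing_group(reported, "monitoring"))
```

The returned list is the set of instances that a follow-up step would need to register with the group.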
In complex environments with dozens of daily deploys, this can significantly lower the risk of downtime associated with faulty deploys. These authoritative versioning controls alone make Puppet worth the effort of mastering.
Puppet and Reporting: Future Possibilities
Puppet can pull and generate fairly powerful reports from every node it knows about. A core tool called Facter ships with built-in facts that describe hundreds of details about your system, and you can create your own custom facts to describe anything from the currently released Git revision of your application to the installed version of OpenSSL or the values of CloudFormation parameters passed to the stack that created your instance.
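Besides custom facts written in Ruby, Facter can also read external facts from executable scripts that print `key=value` lines. The Python sketch below reports the installed OpenSSL version that way. The fact name `openssl_installed_version` is hypothetical, and the script assumes the `openssl` binary is on the PATH, falling back to `unknown` when it is not.

```python
#!/usr/bin/env python3
import subprocess


def parse_openssl_version(output: str) -> str:
    """Extract '1.1.1k' from a line such as 'OpenSSL 1.1.1k  25 Mar 2021'."""
    parts = output.split()
    return parts[1] if len(parts) >= 2 else "unknown"


def format_facts(facts: dict) -> str:
    """Render facts as key=value lines, one per fact."""
    return "\n".join(f"{key}={value}" for key, value in sorted(facts.items()))


def collect_facts() -> dict:
    """Gather the facts this script reports."""
    try:
        result = subprocess.run(
            ["openssl", "version"], capture_output=True, text=True, check=True
        )
        version = parse_openssl_version(result.stdout)
    except (OSError, subprocess.SubprocessError):
        version = "unknown"
    return {"openssl_installed_version": version}


if __name__ == "__main__":
    print(format_facts(collect_facts()))
```

Placed in Facter's external facts directory and made executable, a script like this would make the value available to manifests and reports alongside the built-in facts.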
Another useful technique is to use Puppet’s dry-run (noop) mode to disable all updates in your production environment. The agent will still run and look for changes that need to be made, but instead of applying them, it will merely log what it would normally have done. This can be used to queue up changes for maintenance periods, keeping change management workflows intact.
Foreman, an open source project that uses Puppet’s smart proxy to help orchestrate and monitor instances, also provides a robust dashboard for monitoring your hosts’ reports and status. This makes it easier to review historical changes when troubleshooting and to tell at a glance from the host list whether any instances are out of sync. The rich data available here holds many possibilities; in the future, one could imagine these facts being further automated or used to trigger custom events.
Puppet isn’t the only configuration management (CM) tool available, but it is one of the most mature, with a large community of active module developers. If a team has more experience with another language or toolchain, however, something like Chef, Salt, or Ansible might be a better choice.
Configuration management is a relatively new venture for AWS, by way of its OpsWorks platform. Although OpsWorks supports Chef natively, the tool is so far relatively limited in scope. It also doesn’t yet support some of the most common Chef concepts, such as data bags, which makes the increasingly large Chef community less of an asset than it could be. Like most tools at AWS, it bears watching as it develops over the next few years, but many configuration management power users will still prefer a standalone stack.