Nuvole Computing » CloudFormation | Cloud architecture in a dev/ops world

SeeDub: Universal Custom Metrics for AWS CloudWatch (25 Mar 2012)

When AWS added an API call to push whatever metric we want, we got very excited. Once a metric is in CloudWatch, we can key a scaling policy to it, have it trigger an alarm, and drive basically any action we very well please. Finally, ultimate flexibility in which factors determine which actions, and we get pretty graphs to boot! For example, we could scale a processing group up and down like an accordion based on the number of entries in a DB table, or on the amount of free system memory. The hurdles proved disheartening, but we built a solution.

Take API throttling, for instance: the shadowy world where AWS can never give a straight answer. Actually, they can, but you have to escalate. All the API endpoints have per-customer limits. You can get them raised, but your code still has to throttle itself, or you risk crazy race conditions everywhere, especially when you’re talking about thousands of machines all pushing their metrics via, say, cron every five minutes.
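
For illustration, here’s a minimal back-off sketch in shell. This is not SeeDub itself; mon-put-data (the old CloudWatch command-line tool) merely stands in for whatever throttled call your code makes:

#!/bin/bash
# Retry with exponential backoff plus random jitter when the API pushes back.
for attempt in 1 2 3 4 5; do
    mon-put-data --namespace Nuvole --metric-name Crazy --value 15 --unit Count && break
    sleep $(( (2 ** attempt) + (RANDOM % 5) ))   # back off, with jitter
done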

The CloudWatch team, under NDA, provided us with a development library in Perl that would handle API retries, but we wanted one solution we could deploy everywhere and not worry about. So we wrote SeeDub, an intermediary that takes simple files and queues them up for batch processing by the Amazon::CloudWatchClient lib, which handles all the retries. A built-in random offset makes sure thousands of machines aren’t all firing at once. Write a file into /var/nuvole/seedub.d/NAMESPACE/unique_name_for_metric_file, and let bin/putmetricdata.pl take care of CloudWatch, e.g.

$ cat > /var/nuvole/seedub.d/Nuvole/SpecialNamespace/whatever.random1384y28237 <<EOF
name Crazy
value 15
unit Count
time 1314639962
dimensions Partition=/
EOF
$ bin/putmetricdata.pl us-east-1
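
On our machines this pairs naturally with cron. A sketch of a crontab entry (the /opt/seedub install path is an assumption, not from the post; use wherever you installed it):

# Flush queued metric files to CloudWatch every five minutes; SeeDub's
# built-in random offset spreads the actual API calls out further.
*/5 * * * * cd /opt/seedub && bin/putmetricdata.pl us-east-1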

Any app, any system tool, any piddly script that can write a few simple lines has direct, robust and resilient access to the CloudWatch API regardless of your limit.

Recently, the CloudWatch team released an updated version of the library, which seems perfectly compatible, so we’re releasing our part: https://github.com/netbas/SeeDub/

For those who want to see it in action on their own machines immediately, create a stack using the SeeDub sample CloudFormation template, which launches a fully operational stack with SeeDub pushing metrics for an autoscaling group of two t1.micros. Feed it your KeyName as a parameter, and with one button you’ll have actionable metrics and pretty graphs within fifteen minutes. Via the command line:

$ cfn-create-stack SeeDubSample2 -f seedub.iam.json --capabilities CAPABILITY_IAM --parameters "KeyName=bingo"

NOTE: If you launch this CloudFormation stack via the AWS Management Console, you must check the little box acknowledging that it’s OK to create an IAM user, which is dangerous even if we claim to follow the principle of least privilege, like so:

[Screenshot: the IAM capabilities acknowledgement checkbox]

NOTE: It also uses EeSeeToo, another of our packages, which so far exists solely to tell an instance the name of its autoscaling group (ASG).
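
For the curious, a guess at the sort of lookup EeSeeToo performs (a sketch, not its actual code), using the old Auto Scaling command-line tools:

# Ask the metadata service for the instance ID, then ask Auto Scaling
# which group this instance belongs to.
IID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
as-describe-auto-scaling-instances "$IID" --show-long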

Chef and CloudFormation (15 Jan 2012)

The ephemeral nature of cloudy virtual machines means we need to be able to go down and come up at any time without flinching. What good is hardware scriptability if the app itself doesn’t lend itself to automated management? For years we used custom sets of homebrew scripts, checked in alongside both the app code and any virtual-hardware scripting. It was cumbersome, but it worked beautifully. Chef promised to make the cumbersome elegant. Did it?

Yes, it did. The learning curve was steep at first; it seemed to take a lot of knowledge just to get set up. We started with CloudFormation (CFN), for which AWS provides sample templates of both a Chef server and a sample app acting as a Chef client.

After getting them up and running, we replaced the Chef server stack with Hosted Chef. (For PCI compliance, we later built a private Chef server.) The results were spectacular. One button launches an AWS stack in any region, as before, but now the only thing CFN has its machines do is bootstrap Chef. Bake Chef into your standard AMIs, and all that has to happen at launch is passing the Chef server location and initial credentials. Chef handles the rest! What’s more, you can configure chef-client to register a node name corresponding to IP address, stack name, region, whatever you like (see below).

In case it helps: it took about two full-time weeks to go from homebrew stack to Chef stack, with no previous knowledge of Chef. That makes the dev/ops approach much easier to sell to all concerned parties. For example, the notorious problem of application configuration has a new potential home. One client wisely wrote an app which first looked to a sharding table to figure out where its DB was. The problem: the sharding table was itself in a DB, and self-referencing. Moving such a typically static, simple thing into Chef attributes makes perfect sense, as it does for a whole range of app configuration as well as ops configuration.
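
A sketch of the idea (the node and attribute names here are invented): put the shard map in Chef attributes and inspect it per node with knife, rather than bootstrapping it out of a database the app can’t find yet.

# Hypothetical: the shard map lives as a node attribute, not in a DB.
knife node show Targeting-QA1-frontend-10-0-1-17 -a shard_map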

Note: If you ever wonder where ohai’s ec2 attributes went, you may have hit a known ohai bug within the VPC. The fix is simple: as of Ohai 6.4.0, create /etc/chef/ohai/hints/ec2.json to enable EC2 attribute collection, as done below using cfn-init.
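
The manual equivalent of that cfn-init stanza is a one-liner:

# Create the EC2 hint file so ohai collects ec2 attributes inside a VPC.
mkdir -p /etc/chef/ohai/hints && echo '{}' > /etc/chef/ohai/hints/ec2.json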

Note: These examples use the CloudFormation helper scripts, but there is nothing here you couldn’t do with simple scripting. A sample LaunchConfig follows; all that remains is to put the node config into one variable.

Update: We recently had to retrofit our old Chef stacks to work in particular subnets within a VPC, alongside AWS’s Chef Server template, which works right out of the box. Now, when we need to tweak stack JSON, we modify the LaunchConfig, cfn-update-stack, and as-terminate-instance-in-auto-scaling-group, iterating quickly and saving oodles of time. Once the Chef handoff is made, we can get on with recipe-writing and role-assigning.
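
That iteration loop looks something like this with the old command-line tools (the stack name and instance ID are hypothetical); the LaunchConfig itself follows below:

# Push the modified template, then recycle an instance so the ASG
# replaces it with one built from the new LaunchConfig.
cfn-update-stack MyChefStack -f mychefstack.json
as-terminate-instance-in-auto-scaling-group i-0123abcd --no-decrement-desired-capacity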

        "FrontEndLC" : {
            "Type" : "AWS::AutoScaling::LaunchConfiguration",
            "Metadata" : {
                "AWS::CloudFormation::Init" : {
                    "config" : {
                        "packages" : {
                            "rubygems" : {
                                "chef" : [],
                                "ruby-shadow" : [],
                                "ohai" : [],
                                "json" : []
                            },
                            "yum" : {
                                "ruby19"               : [],
                                "ruby19-devel"         : [],
                                "ruby19-irb"           : [],
                                "ruby19-libs"          : [],
                                "rubygem19-io-console" : [],
                                "rubygem19-json"       : [],
                                "rubygem19-rake"       : [],
                                "wget"                 : [],
                                "rubygem19-rdoc"       : [],
                                "rubygems19"           : [],
                                "rubygems19-devel"     : [],
                                "gcc"                  : [],
                                "gcc-c++"              : [],
                                "automake"             : [],
                                "autoconf"             : [],
                                "make"                 : [],
                                "curl"                 : [],
                                "dmidecode"            : []
                            }
                        },
                        "files" : {
                            "/etc/chef/client.rb" : {
                                "content" : { "Fn::Join" : ["", [
                                    "log_level        :info\n",
                                    "log_location     STDOUT\n",
                                    "ssl_verify_mode  :verify_none\n",
                                    "chef_server_url  '", { "Ref" : "ChefServerURL" }, "'\n",
                                    "environment      '", { "Ref" : "Environment" }, "'\n",
                                    "validation_client_name 'chef-validator'\n"
                                ]]},
                                "mode"  : "000644",
                                "owner" : "root",
                                "group" : "root"
                            },
                            "/etc/chef/roles.json" : {
                                "content" : {
                                    "run_list": [ "role[frontend]" ],
                                    "chef_role": "frontend",
                                    "stack_name": { "Ref" : "AWS::StackName" },
                                    "aws_region": { "Ref" : "AWS::Region" },
                                    "deploy_user": { "Ref" : "DeployUser" },
                                    "deploy_pass": { "Ref" : "DeployPass" },
                                    "deploy_bucket": { "Ref" : "DeployBucket" },
                                    "warning_sns_arn": { "Ref" : "WarningTopic" },
                                    "critical_sns_arn": { "Ref" : "CriticalTopic" },
                                    "iam_access_key": { "Ref" : "IAMAccessKey" },
                                    "iam_secret_key": { "Fn::GetAtt" : ["IAMAccessKey", "SecretAccessKey"] },
                                    "frontend_endpoint": { "Fn::GetAtt" : [ "FrontEndELB", "DNSName" ] },
                                    "s3_bucket": { "Ref" : "S3Bucket" }
                                },
                                "mode"  : "000644",
                                "owner" : "root",
                                "group" : "root"
                            },
                            "/etc/chef/ohai/hints/ec2.json" : {
                                "content" : "{}",
                                "mode"  : "000644",
                                "owner" : "root",
                                "group" : "root"
                            }
                        }
                    }
                }
            },
            "Properties" : {
                "KeyName" : { "Ref" : "KeyName" },
                "SecurityGroups" : [ { "Ref" : "FrontEndSG" } ],
                "InstanceType" : { "Ref" : "FrontEndInstanceType" },
                "ImageId": { "Fn::FindInMap": [ "AWSRegionArch2AMIEBS", { "Ref": "AWS::Region" }, { "Fn::FindInMap": [ "AWSInstanceType2Arch", { "Ref": "FrontEndInstanceType" }, "Arch" ] } ] },
                "UserData" : { "Fn::Base64" :
                               { "Fn::Join" : [ "", [
                                   "#!/bin/bash\n\n",

                                   "/opt/aws/bin/cfn-init -v --region ", { "Ref" : "AWS::Region" },
                                   " -s ", { "Ref" : "AWS::StackName" }, " -r FrontEndLC ",
                                   " --access-key ", { "Ref" : "DeployUser" },
                                   " --secret-key ", { "Ref" : "DeployPass" }, "\n",

                                   "LOCAL_IP=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4`\n",
                                   "IID=`curl -s http://169.254.169.254/latest/meta-data/instance-id`\n",
                                   "echo \"node_name        \\\"", { "Ref" : "AWS::StackName" }, "-frontend-$LOCAL_IP-$IID\\\"\" >> /etc/chef/client.rb\n",
                                   "/usr/local/bin/chef-client -N ", { "Ref" : "AWS::StackName" }, "-frontend-$LOCAL_IP-$IID -j /etc/chef/roles.json", "\n"
                               ]]}
                             }
            }
        },

AWS CloudFormation Case Study (25 Jul 2011)

I was asked to write up how I implemented CloudFormation as it began to roll out. It let me replace RightScale wholesale, in as flexible a manner as I cared to code in JSON. Below is a draft.

----

Our task was to roll out a set of ad products built around influence as a metric, using AWS for analytics, smart display ads, and contextual and behavioral targeting, to name a few.  Starting fresh and fast with no physical infrastructure and oodles of new data, we had to remain nimble and scale quickly.  As the team grew from one to twenty in the span of months, however, it quickly became necessary to automate and organize not only machines but process.  Each product required QA/load-test (LT), staging, and production environments.  Disaster recovery (DR), high availability (HA), and scaling requirements demanded a plan.  Enter CloudFormation.

Each product has its own set of CloudFormation stacks, configured to procure the necessary AWS services: ELBs, autoscaling groups, queues, security groups, buckets, etc.  Great; then what?  The machines need to talk to each other.  With EC2’s UserData field, machines can be passed values for any of the resources brought up in the stack, e.g. RDS endpoints/creds, private IPs, queue ARNs, etc.  Use an AMI which can execute a command issued by UserData and you’re done.  Ubuntu and Amazon Linux do this off the shelf; simply start the UserData with a shebang.  Stack machines can now come up with all the stack data and configure themselves for service via the deployment method of your choice.  Furthermore, since they can update themselves, there’s no need to reburn AMIs every time configuration changes or updates are needed.  Our machines even set the shell prompt to their role and stack name so we don’t get lost in a sea of terminals.  They can even configure their own DNS.
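
A minimal sketch of such a UserData script (the values here are invented; a real template splices them in with Ref and Fn::GetAtt, as in the LaunchConfig shown in the Chef post above):

#!/bin/bash
# Rendered by CloudFormation; DB_HOST and QUEUE_URL stand in for values
# the stack would substitute at launch time.
DB_HOST="mydb.abc123.us-east-1.rds.amazonaws.com"
QUEUE_URL="https://sqs.us-east-1.amazonaws.com/123456789012/ingest"

# Persist stack values where the app's init scripts can source them.
cat > /etc/stack-env <<EOF
export DB_HOST="$DB_HOST"
export QUEUE_URL="$QUEUE_URL"
EOF

# Set the prompt to role and stack name so we don't get lost in terminals.
echo 'PS1="[frontend:Targeting-QA1 \W]\$ "' > /etc/profile.d/stack-prompt.sh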

Once the stack for QA is written, the QA lead can launch it with one button.  Change the autoscaling groups from one instance to one hundred and use that copy of the template to launch staging for load testing.  With stack mappings, you can use the same template for both.  Add some alarms, change thresholds, and launch again for production.  Next release?  Bring up a second production stack next to the old one and cut over.  Need to revert?  Leave the old one up and switch back.  Virginia slides into the ocean?  Launch the exact same stack in Singapore.  Shut down the staging and old production stacks when you’re done to save money.  We call our release method Deployment by Death, because all we do is update the release code and kill the boxes.

Here’s how we bring up three Targeting stacks across the world in one quick line:

$ for i in us-east us-west eu-west; do cfn-create-stack Targeting-QA1 -f targeting.qa1.json --region $i-1; done
arn:aws:cloudformation:us-east-1:281541528619:stack/Targeting-QA1/28af2dsa-b4a7-110e-a938-6861c490a786
arn:aws:cloudformation:us-west-1:281541528619:stack/Targeting-QA1/234e5cd0-b4a7-110e-c8ac-2727c0db5486
arn:aws:cloudformation:eu-west-1:281541528619:stack/Targeting-QA1/154a5d00-b4a7-110e-a26e-275921498aea

They all come up configured and serving, typically within minutes.
