DevOps Muscle for the CrossFit Games


As we approach the CrossFit Games once again, I thought I’d share some detail on how Foghorn helped CrossFit update its DevOps processes for the 2016 Games. CrossFit has been growing rapidly, and had challenges in 2015 scaling its leaderboards. We worked together to prep for the 2016 Games, and had some great success. As usual, we leveraged HashiCorp tools to get the job done. If you are interested in reading more, we’ve dropped a case study on our main site.

Posted in General

Business Risk? Or Assurance?

With the announcement that Snapchat has gone ‘all in’ on Google Cloud Platform, Snap has incorporated this plan into its financial filings as an additional ‘business risk’.

“Any disruption of or interference with our use of the Google Cloud operation would negatively affect our operations and seriously harm our business…”

My immediate reaction was to question whether this is a business risk or a business assurance. Certainly Snap is now dependent on Google’s ability to scale and manage a massive infrastructure, so the disclosure is appropriate. But as a prospective investor, I’d feel that a great potential risk to the business, loss of availability of the Snapchat service, has been greatly reduced.

If there were a bookmaker taking odds on the likelihood of various companies making technical and/or operational missteps that cause an outage, Google would not be the company I’d bet on. Quite the opposite: I think they’ve proven over the last 15 years or so that they’re pretty good at running large infrastructure.

As this reality begins to sink in with investors, partners, customers, and business leaders, cloud adoption will accelerate beyond current predictions.


Posted in GCP, General, Public Cloud

Terraform beats CloudFormation to the Punch with Inspector Support


Cloud Neutral DevOps

HashiCorp makes some of our favorite DevOps tools. Along with being feature rich, stable, and well designed, they are cloud neutral. This allows DevOps teams to become experts with a single tool without getting locked into a single cloud vendor. Some cloud-neutral tools try to completely abstract the cloud provider and the services available, forcing the user to use only the ‘lowest common denominator’ of services available from all supported providers. With Terraform, HashiCorp has not fallen into this trap. They embrace the rich set of services available from each provider, with different services supported for different clouds. This allows us to put the right workload in the right cloud without the need to leverage multiple tools or build multiple deployment pipelines.

Terraform Supports AWS Inspector

You might expect, however, that Terraform would trail the cloud providers’ proprietary tools in supporting new products and features. But HashiCorp is amazingly quick to add support. A great example is v0.8.5, which now supports AWS’ Inspector service. As of the publishing of this post, AWS’ own CloudFormation tool still does not support Inspector. Pretty amazing for a small company offering an open source product!
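To illustrate, here is a minimal sketch of what that support looks like in Terraform. The resource types are from the 0.8.5 release; the rules package ARN below is a placeholder, since the real ARNs vary by region:

resource "aws_inspector_resource_group" "demo" {
  # Inspector assesses EC2 instances carrying these tags
  tags {
    Env = "dev"
  }
}

resource "aws_inspector_assessment_target" "demo" {
  name               = "demo-target"
  resource_group_arn = "${aws_inspector_resource_group.demo.arn}"
}

resource "aws_inspector_assessment_template" "demo" {
  name       = "demo-template"
  target_arn = "${aws_inspector_assessment_target.demo.arn}"
  duration   = 3600 # seconds

  # Placeholder ARN -- substitute the rules packages for your region
  rules_package_arns = ["arn:aws:inspector:us-east-1:123456789012:rulespackage/0-EXAMPLE"]
}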

Posted in Amazon Web Services, AWS, Cloud, Public Cloud

Who’s Managing your Cloud?

After designing, building, and managing hundreds of environments, we sometimes get a little too deep in the weeds with our blogs. So I thought I’d share this article, which gives a high-level perspective on how companies benefit from working with a cloud managed services provider.

The article covers performance, scalability, security and compliance.  I’d add some additional benefits, like:

Cost Optimization: Your provider knows where additional cloud spend will help, versus where money is flushed for little benefit.

Agility: Sure, the cloud enables agility, but it doesn’t guarantee it. Your provider should be full of DevOps ninjas who can put in place the pieces that don’t come ‘out of the box’ with IaaS.

Manageability: It’s so easy to string together IaaS components that give you the functionality you need. It’s also easy to do so in a manner that creates a management nightmare, especially if you’ve never done it before. Your provider should lead you down the path to an infrastructure that can scale easily without additional management overhead.

And the winner is…

There are lots of great choices out there, although I’m pretty biased toward FogOps, where our motto is “Live by Code”. The meaning? Everything we do to manage your site is done with code, leaving you with a self-healing, auto-scaling environment that leverages continuous deployment to make your life easy.

Oh yeah, and it works on AWS, Azure, and Google Cloud.

Posted in AWS, Azure, Cloud, GCP, Public Cloud

Disney goes Hybrid; Shares Challenges


Ian Murphy recently wrote a great article on Disney’s journey to the Hybrid Cloud. The lightning talk, given by Blake White, highlighted the issues that many enterprise companies face when adopting some of the latest technologies, like Kubernetes and AWS, and integrating them with their existing on-prem infrastructure. Although these technologies are well suited for integration, the heavy lifting often falls to the enterprise. Many open source projects are very robust, but their focus is not on enabling integration with existing infrastructure.

A perfect example from the talk: in order to get the integration Disney required, they had to build their own bespoke Kubernetes cluster provisioning tool.

Despite these challenges, Disney is forging ahead – a good sign that the value they are receiving makes overcoming the challenges a worthy endeavor. Lesson to learn?  Things worth doing are hard. Don’t let that stop you!


Posted in General

Crunching HIPAA data just got cheaper

With the recent AWS announcement, AWS customers can now leverage spot instances to crunch their HIPAA big data workloads. This can decrease the compute costs of these jobs by up to 90%, making EC2 a cost-effective option for crunching large amounts of data that include Protected Health Information (PHI).

Amazon’s Business Associate Agreement (BAA) with its HIPAA-compliant customers requires that all EC2 instances processing PHI run in dedicated tenancy mode. Until now, spot instances were not available with dedicated tenancy, leaving this cost-effective option off the table for processing PHI.

Spot instance pricing is Amazon’s method of selling excess capacity that can be preempted if needed. Spot pricing is market based, and often falls well below even the steepest discounts afforded by long-term commitments. Since the nodes can be preempted, spot instances are not suitable for many types of workloads, but most cluster compute technology is designed to tolerate node losses, making spot instances a great way to save money on short-lived tasks that require high compute power.
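For the Terraform-inclined, here is a sketch of requesting a dedicated-tenancy spot instance. The AMI, bid price, and tags are hypothetical, and this assumes your Terraform version passes tenancy through to the spot launch specification:

resource "aws_spot_instance_request" "phi_worker" {
  ami           = "ami-0123456789abcdef0" # hypothetical worker AMI
  instance_type = "c4.8xlarge"
  spot_price    = "0.60"      # hypothetical maximum bid
  tenancy       = "dedicated" # required by the AWS BAA for PHI workloads

  tags {
    Name = "phi-spot-worker"
  }
}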

I took a quick peek in the AWS interface, and didn’t see any option to leverage dedicated spot instances in AWS’ managed Hadoop framework, EMR.  Hopefully we will see that soon!

Posted in Amazon Web Services, AWS, Cloud, Public Cloud

Pay AWS Less for your Dev and Test Workloads

24×7 environments are handy, but are they required for Dev and Test?

I’m going to assume your development team is not leveraging development environments 24 hours a day, 7 days a week. That is to say, I’m assuming you don’t have 3+ teams on shift throughout the world. I’m also going to assume you aren’t building and destroying development environments as part of your continuous deployment pipeline (more on that later). Lastly, I’m going to assume that development does not need to mirror production (that’s what Test is for). Fortunately, AWS provides numerous means by which you can tune your setup with cost savings in mind, focused on some dev and test behavior that may have gone overlooked. So with all that in mind, let’s cut costs.

Scheduled development servers

First off, let’s simply make our development server layer mirror our actual development schedule(s).  I am talking about the stateless tier, not the database.  Let’s create time-based scaling triggers (natively supported in Auto Scaling Groups, Elastic Beanstalk, OpsWorks, or just use Lambda!).  Scale-In to 0 instances in service 1 hour after development teams stop working.  Scale-Out to 1 instance in service 1 hour before development teams start working.  Even a hardcore development team working from 6am to midnight six days a week still leaves money on the table.  Let’s see how much exactly, assuming 5 development teams, each using a single c4.large development environment:

[Chart: scheduled dev environment cost savings]
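To make this concrete, here’s a minimal Terraform sketch of the time-based triggers described above. It assumes an existing Auto Scaling Group named dev-asg; the name and cron schedule are illustrative:

resource "aws_autoscaling_schedule" "dev_scale_out" {
  scheduled_action_name  = "dev-scale-out"
  autoscaling_group_name = "dev-asg"
  recurrence             = "0 5 * * 1-6" # cron (UTC): an hour before the 6am start, Mon-Sat
  min_size               = 1
  max_size               = 1
  desired_capacity       = 1
}

resource "aws_autoscaling_schedule" "dev_scale_in" {
  scheduled_action_name  = "dev-scale-in"
  autoscaling_group_name = "dev-asg"
  recurrence             = "0 1 * * *" # cron (UTC): an hour after the midnight stop
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
}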

Parking development servers

Let’s say we have already implemented scheduled development servers as outlined above, or maybe we haven’t yet but we are using Auto Scaling Groups. In either case, we can simply park an unused development environment if we know no one is actively contributing code changes. Let’s take the following scenario. We have a major push happening on 2 of our products. Development on the other 3 has been halted so resources can be borrowed. We will be suspending any work for two consecutive sprints (let’s say we use two-week sprints). That’s 3 development environments with no changes for a month. We can simply set our desired capacity in the Auto Scaling Group to 0, or, if we set up time-based scheduling, simply remove the Scale-Out rule.
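If the environment is defined in Terraform, parking it can be a one-line change. A sketch, assuming a hypothetical dev ASG and an existing launch configuration named dev-launch-config:

resource "aws_autoscaling_group" "dev" {
  name                 = "dev-asg"
  launch_configuration = "dev-launch-config" # hypothetical
  availability_zones   = ["us-east-1a"]
  min_size             = 0
  max_size             = 1
  desired_capacity     = 0 # parked; set back to 1 when work resumes
}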

[Chart: parked dev environment cost savings]

Resource utilization, or lack thereof

It’s easy to get complacent with an instance type. The development team has grown to love the fast provisioning and rock-solid compute and network reliability of the c4. Furthermore, since you use c4.2xlarge instances in production, the c4.large is a logical development downgrade. But what if your development environment didn’t really require the processor performance and network stability? What if you simply needed any 2-core, ~4GB memory server? A t2.medium might well do the job. It lacks sustained compute capability, but that is rarely needed in dev. Assuming 5 development environments, you can save even more:

[Chart: c4.large vs t2.medium cost comparison]

Resource configuration, being thoughtful with your decision

Let’s say your production database has strict IOPS requirements. This database was created before AWS released General Purpose SSD (gp2) EBS storage. As a result, you configured Provisioned IOPS to meet your requirements. Since test needed to match production, the same Provisioned IOPS were brought over to the test database. Furthermore, in an effort to mirror production, test was also made a Multi-AZ RDS deployment. These decisions were innocent at the time, but have a definite impact on hourly costs. Since our databases are running 24×7, there is an immediate opportunity to provide the same performance level at reduced availability and cost. Here is how that would play out, assuming we reduced our test DB to a single availability zone with gp2-based storage:

[Chart: test RDS cost comparison]
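For reference, a sketch of what the slimmed-down test database might look like in Terraform; the identifier, sizes, and credentials are hypothetical:

resource "aws_db_instance" "test" {
  identifier        = "test-db" # hypothetical
  engine            = "mysql"
  instance_class    = "db.r3.xlarge"
  allocated_storage = 500
  storage_type      = "gp2"   # general purpose SSD instead of Provisioned IOPS
  multi_az          = false   # single AZ is fine for test
  username          = "admin"
  password          = "${var.test_db_password}" # assumes a password variable
}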


Test should mirror production, sometimes

It is critical to test against a duplicate of production before deploying to production, but let’s first assume you aren’t doing blue/green deployments (more on that later). Test only needs to mirror production during testing. And just like scheduled development servers, test environments should only run during automated and QA test schedules. Going one step further, test need only mirror production during performance tests (or similar tests where resources impact meaningful results). Non-performance testing can occur on a production-like environment, similar in layers but resourced like development and run on a schedule matching QA and automation testing. Then, when true performance testing occurs, the test environment can be modified to production-like resources.

You have a test environment in one of two states. State 1 is the QA and automation validation and regression testing setup. State 2 is the production-mirror performance testing setup. Let’s compare 24×7 vs. scheduled, and non-performance testing vs. performance testing. Production uses Elastic Load Balancing, 15 c4.2xlarge application servers, a 3-node m4.xlarge Redis cluster, and an r3.xlarge MySQL database. For ease of scheduled states, the database will be production specification running 24×7, but we will not use the AWS RDS Multi-AZ feature like we do in production. This environment is about $6,000/month for a truly cloned production setup. But we don’t need that level of scale unless we are performance testing. By scaling in and applying our 8am-6pm scheduling, we can reduce the costs dramatically. When it’s time to do performance testing, we scale up only for the tests, let’s say twice a week for 4 hours. Sparing you the math, we end up at around $1,040/month. As you can see, combining all of these techniques can save a fortune.

Do I even need a dedicated test environment?

Back to blue/green deployments. This is not a blog post about the what, why, or how of blue/green deployments. But let’s say your environment supports that kind of deployment. Why run a test environment that mirrors production 24 hours a day (or even on a schedule) when you can simply build the environment, run through the testing, perform the Auto Scaling Group swap (for example), wait a reasonable amount of time to support rollback, and finally terminate the previous production environment? In this case, a test environment that was running 24×7, or even 96 hours a week, can be reduced to the time it takes to build, test, and support rollback. If this is automated (more on that later), your scheduled test environment running 60 hours a week could potentially be reduced to a production clone running more like 8 hours a week. In this world, let’s assume our workflow is very simple, measured in days. Day 1, we build and test. Day 2, we cut over blue/green and leave both running. Day 3, we leave both running for one more day of rollback. Our 60-hour scheduled test environment is now 24 hours per week, with the added benefit that we are testing against production specifications while at the same time saving money.

Why automate the build as part of testing?

Whether you are doing blue/green deployments or not, there is justification for building the test environment from nothing each time. While the only direct cost savings is reduced run time (you build, test, and destroy, or build, test, and cut over), the benefits go way beyond that. This workflow validates far more than an updated application deployment. You are also testing your configuration management, your infrastructure code, and potentially the same workflow you would use for disaster recovery. Building a server using configuration management to create a test environment and never doing so again stops short of the power of configuration management. This “permanent” environment introduces false assurance that you can recreate the setup at any time. Since application code is being pushed to servers that already have application code on them, unknown dependencies get introduced into the application. That dependency structure doesn’t stop at the application, either: by not building your servers from nothing, even the configuration management code may be presuming a given state. Automating test environments from scratch leads to automating build/deploy, which leads to well-exercised infrastructure code.

Posted in Amazon Web Services, AWS, Cloud, General, Public Cloud

VNet-in-a-box. Get your Azure workloads moving!

Azure for the masses

Microsoft has come a long way in the past two years with their cloud offerings, and Azure is now a legit IaaS option that many of our customers are interested in. But the same blockers that apply to other cloud providers still apply.

Stuck in Neutral?

Many of our customers have come to us with a problem of inertia. Everyone in the company wants to get to the cloud, but the hurdles to get that first workload up and running are just too big. It’s easier to just keep adding to the stuff in the datacenter. Each of these decisions makes sense on its own, but imagine if you had taken the plunge a year ago. You wouldn’t be dealing with a procurement nightmare right before the holidays to get that new ‘urgent’ project the resources it needs. The time is never right; you have to jump in at some point.

Are these your blockers?

Foghorn has helped lots of companies get past these hurdles. The biggest ones we see are security and network integration. Companies unfamiliar with the Azure VNet features feel they need to make sure the VNet is configured both to protect their cloud workloads from the internet and to protect their corporate networks from cloud workloads. At the same time, they need to understand how best to integrate an Azure VNet with their corporate network.

You are closer than you think!

Foghorn has developed a process, a set of best practices, and a set of codified templates that allow us to help companies get over these hurdles in days instead of weeks or months. We deliver it as a handy piece of code, enabling companies to extend their private network into a secure cloud environment and instantly benefit from new infrastructure available on demand. We call the offering VNet-in-a-box. In a few days you can be spinning up servers, connecting to them from your corporate network, and configuring them for that urgent business need. We have an easy-to-swallow fixed price for the entire engagement, and if you qualify, Microsoft might even pick up some… or all… of the tab. Step 1? Call Foghorn, or check out a few more details here.

Posted in General

TAC v TAM

We get questions regularly about the difference between the industry standard Infrastructure as a Service (IaaS) provider Technical Account Manager (TAM) and the Foghorn-specific FogOps Technical Account Consultant (TAC). The truth is that even though these acronyms are only one letter apart, they couldn’t be further apart in what they deliver and the benefits they provide. In fact, most organizations leveraging an application stack in the public cloud could likely benefit from both TAM and TAC services. This article explains the differences between the two offerings and how to ensure that you have a support and engineering model that ensures success for your application’s full stack in the cloud.


TAM Backstory 

The traditional TAM offering was born from the need for additional, white-glove, manufacturer/provider support that allowed enterprise customers to get an increased level of assistance for the products and services they used in their information technology environment. Familiar examples include operating system providers like Microsoft and Red Hat, hardware companies like NetApp, and cloud service providers like Amazon Web Services (AWS) and Google. In the world of cloud, escalating vendor support tickets, providing additional non-public insight into bug fixes, escalating root cause analysis (RCA), and providing product and service roadmap insights are common industry cloud IaaS provider TAM services.

Since the term TAM has been adopted by lots of technology companies for many varying services, and since Foghorn is a cloud company, I’ll focus my comparison on IaaS provider TAM services.

The Challenge

Like the support services they oversee, the cloud IaaS provider TAM is able to provide guidance, advice, and information. But when it comes to leveraging those things for hands-on engineering, configuration, upgrades, etc., the TAM is usually not contractually able to assist.

Cloud vendor professional services are usually available to provide hands-on expertise and attempt to pick up where TAMs leave off. However, the many layers (and accompanying vendors) in most application stacks, along with the desire of more enterprises to leverage multiple cloud providers, results in the need for multiple TAMs and multiple vendor-based professional services groups. This is far from ideal given the expense and the finger pointing that commonly occurs among providers. Best case, your project’s velocity suffers while the various groups figure out how to work together toward a common objective. Worst case, your site’s environment suffers from availability, security, and/or performance issues caused by gaps among vendors.

Enter FogOps TAC


A FogOps TAC picks up where the cloud TAM leaves off, providing a named Consultant who bridges the gap between advice and execution in a multi-cloud, full stack environment.

Similar to a TAM, but with multi-cloud capabilities, a TAC can join customer work sessions, presentations, and meetings, and ensure that financial commitments with pay-as-you-go services are continually optimized. Additionally, the TAC can actually implement best practice advice and solve engineering challenges.

Best of all, there is no finger pointing or lost time due to gaps among providers, since there is one named resource looking out for issues site wide and stack deep.

The most visible and immediate benefits are increased velocity and interoperability, cloud wide and stack deep. But easily taken for granted is the value realized from the TAC’s leadership and technical project management capabilities: none of these projects matter if business value is not realized, and the TAC ensures that happens.

A few brief examples to help illustrate:

  • Resource Emergency! – Let’s consider the example of a cloud provider who has some sort of cloud resource limit that cannot be exceeded by their customer.  Theoretically, a cloud provider TAM may work to escalate an exception to this limit, but in some cases an exception may not be possible.  The TAM will explain the situation and possibly advise the customer on a strategy to circumvent the limit through best-practice, cloud-provider recommended engineering and architecture.  The TAM’s role in this situation likely ends here and that’s where the FogOps TAC continues.  After confirming the strategy won’t have adverse impact, the FogOps TAC executes the required engineering, likely involving changes to automation code.   No finger pointing, no delays, with the breadth and depth of the site taken into account.
  • Scaling is Broken! – Now consider an auto-scaling environment built according to IaaS provider best practice.  All works well except for the larger than anticipated IaaS usage charges.  Streamlining the entire system from web server configuration, to auto scaling rules, to server bootstrap process would be a typical FogOps TAC activity.  The result?  Seamless scaling and more connections per server, increased site performance and reliability while simultaneously reducing IaaS cost and usage.

Engineering and Support Models

A common and successful enterprise cloud support model includes IaaS provider enterprise support along with a FogOps TAC for hands-on, full stack architecture, engineering and execution.

But what happens when one named resource is not enough and your desired velocity exceeds your ability to execute?

That’ll be the topic of a future  blog post.  For now, I’m off to enjoy the next episode of Westworld!

Posted in Cloud, General, Public Cloud

Infrastructure as Code in Google Cloud

Why Foghorn Codes Infrastructure

At Foghorn, we manage lots of customer infrastructure via our FogOps offerings. Code is a great way to help make sure we deliver consistent infrastructure that is repeatable and reusable. We can version our infrastructure. We can spin up new environments (staging, QA, dev) in minutes with confidence that they are exact duplicates. We can even ‘roll back’ to a degree, or enable blue/green deploys.

Our Favorite Tool

Each cloud provider has their own tool(s) for defining, provisioning, and managing infrastructure as code. For us, since we work in so many different environments, we’ve chosen to standardize on Terraform by HashiCorp. Terraform has great providers for all of the major clouds, and has some really cool features that you won’t see across the board from the cloud-specific options. Although we are competent in all of the tools, Terraform gives us the unique opportunity to build multi-cloud infrastructure with a single deployment, as sketched below.
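As a quick illustration, a single Terraform configuration can declare resources in more than one cloud at once. A minimal sketch, with placeholder project, region, and bucket names:

provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-project-id" # placeholder
  region  = "us-east1"
}

# One deployment, two clouds
resource "aws_s3_bucket" "assets" {
  bucket = "example-static-assets" # placeholder
}

resource "google_compute_network" "backend" {
  name                    = "backend-net"
  auto_create_subnetworks = true
}

A single terraform apply converges the resources in both providers in one run.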

Example – Hello Google

Google Cloud Platform has come a long way in the last couple of years. My colleague Ryan Fackett recently put together a modified “hello world” example for Google Cloud, so I’ll use that to get us from the 10,000 foot view straight down to taking a peek at some code.  In order to spin up a workload in Google Cloud, we first need a network and a few other infrastructure dependencies. These include:

  • Network
  • Subnetwork
  • Firewall Ruleset

With these basics in place, we can spin up a server, configure it as a web server, and write our “hello Google” app.  To get traffic to it, we’ll write an IP forwarding rule, and note the IP address.  If all goes well, the code we write will create the resources, configure the server, and we’ll be able to hit the IP address with a web browser and see our site.

A Look at the Code

Let’s take a look at the code that creates the network.  We need a network and at least one subnet.  Since a subnet lives in a single region, we’ll create a few subnets in different regions to allow us to spin up a server in various parts of the world.  The code has been snipped for brevity, but it should give you a good idea of the code you may write to form your infrastructure.

The Network

resource "google_compute_network" "vpmc" {
name                    = "vpmc"
description             = "VPMC Google Cloud"
auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "vpmc-us-east1" {
name          = "vpmc-us-east1"
ip_cidr_range = "${var.cidr_block["net1"]}"
network       = "${google_compute_network.vpmc.self_link}"
region        = "${var.subnetworks["net1"]}"
}

resource "google_compute_subnetwork" "vpmc-us-central1" {
name          = "vpmc-us-central1"
ip_cidr_range = "${var.cidr_block["net2"]}"
network       = "${google_compute_network.vpmc.self_link}"
region        = "${var.subnetworks["net2"]}"
}

You might notice some variables in here instead of hard-coded values: ${var.foo["bar"]}. We are using variables for the subnet CIDR blocks as well as the regions. This allows us to leverage the same code across multiple workloads, setting the variables accordingly.
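For completeness, here’s a sketch of what those variable definitions might look like. The net1 values match the plan output shown later in this post; the net2 CIDR is illustrative:

variable "cidr_block" {
  type = "map"
  default = {
    net1 = "10.1.0.0/18"
    net2 = "10.1.64.0/18" # illustrative
  }
}

variable "subnetworks" {
  type = "map"
  default = {
    net1 = "us-east1"
    net2 = "us-central1"
  }
}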

You will also notice we use a reference, ${foo.bar}, to associate the subnets with the network. This is required because the network does not yet exist when we write the code; the reference also lets Terraform order the creation of dependent resources.

Firewall Access

Next, we need a firewall policy to allow incoming connections to a server:

resource "google_compute_firewall" "http" {
name = "vpmc-http"
network = "${google_compute_network.vpmc.name}"

allow {
protocol = "tcp"
ports = ["80"]
}

source_ranges = ["0.0.0.0/0"]
target_tags = ["demo"]
}

By setting the target_tags, any instance with that tag will inherit the firewall policy.

Forwarding and Load Balancing

Next we need a front door with a public IP so we can hit our web site. Most web sites will be load balanced, so this code puts us in a position to run HA by spinning up multiple servers in multiple regions. I won’t go through it in detail, but it’s here for your reference:


resource "google_compute_http_health_check" "vpmc-healthcheck" {
name = "vpmc-healthcheck"
request_path = "/"
check_interval_sec = 30
healthy_threshold = 2
unhealthy_threshold = 6
timeout_sec = 10
}

resource "google_compute_target_pool" "vpmc-pool-demo" {
name = "vpmc-pool-demo"
health_checks = ["${google_compute_http_health_check.vpmc-healthcheck.name}"]
}

resource "google_compute_forwarding_rule" "vpmc-http-lb" {
name = "vpmc-http-lb"
target = "${google_compute_target_pool.vpmc-pool-demo.self_link}"
port_range = "80"
}

resource "google_compute_instance_group_manager" "vpmc-instance-manager-demo" {
name = "vpmc-instance-manager-demo"
description = "Hello Google Group"
base_instance_name = "vpmc-demo-instance"
instance_template = "${google_compute_instance_template.vpmc-template-demo.self_link}"
base_instance_name = "vpmc-instance-manager-demo"
zone = "${var.subnetworks["net1"]}-d"
target_pools = ["${google_compute_target_pool.vpmc-pool-demo.self_link}"]
target_size = 1

named_port {
name = "http"
port = 80
}
}

You’ll notice there is no IP address in here. That’s because Google hasn’t assigned it yet. We’ll need to query it later to know where our site lives.
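One convenient way to grab it, rather than hunting through the console, is a Terraform output; the forwarding rule exports an ip_address attribute once it’s created:

output "site_ip" {
  value = "${google_compute_forwarding_rule.vpmc-http-lb.ip_address}"
}

After an apply, terraform output site_ip prints the address to paste into a browser.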

Our Web Server

Finally, we spin up a server and some associated resources. You’ll see that we boot from a stock Ubuntu image and configure the instance with a bootstrap script:

resource "google_compute_instance_template" "vpmc-template-demo" {
name_prefix = "vpmc-template-demo-"
description = "hello google template"
instance_description = "hello google"
machine_type = "n1-standard-1"
can_ip_forward = false
tags = ["demo"]
disk { source_image = "ubuntu-1404-trusty-v20160406" auto_delete = true boot = true } network_interface { subnetwork = "${google_compute_subnetwork.vpmc-us-east1.name}" access_config { // Ephemeral IP } } metadata { name = "demo" startup-script = <<SCRIPT #! /bin/bash sudo apt-get update sudo apt-get install -y apache2 echo '<!doctype html>
<h1>Hello Google!</h1>
' | sudo tee /var/www/html/index.html SCRIPT }   scheduling { automatic_restart = true on_host_maintenance = "MIGRATE" preemptible = false } service_account { scopes = ["userinfo-email", "compute-ro", "storage-ro"] } lifecycle { create_before_destroy = true } }

By adding the “demo” tag to the instance, we automatically associate it with the firewall rule we created earlier.

Google Authentication

In order to actually spin up an environment, we’ll need a Google account, and we’ll need to give Terraform access to credentials. A simple test can be done with this code:


provider "google" {
credentials = "${file("/path/to/credentials.json")}"
project = "test-project-1303"
region = "us-east1"
}

Terraform Plan

Running terraform plan preps us for running an apply. Terraform looks at our code and compares the requested resources to the existing state file. New resources will be created. Absent resources will be destroyed. Plan tells us exactly which changes will be made when the next apply is executed, and that preview is what makes plan so valuable. Let’s take a look at a portion of the response:

+ google_compute_subnetwork.vpmc-us-east1
    gateway_address: ""
    ip_cidr_range:   "10.1.0.0/18"
    name:            "vpmc-us-east1"
    network:         "${google_compute_network.vpmc.self_link}"
    region:          "us-east1"
    self_link:       ""
 
+ google_compute_target_pool.vpmc-pool-demo
    health_checks.#: "1"
    health_checks.0: "vpmc-healthcheck"
    instances.#:     ""
    name:            "vpmc-pool-demo"
    project:         ""
    region:          ""
    self_link:       ""
 
Plan: 16 to add, 0 to change, 0 to destroy.


Terraform Apply

OK, plan told us all the changes Terraform will make when we run an apply. We approve of the list, so we run apply to make the changes. The response:

google_compute_address.vpmc-ip-demo: Creating...
  address:   "" => ""
  name:      "" => "vpmc-ip-demo"
  self_link: "" => ""
google_compute_network.vpmc: Creating...
 
...
 
google_compute_instance_group_manager.vpmc-instance-manager-demo: Still creating... (10s elapsed)
google_compute_instance_group_manager.vpmc-instance-manager-demo: Creation complete
 
Apply complete! Resources: 16 added, 0 changed, 0 destroyed.
 
The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.
 
State path: terraform.tfstate

We can now go to our Google Console, find the IP address for the forwarding rule that we created, and hit it in a web browser:

[Screenshot: “Hello Google!” page in the browser]

Automatic CMDB

ITIL processes recommend tracking all of our IT configuration items in a Configuration Management Database (CMDB). The usual method to ensure this happens is a change control process. All changes go through change control, which includes updating the CMDB. As time goes by, human error tends to create drift between what is in the CMDB and what actually exists. This can cause difficulties in troubleshooting, and can create time bombs like hardware that falls out of support, or production changes that have not been replicated to the DR environment.

Consider the .tfstate file that Terraform creates or updates after running an apply. It includes all of the details of the infrastructure “as built” by Terraform. This is, in effect, your CMDB. If Terraform is the only deployment tool used (this can be enforced with cloud API permissions), the accuracy of your CMDB is effectively 100%. Add the fact that this code can, sometimes in minutes, spin up a complete DR site from scratch (less your stateful data), and you can see the benefits. You also protect against needing ‘tribal knowledge’ to maintain your site.

In a Nutshell

The whole reason we didn’t treat infrastructure as code for many years is that we simply couldn’t. Cloud infrastructure APIs have completely abstracted the operator from the hardware. Although this creates constraints for the operator, it also creates opportunities. Infrastructure as Code is one of the major potential benefits of cloud infrastructure. If you aren’t doing it, you should be.

Foghorn Consulting offers FogOps, an alternative to managed services for cloud infrastructure. We build and manage our customer environments strictly with code, and our customers see the benefits in the form of lower ongoing management costs, higher availability, and more confidence in the infrastructure that powers their mission critical workloads.

Next Up – Pipelining your Infrastructure

There are a ton of additional benefits to treating infrastructure as code. In addition to having self-documented, versioned, and reusable infrastructure modules, we can extend our toolset with CI/CD tools to build a full infrastructure test and deployment pipeline. I’ll give an example in a future post. Stay tuned!

Posted in Cloud, General, Public Cloud