Updates to VPC-in-a-Box

In case you aren’t familiar with the offering, VPC-in-a-Box is Foghorn’s best-practice VPC design for Amazon Web Services, customized for client workloads and delivered as re-usable, versionable code.  We’ve been delivering and iterating on this service since 2014.  We first added a reverse proxy option built on an autoscaling Squid proxy.  Later we added S3 endpoints, cross-region VPN connectivity, and NAT Gateways to replace our custom HA NAT server configuration.  For the first several years the offering was available exclusively via CloudFormation templates.

This year we’ve extended the offering to Terraform, and included the VPC-in-a-Box code as a free module for FogOps customers.  In addition, we’ve added tons of configurability around supernet and subnet sizing, all available via parameters.  No custom coding is needed, which means that clients who elect to standardize on our VPC module can rely on Foghorn to maintain the code and benefit from future enhancements to the module.

The goal of this offering is to eliminate the ongoing management of one more piece of infrastructure that is not a business differentiator for our clients.  In order to make sure the offering lives up to this goal, Foghorn is continually tweaking the configuration, ensuring two main things:

  1. The design represents best practice for the current features available at Amazon Web Services.  As new features are introduced, we constantly evaluate the features, determine whether we should change our recommendations on best practice, and modify if necessary.  Since our modules are versioned, customers can upgrade at their leisure.
  2. The code is compliant with the most recent version and features of the Infrastructure as Code tool of choice.  As CloudFormation features are released, we update our code. Likewise with Terraform, we ensure that our modules are fully tested with every new release.

My favorite new feature is Terraform Workspaces.  Stay tuned for a post in the near future, where I’ll walk through how we are using workspaces to help DevOps and SRE teams leverage a single code base to manage multiple environments, ensuring that staging looks like production, and DR looks like both of them!

Ryan’s favorite feature is the ability to simply set the desired count of NAT gateways.  If you select only 2 NAT gateways but launch private subnets into 3 availability zones, every private subnet still gets a route to a NAT gateway, preferring the gateway in its own AZ when one is available.
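
As a rough sketch of how that surfaces to module consumers (the module source path and variable names below are illustrative, not the module’s actual interface), opting into this behavior can be as simple as:

module "vpc" {
  # Hypothetical local path and variable names, shown only to illustrate the idea.
  source = "./modules/vpc-in-a-box"

  vpc_cidr           = "10.20.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]

  # Two NAT gateways for three AZs: private subnets in the third AZ are routed
  # to one of the existing gateways rather than getting their own.
  nat_gateway_count = 2
}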

Learn more about VPC-in-a-Box here.


Posted in General

Elegant CI/CD with ECS

As the industry moves toward containers as a deployment artifact, we need to modify our deployment pipelines. One of our clients, Blast Motion, is a great example of how this can be accomplished. Prior to containers, Blast leveraged AWS Elastic Beanstalk and integrated it with their TeamCity server for a simple but effective CI/CD pipeline and DevOps-friendly platform. To minimize the amount of Elastic Beanstalk customization, Blast is migrating from native Beanstalk .NET environments to .NET Core container-based micro-services. As they continued to scale, a more elegant container-based operations model was needed. Their existing CI/CD server made branch-based deployments difficult across environments, which slowed development velocity and increased deployment complexity. To meet the new requirements, Foghorn helped design and build a new pipeline leveraging GitHub, Travis CI, Docker, and AWS’ ECR and ECS. The AWS infrastructure was all deployed via CloudFormation to quickly carve out new environments.
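
For illustration only (Blast’s environments were built with CloudFormation, and the names below are placeholders), the core ECR and ECS pieces of a pipeline like this map to just a handful of resources: CI builds the image, pushes it to ECR, and registers a new task definition revision for the service to roll out.

resource "aws_ecr_repository" "app" {
  name = "blast-app" # hypothetical repository name
}

resource "aws_ecs_cluster" "main" {
  name = "app-cluster"
}

resource "aws_ecs_task_definition" "app" {
  family                = "app"
  # The image tag inside this JSON is what CI updates on each build.
  container_definitions = "${file("task-definitions/app.json")}"
}

resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = "${aws_ecs_cluster.main.id}"
  task_definition = "${aws_ecs_task_definition.app.arn}"
  desired_count   = 2
}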

Because everything runs on managed services, there are no servers to manage, either to support the CI/CD pipeline or to run the application: a simple, low-cost solution that supports Blast’s need for high availability and agility.

Posted in General

App Design drives easy CI / CD options

At Foghorn, we’ve helped design and develop highly complex DevOps solutions for customers. But sometimes, a simple workflow works very well.

This was the approach we took with ClearFactr.  ClearFactr is a cloud-native alternative to Excel, specifically designed for time-series data analysis.  Dean Zarras, founder of ClearFactr, architected the application with cloud best practices in mind, which often makes it possible to leverage an off-the-shelf DevOps tool like Elastic Beanstalk.  Integrating Elastic Beanstalk and GitHub with Jenkins is relatively straightforward, with many examples on the web.

The result? All the benefits of a CI/CD environment, with almost no infrastructure and very little integration code to manage.  This is one of the benefits of carefully architecting a cloud-native application.

Posted in General

Terraform with Azure? Sure!

While it is possible to use a proprietary Azure Resource Manager template in JSON format to define your Azure infrastructure, or an AWS CloudFormation JSON template to define AWS infrastructure, using a tool like Terraform allows your company to standardize on one coding language for all of your cloud infrastructure.  Terraform supports many providers, including AWS, Google Cloud, and Microsoft Azure.  You can define, document, and roll out infrastructure to multiple providers with one set of configuration files.  This post will focus on how you can quickly define your Azure infrastructure with Terraform, using sample code for a typical deployment.

Terraform resource names are unique across all providers, so each defined resource knows which cloud is the intended destination.  For example, if you already have Terraform code for your AWS infrastructure, you could add cloud redundancy or disaster recovery by adding an Azure provider with some Azure resources.  You do not have to separate your resources for different providers, since all resource names are unique.
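
As a quick sketch of what that looks like in practice (resource names and CIDR ranges below are just placeholders), an AWS VPC and an Azure virtual network for DR can live side by side in the same configuration:

provider "aws" {
  region = "us-west-2"
}

provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

# Primary infrastructure in AWS.
resource "aws_vpc" "primary" {
  cidr_block = "10.0.0.0/16"
}

# Disaster recovery network in Azure; assumes the resource group already exists.
resource "azurerm_virtual_network" "dr" {
  name                = "dr-vnet"
  address_space       = ["10.1.0.0/16"]
  location            = "West US"
  resource_group_name = "dr-resources"
}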

Terraform supports both the new Azure Resource Manager API and the classic Azure Service Management API.  I would recommend the Azure Resource Manager provider, since that is where Azure is headed.  Although Terraform does not support every Azure resource, I found that it supports enough to deploy the majority of base infrastructure.

Below is a sample Azure infrastructure configured with a web tier, application tier, data tier, an infrastructure subnet, and a management subnet, as well as a VPN gateway providing access to the corporate network.  I was able to deploy all of this infrastructure except the VPN gateway with Terraform resources.  The resources needed to deploy an Azure VPN gateway appear almost ready according to an open HashiCorp ticket, so they may be available by the time you read this.  They will be called "azurerm_virtual_network_gateway" and "azurerm_virtual_network_gateway_connection."
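
If they have shipped by the time you try this, I would expect the gateway definition to look roughly like the sketch below; treat the argument names as a best guess, and note that the public IP and GatewaySubnet references are placeholders for resources defined elsewhere.

resource "azurerm_virtual_network_gateway" "corp_vpn" {
  name                = "corp-vpn-gateway"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  type     = "Vpn"
  vpn_type = "RouteBased"
  sku      = "Basic"

  ip_configuration {
    # Assumes a public IP and a standalone GatewaySubnet resource exist elsewhere.
    public_ip_address_id          = "${azurerm_public_ip.vpn.id}"
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = "${azurerm_subnet.GatewaySubnet.id}"
  }
}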


While not required, typically the first step in deploying your Azure Terraform infrastructure is to declare variables:


variable "subscription_id" {}
variable "client_id" {}
variable "client_secret" {}
variable "tenant_id" {}
variable "resource_group_id" {}
variable "resource_group_name" {}
variable "region" {
  default = "West US"
}
variable “VnetId” {}
variable “VnetName” {}
variable “EnvironmentTag” {}
variable “AdminUser” {}
variable “AdminPW” {}
variable “ServerName” {}

With the variables declared, the next step is to supply the authentication required to address the Azure API by defining a Microsoft Azure provider, including the subscription id, client id, client secret, and tenant id defined in the variables.


provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

After providing access, an Azure resource group is typically defined to contain the remaining resources.  A resource group holds a set of infrastructure resources that share the same lifecycle, meaning they will be deployed, updated, and ultimately deleted at the same time.  Resources that do not share the same lifecycle should be deployed in a separate resource group.  The resource group name and region variables supply the name and location.


resource "azurerm_resource_group" "${var.resource_group_id}" {
    name     = "${var.resource_group_name}"
    location = "${var.region}"
}

Once you have your resource group, you can define the rest of your base infrastructure.  A virtual network (VNet) is the foundation of your own isolated network in the cloud, so it should be defined first, along with its subnets.


resource "azurerm_virtual_network" "${var.VnetId}" {
  name                = "${var.VnetName}"
  address_space       = ["10.1.0.0/16"]
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_env}".name}"
  subnet {
    name           = "MgmtSubnet"
    address_prefix = "10.1.32.0/19"
  }
  subnet {
    name           = "WebSubnet"
    address_prefix = "10.1.64.0/19"
    security_group = "${azurerm_network_security_group.WebAccess.id}"
  }
  subnet {
    name           = "AppSubnet"
    address_prefix = "10.1.96.0/19"
  }
  subnet {
    name           = "DataSubnet"
    address_prefix = "10.1.128.0/19"
  }
  subnet {
    name           = "ADDSSubnet"
    address_prefix = "10.1.160.0/19"
  }
  subnet {
    name           = "GatewaySubnet"
    address_prefix = "10.1.255.224/27"

  }
}


Azure provides basic routing between subnets, to the Internet, and across a configured VPN gateway automatically with its default system routes.  User-defined routes are only necessary if you want to alter the default behavior, for example to force traffic through a virtual appliance or force tunneling to the Internet across the VPN gateway.  Below is a sample user-defined route.


resource "azurerm_route_table" "ADDS_Subnet" {
  name                = "AD DS Routing"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_env}".name}"
  route {
    name           = "routeVG"
    address_prefix = "10.1.160.0/19"
    next_hop_type  = "VirtualNetworkGateway
  }
  tags {
    environment = “EnvironmentTag” {}
  }
}

Azure network security groups (NSGs) contain a list of security rules that allow or deny traffic to resources in an Azure virtual network.  NSGs can be associated with subnets or with network interfaces attached to VMs.  Inbound traffic is first examined by the NSG attached to the subnet and then by the NSG attached to the NIC; outbound traffic is examined in the reverse order.  Here is a sample NSG resource:


resource "azurerm_network_security_group" "WebAccess" {
  name                = "Web Access"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_env}".name}"
  security_rule {
    name                       = "Allow80"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "80"
    source_address_prefix      = "Internet"
    destination_address_prefix = "*"
  }
  tags {
    environment = “EnvironmentTag” {}  
  }
}

Microsoft Azure provides redundancy for multiple servers offering the same service with Availability Sets.  Azure hardware clusters are divided into update domains and fault domains, representing update cycles and physical infrastructure such as power and networking.  Azure distributes the virtual machines within an availability set across these domains to provide availability and fault tolerance.  For the sample design above, we will need four availability sets.  Here is an example for the web tier:


resource "azurerm_availability_set" "WebTierAS" {
  name                = "Web Tier Availability Set"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_id}".name}"
  tags {
    environment = “${var.EnvironmentTag}"
  }
}

An Azure storage account provides storage within your Azure infrastructure.  Your storage account creates a unique namespace to store and retrieve your data.


resource "azurerm_storage_account" "StorIT" {
  name                = "StorageAccount"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_id}".name}"
  location            = "${var.region}"
  account_type        = "Standard_LRS"
  tags {
    environment = “EnvironmentTag” {}
  }
}

resource "azurerm_storage_container" "ContainIT" {
  name                  = "vhds"
  resource_group_name   = "${azurerm_resource_group."${var.resource_group_id}".name}"
  storage_account_name  = "${azurerm_storage_account.StorIT.name}"
  container_access_type = "private"
}

Below is a sample virtual machine definition.  A network interface is defined and assigned to the virtual machine.  The virtual machine definition also specifies the VM size, the operating system with user information, and the storage account/container.


resource "azurerm_network_interface" "webserver1nic" {
  name                = "WebServer"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_id}".name}"
  ip_configuration {
    name                          = "webIP1"
    subnet_id                     = "${azurerm_subnet. WebSubnet.id}"
    private_ip_address_allocation = "dynamic"
  }
}

resource "azurerm_virtual_machine" "webserver1" {
  name                  = "webserver1"
  location              = "${var.region}"
  resource_group_name   = "${azurerm_resource_group."${var.resource_group_id}".name}"
  network_interface_ids = ["${azurerm_network_interface.webserver1nic.id}"]
  vm_size               = "Standard_A1"
  storage_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2012-R2-Datacenter"
    version   = "latest"
  }
  storage_os_disk {
    name          = "myOSdisk"
    vhd_uri       = "${azurerm_storage_account.StorIT.primary_blob_endpoint}${azurerm_storage_          container.ContainIT.name}/myOSdisk.vhd"
    caching       = "ReadWrite"
    create_option = "FromImage"
  }
  os_profile {
    computer_name  = "${var.ServerName}"
    admin_username = "${var.AdminUser}"
    admin_password = "${var.AdminPW}"
  }
  tags {
    environment = “${var.EnvironmentTag}"
  }
}

Azure has three separate services for distributing traffic: the Azure Load Balancer, which works at layer 4; the Application Gateway, which operates at layer 7 and acts as a reverse proxy; and Traffic Manager, which works at the DNS level.  This example deploys the Azure Load Balancer, with an external load balancer for the web tier and internal load balancers for the application and data tiers.  Below is sample code for the web tier, including a public IP, the load balancer, a front-end configuration, a web balancing rule, a health check probe, and a backend web server address pool.


resource "azurerm_public_ip" "WebTierLBIP" {
  name                         = "PublicIPForLB"
  location                     = "${var.region}"
  resource_group_name          = "${azurerm_resource_group."${var.resource_group_id}".name}"
  public_ip_address_allocation = "static"
}
resource "azurerm_lb" "WebTierLB" {
  name                = "WebTierLoadBalancer"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group."${var.resource_group_id}".name}"
  frontend_ip_configuration {
    name                 = "WebTierIP"
    public_ip_address_id = "${azurerm_public_ip.WebTierLBIP.id}"
  }
}
resource "azurerm_lb_rule" "WebRule" {
  resource_group_name            = "${azurerm_resource_group."${var.resource_group_id}".name}"
  loadbalancer_id                = "${azurerm_lb.WebTierLB.id}"
  name                           = "WebLBRule"
  protocol                       = "Tcp"
  frontend_port                  = 80
  backend_port                   = 80
  frontend_ip_configuration_name = "WebTierIP"
}
resource "azurerm_lb_probe" "WebCheck" {
  resource_group_name = "${azurerm_resource_group."${var.resource_group_id}".name}"
  loadbalancer_id     = "${azurerm_lb.WebTierLB.id}"
  name                = "Port80responce"
  port                = 80
}
resource "azurerm_lb_backend_address_pool" "WebServers {
  resource_group_name = "${azurerm_resource_group."${var.resource_group_id}".name}"
  loadbalancer_id     = "${azurerm_lb.WebTierLB.id}"
  name                = "BackEndWebServers"
}

While not every Azure resource can be deployed with Terraform, many of the important ones are available, including several not covered here.  There are more than enough to stand up your base infrastructure in Azure, whether or not you are also managing base infrastructure for other providers.


The power of infrastructure as code really comes to life with a tool such as Terraform, especially with the potential of deploying infrastructure to multiple providers.  Using Terraform code similar to what I have shown in this post, you can quickly deploy an Azure resource group with a virtual network, route tables, network security groups, storage accounts, availability sets, virtual machines, and load balancers.  If you already have Terraform code for another provider, deploying Azure resources as shown here lets you quickly build out a multi-cloud infrastructure.


Posted in General

Automating CloudPassage firewall management

Halo, by CloudPassage, is a great tool to ensure compliance across a hybrid infrastructure. When it comes to security, some organizations have made incredible strides to streamline and automate. But for those organizations that may not employ fully automated security, there are often issues managing user access within services outside of LDAP (Lightweight Directory Access Protocol). Manually configuring firewalls is tedious, not to mention inefficient. Thankfully, CloudPassage has an API that alleviates this issue, along with a Python SDK.

The Python SDK allows you to easily write a script and bake Halo right into the plumbing of your automation system. Essentially, this makes it possible to do almost anything you need to do with Halo without having to use your browser. And as we mentioned before, it’s a pain to manually configure firewall rules.

That being said, leaving things open to the world and trusting SSH or RDP to keep the bad guys out isn’t an option. We need firewall orchestration, but no one wants to give themselves carpal tunnel spending day after day configuring their workloads. Time is money, right? With Halo you have the capability to dynamically authorize specific users, based on their IP, to connect to your servers through GhostPorts, and that’s a great way to enable granular security compliance and auditing.

Let me explain. Each policy under which these rules are defined can have a GhostPorts user added: a user that, once registered under your account, can be granted access to specific ports on workloads, secured by multi-factor authentication. I wanted to create a tool that allows me to create these firewall policies, with GhostPorts users, using only the Halo API. This was made easier with the Python SDK, which allows us to write logic against Halo functionality without dealing with the minutiae of remembering URLs or handling authentication with Halo.

THE NITTY GRITTY
One of the main design ideas behind the Halo Python SDK is to have a single object that holds everything the SDK needs to know in order to interact with Halo, and to pass that object into different parts of the SDK. An analogy would be using a single car key to unlock the doors, open the trunk, open the glove box, or start the car. The script I built on top of the SDK adds a number of users to various groups of matching policies, quickly and safely, on a large scale; the configuration details are covered in the next section.

SCRIPT CONFIGURATION
The purpose of the script is to add a number of users to various “groups” of matching policies. A user like “bill@example.com” can be added to every policy that has the word “qa” in it, or only to policies that exactly match “qa”, by setting the wildcard setting to true or false, with any number of specified services. Settings are loaded via a groups.yaml in the same path as the script, although this could fairly easily be modified to take a command line argument pointing at whatever yaml file location you specify. Each enabled group is looped through for each specified user name, and then each matching service name is verified as an existing service and applied. This allows multiple users that need to be added to multiple policies with multiple services to be added quickly and safely (rules can be added as inactive by default, per group or for all groups) on a large scale, and rules can also be set to inactive and removed from all policies as well. Below you’ll see an example of the yaml file that contains these values, and after that I’ll give a brief overview of the script style and common classes used once these parameters are passed to the script:


groups:
  enabled: group1
  comment:
  usernames: foo@bar.com, bar@foo.com
  # How many to subtract from the last position
  subtractfromlastrule:

group1:
  name: Prod content hosts
  chain: INPUT
  active: False
  source: None
  destination: None
  states:
  action: ACCEPT
  services: ssh, https
  log: False
  log_prefix:
  comment:
  username:
  wildcard: False
  position:
  subtractfromlastrule:

group2:
  name: foo
  chain: INPUT
  active: False
  source:
  destination:
  states: NEW, ESTABLISHED
  services: cp-ssh
  action: ACCEPT
  log: false
  log_prefix:
  comment:
  username:
  wildcard: True
  position:
  subtractfromlastrule:

WE MUST GO DEEPER: CODE EXPLORATION

There are a few classes to take note of, specifically within firewall_policy.py and http_helper.py. These classes, “FirewallPolicy,” “FirewallRule,” and “HttpHelper,” are all used through a HaloSession instantiation that grabs the required token by passing along your API key and secret, ideally within its own class function that can be called for use later on. HttpHelper is needed in order to parse the v2 users API as mentioned in the API Guide, and is not currently included within the SDK because it is subject to change.

Everything within the Halo portal has an id associated with it, from users to services to policies to rules, and these ids can be obtained through the “list_all” function. It mostly boils down to a standard fizzbuzz-style problem: create lists of filtered rules and policies based on names, and if a rule URL is in a policy, build a dictionary of that policy’s rules, and so on.

With all this in mind, eventually you’ll have a dictionary of policy ids and required json values to pass along that looks something like this:


for pol, pos in self.pol_positions.iteritems():
    for service in self.list_of_services:
        if service["name"] == self.service:
            policy_json = {
                'firewall_rule': {
                    'chain': self.chain,
                    'firewall_source': {'id': self.user_id, 'type': 'User'},
                    'active': self.active,
                    'firewall_service': service["id"],
                    'connection_states': self.states,
                    'action': self.action,
                    'log': self.log,
                    'log_prefix': self.log_prefix,
                    'comment': self.comment,
                    'position': pos,
                }
            }

and you’ll be ready to start updating firewall policies through Python! To learn more about this process, and to use it in your own environment, visit the GitHub page. All in all, the Python SDK makes managing firewalls within Halo a breeze. While your script runs, you can go about your business managing workload issues instead of being slowed down by manual configuration.

Posted in General

Battle of the PaaS. AWS vs Google Cloud Platform

In the wake of Google’s Next ’17 event and the AWS San Francisco Summit 2017, I wanted to continue the AWS vs GCP comparison series.  In this part though, I wanted to focus on each public cloud vendor’s Platform as a Service (PaaS).  In basic terms, AWS’ Elastic Beanstalk and Google’s App Engine enable development teams to deploy their application without the need to understand how to build or scale the underlying infrastructure.  The promise is great, and for most use cases, that promise delivers.  More advanced applications requiring more complex integrations and customizations may require users to leave the PaaS in favor of a different platform.  But let’s get back to Elastic Beanstalk vs. App Engine.

Application Support

The first component to compare is what platforms each PaaS supports:

Elastic Beanstalk: Java, PHP, .NET, Node.js, Python, Ruby, Go

App Engine: Java, PHP, .NET / C#, Node.js, Python, Ruby, Go

That was easy: they both support pretty much the same languages.  What about custom environments?

Elastic Beanstalk: Preconfigured Docker, Single Container Docker, Multi Container Docker, Custom (AMI)

App Engine: Custom (dockerfile), Multiple Services

While I think that for customers using containers it would make more sense to focus on ECS vs. GKE, that comparison is coming in a future post.  The takeaway from the PaaS custom environments is that Elastic Beanstalk supports the use of a custom AMI created with a tool Foghorn uses often, Packer.  This provides some flexibility, as customers who are comfortable using Elastic Beanstalk can branch out of the standard application support listed above if needed.  Take care though: Elastic Beanstalk with custom AMIs can increase build times and introduce more complications, like prepping the AMI to be used by Elastic Beanstalk.

Enough with the basics and how they are similar; let’s turn our focus to how they are different.

Performance

I am not talking about how fast your app will run; that introduces a multitude of possibilities that have nothing to do with the platform you are running on.  I am specifically focusing on how fast the PaaS can build and scale.  The first step in understanding the speed of Elastic Beanstalk is to understand what the platform is built on.  In basic terms, Elastic Beanstalk uses CloudFormation, Auto Scaling Groups, custom code to tie those components to the application stack, a user interface and CLI, and lastly a deployment agent to push your code.  Because Elastic Beanstalk uses these components, it is bound to their characteristics.  Some changes that you make may require a new EC2 instance, which means your seemingly simple change will actually:

  • Update CloudFormation
  • Create new Launch Configuration
  • Update Auto Scaling Group
  • Terminate Instance(s)
  • Provision Instance(s)
  • Run Elastic Beanstalk Code
  • Deploy your application

You could easily be waiting 5-10 minutes for this to happen.  This is a critical aspect of Elastic Beanstalk to understand: some changes require the full lifecycle to trigger, which can take significant time.  Ok, not as long as ordering hardware for the data center.

App Engine, on the other hand, does not visibly have the same dependency structure on other services.  In addition, while you have a user interface, it is really a dashboard, not a full configuration interface.  Interacting with App Engine is more like using the Elastic Beanstalk CLI.  You define your application in an App Engine app.yaml file in your source code.  For example, here is the Go hello world file:

runtime: go
api_version: go1


handlers:
- url: /.*  
  script: _go_app

This is about the absolute minimum to use the default settings, which is likely fine as you are getting started.  And this is also the appropriate time to mention one of the best features of App Engine.  You can terminate your instances to reduce costs, and as soon as you are ready to use the application again, App Engine will provision an instance and fire up your application on the first request.  While in Elastic Beanstalk you could create scheduled auto scaling to reduce your environment to zero instances and back up to one before your day starts, the on-demand activation of App Engine is quite nice.  Not to mention we are talking about seconds (or perhaps even less than a second) for App Engine vs. minutes (or perhaps as many as 10 minutes) for Elastic Beanstalk.
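
For what it’s worth, you can get part of the way there on the Elastic Beanstalk side by attaching scheduled actions to the Auto Scaling group that Beanstalk creates (Beanstalk also exposes this through the aws:autoscaling:scheduledaction namespace in .ebextensions).  A rough Terraform sketch, with a placeholder group name:

resource "aws_autoscaling_schedule" "nightly_shutdown" {
  scheduled_action_name  = "nightly-shutdown"
  autoscaling_group_name = "awseb-my-env-AWSEBAutoScalingGroup" # placeholder; Beanstalk generates the real name
  min_size               = 0
  max_size               = 0
  desired_capacity       = 0
  recurrence             = "0 4 * * *" # scale to zero overnight (UTC)
}

resource "aws_autoscaling_schedule" "morning_startup" {
  scheduled_action_name  = "morning-startup"
  autoscaling_group_name = "awseb-my-env-AWSEBAutoScalingGroup"
  min_size               = 1
  max_size               = 1
  desired_capacity       = 1
  recurrence             = "0 15 * * *" # back to one instance before the workday
}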

UI

The App Engine app.yaml is akin to the Elastic Beanstalk CLI eb create command where you can define everything in a single action.

$ eb create dev-vpc --vpc.id vpc-0ce8dd99 --vpc.elbsubnets subnet-b356d7c6,subnet-02f74b0c --vpc.ec2subnets subnet-0bb7f0cd,subnet-3b6697c1 --vpc.securitygroup sg-70cff265
Creating application version archive "app-160312_014309".
Uploading test/app-160312_014309.zip to S3. This may take a while.
Upload Complete.
Environment details for: dev-vpc
  Application name: test
  Region: us-east-1
  Deployed Version: app-160312_014309
  Environment ID: e-pqkcip3mns
  Platform: 64bit Amazon Linux 2015.09 v2.0.8 running Java 8
  Tier: WebServer-Standard
  CNAME: UNKNOWN
  Updated: 2016-03-12 01:43:14.057000+00:00
Printing Status:
...

And like the App Engine app.yaml, you can also provide configuration information for your Elastic Beanstalk environment using .ebextensions:

~/workspace/my-app/
|-- .ebextensions
|   |-- environmentvariables.config
|   `-- healthcheckurl.config
|-- .elasticbeanstalk
|   `-- config.yml
|-- index.php
`-- styles.css

What AWS has provided is a rich Console GUI to enable users to build, clone and manage (most) settings.  While the goal in both platforms is to meet the developer at their source control repository, this extensive GUI may be a welcome feature for some users.

Deployments

Another cool feature that App Engine supports, and that is not really replicated in Elastic Beanstalk, is traffic splitting.  Whereas Elastic Beanstalk does support the idea of a Blue/Green deployment, the end result is a DNS cutover (or rollback).  Traffic splitting inside App Engine allows you to specify a percentage of traffic to go to a different version of the service.  This is closer to a canary-style deployment and may be a big advantage for some applications.  This deployment style can be approximated in Elastic Beanstalk by using multiple environments and Route53 weighted DNS entries.
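
As a sketch of that weighted DNS approach (zone ID, domain, and environment CNAMEs below are placeholders), ninety percent of traffic stays on the current environment while ten percent lands on the new one:

resource "aws_route53_record" "app_blue" {
  zone_id        = "Z123EXAMPLE"
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "blue"
  records        = ["blue-env.us-east-1.elasticbeanstalk.com"]

  weighted_routing_policy {
    weight = 90
  }
}

resource "aws_route53_record" "app_green" {
  zone_id        = "Z123EXAMPLE"
  name           = "app.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "green"
  records        = ["green-env.us-east-1.elasticbeanstalk.com"]

  weighted_routing_policy {
    weight = 10
  }
}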

Overall

What about these PaaS offerings in the real world?  Personally, I prefer the simplicity and fast changes of App Engine.  I have found the unpredictable times to deploy changes within Elastic Beanstalk to be tedious in daily use.  That said, if your use case does not require many environment changes and needs little to no configuration management, Elastic Beanstalk is probably a really easy way to get your app stood up on AWS.  I also like that App Engine can basically be shut down by terminating your running instances (for development and testing use cases).  You can’t really “delete” Elastic Beanstalk unless you want to remove the entire application environment.

I think Elastic Beanstalk is a solid PaaS.  For users who want an intuitive GUI that allows for easy configuration, Elastic Beanstalk wins. That said, I personally have spent significant time to get the desired results out of it when an app requires complex configuration not available in the GUI.  There is a certain degree of mystery (though the same can be said for App Engine) that in some circumstances becomes a hindrance to progress.  I would prefer to use my own code to control the underlying resources (Launch Configuration, Auto Scaling Groups, RDS, Security Groups, etc.), where I have a clear understanding of all the code running, and perhaps more importantly, I am finding the fastest infrastructure deployment model to get changes out quickly and effectively.

Next up in the GCP vs AWS matchup is something I am particularly interested in sharing: how GCP project-level boundaries can solve a challenging security paradigm in AWS.

Posted in AWS, Cloud Management, GCP, Public Cloud

AWS and GCP, Account vs Project Boundaries


One of the most interesting differences between GCP and AWS is how each vendor recommends you isolate the blast radius of functional teams.  AWS will tell you that in all likelihood, you will need at least two accounts, but possibly more.  Referring to their white paper on the subject, you can see the questions they pose to decide if you need multiple accounts:

  • Does the business require administrative isolation between workloads?
  • Does the business require limited visibility and discoverability of workloads?
  • Does the business require isolation to minimize blast radius?
  • Does the business require strong isolation of recovery and/or auditing data?

While on the GCP side of things, best practice is to use Projects to provide a functional grouping of resources.  In fact, much of the content I intended to focus on with this blog post was recently provided by Google in this excellent writeup.  I strongly recommend anyone reading this blog post to also read that writeup, which was referenced earlier this month via this blog post by Google.  So who got it right?

The truth is, except for a few specific areas, in my opinion the two systems deliver equal functionality.  Accurate cost allocation can be done in either system.  Projects and AWS accounts are global, not regionally bound; they can serve many users in many regions with many services.  So on and so forth.  There are two key areas, however, where I think Google has a distinct advantage:

Networking

The biggest reason why a GCP Project is such an effective model is at the network level.  Within AWS, the VPC is isolated to a region and an account.  To connect other AWS VPCs within the same region you can use peering, but that requires a well-planned approach to how a fully meshed, intra-region, multi-account / multi-VPC network would look and scale without any IP overlap.  And all of this assumes that the customer is OK with potentially hundreds of accounts and hundreds of networks (or make that hundreds times two, since we should isolate production in its own account).  Some customers simply do not want to manage all these accounts and networks.  This is especially true when those networks must connect back to a corporate location or data center.

GCP, on the other hand, does not require that each Project have its own network.  In fact, you could create one large development network within GCP (global or regional) and then create unique development Projects to isolate resources for each development team.  The fact that all these resources are in the same network does not diminish the security that the Project boundary creates.  This makes things like shared services and security auditing tools much easier to orchestrate, since they can have cross-Project access.  Not to mention the network does not require a complicated meshed structure.  The network can be designed around routing principles rather than development teams, simplifying its use for both the network administrators and the developers on it.

I can’t overstate how significant the Project feature is within GCP.  In my opinion, this is a much cleaner solution to isolating access than anything AWS currently provides.  Having had numerous conversations with AWS customers about the difficulties of providing resource-based access control within the same account and network, I find having such a simple solution on GCP excellent.

User Access

The second reason I think Google got it right on Projects is how you assign user access.  You can simply create a Google Group for a GCP Project and add users to the Google Group (within your organization, or maybe even outside your organization, like 3rd-party contractors or consultants).  By contrast, there is no equivalent user grouping structure on the AWS side for granting account access.  You could use an identity account to consolidate user management and then use cross-account access with roles.  This would enable you to manage which users can access which accounts, but it is significantly more complicated than using a Google Group.
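
To make that concrete (the project and group names below are made up), granting a Google Group access to a Project is a one-resource affair in Terraform:

resource "google_project_iam_binding" "dev_team" {
  # Everyone in the group gets editor access to this one development project.
  project = "acme-dev-team-a"
  role    = "roles/editor"

  members = [
    "group:dev-team-a@acme.com",
  ]
}

Swapping contractors in and out then becomes a Google Groups membership change rather than an IAM change.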

Stay tuned for how to manage a large-scale deployment of projects or AWS accounts in a future blog post.


Posted in AWS, Cloud, Cloud Management, GCP, Public Cloud

Real World Cost Example for Google and AWS

In the wake of Google’s Next ’17 event, and a slew of recent Reserved Instance changes by Amazon Web Services (AWS), it seemed appropriate to compare to see if anything in the public cloud VM pricing has changed, or perhaps more importantly, who is ultimately cheaper?

First things first: Google Cloud Platform (GCP) announced at Google Next ’17 the ability to reserve capacity in the form of Committed Use Discounts.  One significant difference between these and AWS Reserved Instances is that GCP does not require any upfront payment for the best discount.  Whereas with AWS you would need to pay an upfront fee and make a 3-year commitment to receive up to 60% off on-demand pricing, with GCP committed use you receive up to 57% off with no upfront fee.  The GCP discount most closely aligns with AWS’ new Convertible Reserved Instances.  Both are regional, not tied to any specific zone within a region.  In the case of AWS, these discounts can be modified for instance type, family, OS, and tenancy.  In the case of GCP, the discount is on your aggregate core count for the region.  What makes GCP’s discount better, in keeping with their Sustained Use Discounts, is that you don’t have to modify anything to get it.  It applies to whatever you have running; there is no need to make the discount fit the infrastructure.

AWS has made a few recent changes to Reserved Instances to make them more useful to customers.  These are welcome changes that offer increased flexibility for customers to tailor their discounts.  Scheduled Reserved Instances allow customers to get the RI discount and capacity reservation for periodic workloads.  Customers who are willing to waive the capacity reservation in exchange for more RI discount flexibility can now opt for regional RIs.  Lastly, as mentioned previously, AWS customers can now modify an RI to meet changing needs through instance size flexibility.

What about those people who just want to use On Demand?  GCP has a clear advantage with Sustained Use Discounts, provided your instances are running for more than 25% of the month (not an uncommon occurrence).  AWS does not provide a comparable feature.  Refer to Google’s sustained use discount chart for the details.

So let’s take some actual compute based pricing scenarios to see how all these options play out.  To keep things as easy as possible to compare, we will assume these instances are running 24 hours a day for the entire month.  So all costs are monthly.  All prices and discounts are as of March 20th, 2017:

All five scenarios below use the same resources: 4 vCPU, 15 GB memory, 10 GB SSD storage, quantity 10 (AWS = m4.xlarge, GCP = n1-standard-4).  Approximate monthly totals:

  • AWS On Demand: $1,585 vs. GCP Sustained Use 30%: $988
  • AWS 1 Year No Upfront RI: $1,086 vs. GCP Sustained Use 30%: $988
  • AWS 3 Year No Upfront Convertible RI: $974 vs. GCP Sustained Use 30%: $988
  • AWS 3 Year No Upfront Convertible RI: $974 vs. GCP 1 Year Committed Use Discount: $874
  • AWS 3 Year No Upfront Convertible RI: $974 vs. GCP 3 Year Committed Use Discount: $624

All that said, AWS does offer something GCP does not: all upfront reserved instance discounts.  To continue the above workload, let’s add one more pricing scenario.  In this case, we will use the 3 Year All Upfront Convertible RI discount from AWS.

Same resources as above (4 vCPU, 15 GB memory, 10 GB SSD storage, quantity 10; AWS = m4.xlarge, GCP = n1-standard-4), 3 year totals:

  • AWS 3 Year All Upfront Convertible RI: $29,472 (roughly $819 per month amortized)
  • GCP 3 Year Committed Use Discount: $22,464 (roughly $624 per month amortized)

Speaking of flexibility, one critical difference between GCP and AWS compute resources is the ability to use a custom machine type on GCP.  This allows the customer to select a blend of compute and memory that better suits their needs.  Furthermore, this custom VM is in many cases cheaper than the closest match you could buy on AWS.  The following chart shows some examples of how this plays out with on-demand pricing:

Both workloads below are sized for maximum cores and minimum memory, with 10 GB SSD storage, quantity 100, running 12 hours a day, 5 days a week.  Approximate monthly totals:

  • Compute Workload Option 1: AWS c4.8xlarge (36 cores, 60 GB memory) at $41,148 vs. GCP custom machine type (36 cores, 32 GB memory) at $32,798
  • Compute Workload Option 2: AWS m4.16xlarge (64 cores, 256 GB memory) at $89,033 vs. GCP custom machine type (64 cores, 58 GB memory) at $58,478
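
For reference, a custom machine type is requested by encoding the shape directly in the machine type string, custom-<vCPUs>-<memory in MB>.  A minimal sketch for the second workload above (the instance name, zone, and image are placeholders):

resource "google_compute_instance" "compute_worker" {
  name         = "compute-worker-1"
  zone         = "us-west1-a"
  machine_type = "custom-64-59392" # 64 vCPUs, 58 GB of memory

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-8"
      size  = 10 # GB, SSD persistent disk
      type  = "pd-ssd"
    }
  }

  network_interface {
    network = "default"
  }
}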

Perhaps most notable in pricing for elastic workloads: on GCP you pay per minute, not per hour.

One closing thought around core compute performance: boot times for Linux instances on GCP are faster than on AWS.  This foundational difference has a lot of follow-on benefits, faster auto-scaling being one example.  In the case of PaaS, it even means on-request instances for non-production environments (no traffic, no compute running).  More on scaling, performance, and PaaS in my next post.

Posted in AWS, Cloud, Cost Optimization, GCP

New APIs expand the use for AWS Tags

Amazon recently announced some new features around tagging permissions that make tags considerably more useful.  Although this just came out, I already see a few areas where we can simplify automation scripts.  More importantly, since we can limit access to tags by key, we can reserve certain keys for central functions like cost allocation and monitoring, while still allowing individual teams to leverage tags for other purposes, without the risk of production-required tags being modified.

Resource-level permissions are still not at the point where complete isolation of resources for different teams can be implemented in a single account, but this is a huge step forward in giving developers the access they need without the risk of breaking production automation.
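
As a sketch of the kind of policy this enables (the tag keys and policy name below are placeholders), you can let a development role manage its own EC2 tags while explicitly denying changes to the reserved keys:

resource "aws_iam_policy" "protect_reserved_tags" {
  name   = "protect-reserved-tags"
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowTagging",
      "Effect": "Allow",
      "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
      "Resource": "*"
    },
    {
      "Sid": "DenyReservedKeys",
      "Effect": "Deny",
      "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
      "Resource": "*",
      "Condition": {
        "ForAnyValue:StringEquals": {
          "aws:TagKeys": ["CostCenter", "Environment"]
        }
      }
    }
  ]
}
POLICY
}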

How will you use these new features? Reply below!

Posted in AWS, Cloud, Cloud Management

HIPAA on AWS the Easy Way

Many of our customers are running workloads that are subject to HIPAA regulations.  Running these on AWS is definitely doable, but there are some catches.  Foghorn has made it super easy for our customers to run HIPAA compliant workloads on AWS.  Here’s how.

What is a BAA?

If you are not familiar with HIPAA, the regulations require a Business Associate Agreement to be executed with each of your partners who may have access to Protected Health Information.  From the Health Information Privacy page on BAA:

‘A “business associate” is a person or entity, other than a member of the workforce of a covered entity, who performs functions or activities on behalf of, or provides certain services to, a covered entity that involve access by the business associate to protected health information.  A “business associate” also is a subcontractor that creates, receives, maintains, or transmits protected health information on behalf of another business associate.  The HIPAA Rules generally require that covered entities and business associates enter into contracts with their business associates to ensure that the business associates will appropriately safeguard protected health information. ‘

AWS HIPAA Rules and Regs

If you are handling PHI today, you already know that any vendor that you share PHI with is required to sign a BAA.  Amazon has made this process pretty straightforward, in that they offer a BAA that they will happily sign for all customers storing and processing PHI on AWS.  But the devil is in the details.  You can read more at the AWS HIPAA compliance page here.  The important quote:

“Customers may use any AWS service in an account designated as a HIPAA account, but they should only process, store and transmit PHI in the HIPAA-eligible services defined in the BAA.”

So are you protected by the BAA?

The BAA that Amazon signs covers only a few of the AWS services, and requires that you use those services in specific architectural configurations.  If you break those conventions, the BAA is nullified.  Worse, it is nullified for your entire account, not just for data handled by the non-compliant components.

An easy example would be if your team had a compliant architecture for production, but a non-compliant infrastructure for staging.  This may have been your configuration to save on costs, and in order to maintain compliance you scrub staging data of PHI before uploading.  Let’s say that an engineer mistakenly uploaded non-scrubbed data to the non-compliant environment.  You just invalidated your BAA, even for your production environment!

In addition, any of the technical consultants, subcontractors, and managed services companies that you use also need to sign a BAA. This process can be time consuming and costly from a legal perspective.

The Easy Way

Foghorn is both a cloud services and a cloud engineering provider.  Because you get all of your AWS as well as your engineering and managed services from us, you can sign a single BAA with Foghorn.  All of the AWS gotchas still apply, but Foghorn is deeply experienced in architecting and managing HIPAA compliant environments.  By partnering with Foghorn, you can be sure your PHI is safe and your company is protected from accidentally invalidating your AWS BAA.  There are a few ways we accomplish this:

  1. All Foghorn employees undergo HIPAA training.  We make sure our employees understand the what, the how and the why of HIPAA to avoid any simple errors.
  2. All Foghorn customer HIPAA accounts are tagged.  We know which accounts are HIPAA, and which aren’t, without a doubt.  That makes tracking and auditing easier.
  3. We segregate your PHI workloads from non PHI workloads when possible, to make sure we can focus the restrictive HIPAA based policies only where required. This saves cost and maintains agility on the rest of your workloads.
  4. We design the HIPAA infrastructure with belt and suspenders.  We make sure your architecture is compliant with Amazon’s BAA conditions, and add multiple layers of assurance.
  5. We advise and guide on the responsibilities that AWS does not take care of.  This includes scanning, penetration testing, change processes, incident response, etc.
  6. We set up realtime audit monitoring for key controls, so that if someone changes something in your account that may lead to compliance issues, your team is notified immediately (a sketch of what this can look like follows this list).
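
As one hedged example of that realtime monitoring (the names are placeholders, CloudTrail must already be enabled, and the SNS topic policy must allow CloudWatch Events to publish), a rule that flags security group changes might be wired up like this:

resource "aws_sns_topic" "compliance_alerts" {
  name = "compliance-alerts"
}

resource "aws_cloudwatch_event_rule" "sg_changes" {
  name = "security-group-changes"

  event_pattern = <<PATTERN
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": [
      "AuthorizeSecurityGroupIngress",
      "AuthorizeSecurityGroupEgress",
      "RevokeSecurityGroupIngress",
      "RevokeSecurityGroupEgress"
    ]
  }
}
PATTERN
}

resource "aws_cloudwatch_event_target" "notify" {
  rule = "${aws_cloudwatch_event_rule.sg_changes.name}"
  arn  = "${aws_sns_topic.compliance_alerts.arn}"
}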

Call us today for more info on how we can help you meet HIPAA compliance while retaining your agility.

Posted in AWS, Cloud, Health Care, Public Cloud