Hands-on with AWS re:Invent 2017 Labs

I went to Las Vegas last month, and didn’t gamble once! Weird, right? But AWS re:Invent 2017 was so full of material that it left little free time for the usual Vegas experience, after the keynotes, talks, hackathons, daily after-parties, and sundry side activities. If you’re reading this and wondering whether you should go to re:Invent 2018, the tl;dr is “Yes!”

I didn’t even attend any talks, and ended up watching the keynotes over livestream (see walking, later). My focus the entire time was the hands-on labs. The labs (or workshops) are all structured the same: scheduled for 2.5 hours, each starts with a presentation from an AWS trainer or expert, followed by self-paced study time, with additional AWS subject-matter experts on hand to answer questions on the material. I focused on intermediate- or advanced-level workshops (300- or 400-level), where it’s assumed you already have an account set up, know how to access the console, and can manage your PEM files or configure the CLI.

These are not sandbox labs (like Qwiklabs or other online practice tools), so it’s important to have an AWS account of your own, and one where you can feel free to create and destroy resources with aplomb. Each lab does provide credits which cover the cost of running the lab (and after five such workshops, I built up a good buffer of billing credits!). Here are the specific workshops which I attended (and one which I planned on but missed, SRV330):

CMP316 was my favorite. It gives a detailed, functioning example of using an autoscaling EC2 Spot fleet to run analysis on financial markets. The approach uses Jupyter notebooks, asynchronous SQS queueing, and CloudWatch alarms, and then shows how AWS Batch can replace much of that with a managed-service offering.
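The core pattern is easy to sketch: analysis work items go onto an SQS queue as JSON messages, and a CloudWatch alarm on queue depth can drive the Spot fleet’s scaling policy. A minimal sketch of the enqueueing side, with hypothetical field and queue names (not the lab’s actual schema):

```python
import json

def make_job_message(ticker, start_date, end_date):
    """Build one SQS work item for the analysis fleet.
    Field names are illustrative, not from the CMP316 lab."""
    return json.dumps({
        "ticker": ticker,
        "start": start_date,
        "end": end_date,
    })

# Sending requires boto3 and AWS credentials, roughly:
# import boto3
# sqs = boto3.client("sqs")
# sqs.send_message(QueueUrl=queue_url,
#                  MessageBody=make_job_message("AMZN", "2017-01-01", "2017-11-30"))
```

Workers on the Spot fleet then poll the queue, and the CloudWatch alarm scales the fleet up when the queue backs up.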

SRV424 was good, but there’s an issue with the fourth lab, for which I opened an issue. I got to build my first Alexa skill in ABD325, and deploy a serverless application using Athena and QuickSight to analyze Twitter sentiment. Really, all of the workshops were top-notch. If you already work with AWS professionally, I recommend the advanced-level workshops; the content is really solid and the AWS experts on hand help with specific questions. The list above includes links to the lab materials on GitHub, which is awesome, since I could follow up and read in more detail later. It also means you could work through them even if you didn’t attend re:Invent. Get to it!

For the first time, re:Invent was spread across multiple locations along the Vegas strip. Plenty of discussion on Reddit and elsewhere was about long walking times. While there were shuttles between the locations, it often took 40-60 minutes to get from one venue to another. There were also issues with long lines if you hadn’t reserved a seat in a lab. And not getting in to the workshop you arrived at almost certainly meant missing out on any workshop for that time slot. If you’re planning to attend for the workshops, figure out in advance which ones you want and consider booking your hotel where those workshops will be given.

Workshops are meant as focused time for deep-dives. If you’re going to re:invent to network, they probably aren’t where you want to spend most of your time. But for getting familiar with a service like IoT Device Manager or CodeDeploy and CodePipeline, and maybe chatting with SMEs at other companies along the way, workshops are great.

The workshops leave a lot of resources lingering around in your AWS account. And as always, cost management is key. This gave me a great excuse to try AWS Nuke, a project whose very premise evokes fear in the minds of everyone I mention it to. Nuke will, as it says, blow away everything in an AWS account. There are multiple layers of safety built in, and a way to whitelist items to not remove. But in the end, Nuke identified resources in S3 buckets, SageMaker managed services, Lambda functions, and various other items which would have persisted long afterwards, had I not obliterated them. After all, you don’t want to do account cleanup in between re:Invent parties!
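For the curious, here is a minimal aws-nuke configuration sketch. The key names follow the project’s README at the time; verify against the current docs (and the account IDs below are obviously placeholders):

```yaml
regions:
  - us-east-1
  - us-west-2

# Safety net: accounts that must never be touched.
account-blacklist:
  - "999999999999"

accounts:
  "123456789012":          # the throwaway workshop account
    filters:
      IAMUser:
        - "my-console-user" # whitelist resources to keep
```

The blacklist plus per-account filters are the “multiple layers of safety” mentioned above; aws-nuke also dry-runs by default.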

Posted in General

Supporting Thorn’s campaign by protecting your AWS S3 resources

As alluded to in a previous blog post, We Saw. We Hacked. We Conquered., this is the follow-up with the architectural details.

Thorn builds technology to defend children from sexual abuse with the goal of bringing this tech to every company that needs it. Their work focuses on helping companies detect and remove child abuse content from their platform. For the Non-Profit Hackathon we focused on how to make removing known child sexual abuse content easy for everyone using AWS and S3.

Our solution had to account for these requirements:

  • Easy to deploy
  • Easy to use
  • Inexpensive

Additionally, we wanted our solution to include the following features:

  • Support large video and image files
  • Support multiple notification subscribers
  • Fast malicious object detection
  • Automatically delete objects if enabled
  • Support scanning all new objects
  • Support configuring all new buckets
  • Support scanning any existing bucket objects
  • Support retroactive scanning when hashes change

So what did we build?


And how do you deploy it?

An AWS user launches a CloudFormation stack and provides some initial information as parameters:

  • Bucket regex (what bucket filter criteria you want to use, defaults to *)
  • File regex (what object filter criteria you want to use, defaults to *)
  • Scan buckets? (does the user want to retroactively scan all existing buckets or not, defaults to no)
  • Delete items? (does the user want to enable object deletion in addition to object notification, defaults to no)
  • Email address (the email distribution group to subscribe the notifications to)
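In the template itself, those declarations might be sketched like this (the logical parameter names are illustrative, not necessarily the ones in our actual stack):

```yaml
Parameters:
  BucketRegex:
    Type: String
    Default: "*"
    Description: Which buckets to configure for scanning
  FileRegex:
    Type: String
    Default: "*"
    Description: Which object keys to scan
  ScanBuckets:
    Type: String
    Default: "no"
    AllowedValues: ["yes", "no"]
  DeleteItems:
    Type: String
    Default: "no"
    AllowedValues: ["yes", "no"]
  NotificationEmail:
    Type: String
    Description: Email distribution group for SNS notifications
```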

There is also a Slack webhook that can be updated to support pushing notifications to this AWS account’s Slack channel; this requires editing some variables in that Lambda Python script. If the user wishes to use SMS subscribers, they must launch in the us-east-1 region.

And what does it do?

I will describe the full feature state, though some of these workflows may not exist if the user elects not to scan buckets. Once deployed, the retroactive bucket scan starts via a CloudWatch Event Rule with a cron entry that kicks off a Lambda function. Once this bucket scan completes, the Lambda function disables the CloudWatch Event Rule. This initial Lambda function, which finds all the existing buckets, then triggers two additional Lambda functions: the first is the configure bucket Lambda, the second is the scan bucket Lambda.
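The filtering step of that initial function can be sketched in a few lines. The real Lambda would get bucket names from boto3’s `s3.list_buckets()`; the helper below is illustrative:

```python
import re

def buckets_to_process(bucket_names, bucket_regex):
    """Filter bucket names against the stack's bucket-regex parameter.
    The '*' default is treated as match-everything."""
    pattern = ".*" if bucket_regex == "*" else bucket_regex
    return [name for name in bucket_names if re.search(pattern, name)]

# In the Lambda, roughly:
# names = [b["Name"] for b in boto3.client("s3").list_buckets()["Buckets"]]
# for bucket in buckets_to_process(names, bucket_regex):
#     ...invoke the configure and scan Lambdas for each bucket...
```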

The configure bucket Lambda has a very basic job: it takes a bucket via its invocation input and enables an S3 notification event for object creation. This event invokes the hash Lambda function.
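A sketch of that notification setup, assuming boto3. The helper (whose name is mine, not the project’s) builds the configuration dict, and the commented call shows how it would be applied:

```python
def build_notification_config(hash_lambda_arn):
    """S3 bucket notification configuration that invokes the hash
    Lambda for every newly created object."""
    return {
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": hash_lambda_arn,
            "Events": ["s3:ObjectCreated:*"],
        }]
    }

# Applying it requires boto3, credentials, and permission for S3 to
# invoke the function:
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket=bucket_name,
#     NotificationConfiguration=build_notification_config(hash_lambda_arn))
```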

The scan bucket Lambda processes all the objects in the bucket and invokes the hash Lambda function.

The hash Lambda function is one of the two main workflow stages; here, new object MD5 hashes are created and stored in DynamoDB. Finally, the hash Lambda function invokes the validate hash Lambda function.
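The hashing itself can be done in a streaming fashion so that large video files never need to fit in Lambda memory at once. A sketch (the helper name is mine):

```python
import hashlib

def md5_of_stream(chunks):
    """Compute an MD5 hex digest from an iterable of byte chunks,
    so the whole object never has to be held in memory."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

# In the Lambda, the chunks would come from the S3 object body:
# obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
# item_hash = md5_of_stream(obj["Body"].iter_chunks())
# ...then put_item the (bucket, key, hash) record into DynamoDB...
```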

The validate hash Lambda function is the core engine in this workflow.  It queries DynamoDB, validates the hash against the known bad hashes and then notifies SNS (and deletes the object if enabled).
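That decision logic can be sketched as a pure function; the real Lambda wraps this with a DynamoDB query for the hash and an SNS publish for the notification (names below are illustrative):

```python
def validate_hash(item_hash, known_bad_hashes, delete_enabled):
    """Decide what to do with an object given its hash.
    Returns (is_match, actions), where actions lists the steps the
    Lambda would take: 'notify', and optionally 'delete'."""
    if item_hash not in known_bad_hashes:
        return (False, [])
    actions = ["notify"]
    if delete_enabled:
        actions.append("delete")
    return (True, actions)

# In the Lambda, 'notify' becomes sns.publish(...) and 'delete'
# becomes s3.delete_object(Bucket=bucket, Key=key).
```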

In addition to the retroactive workflow above, our CloudFormation template also creates a CloudWatch Event Rule for new bucket creation which invokes the configure bucket Lambda function. This way, any buckets created after the initial CloudFormation deployment will be picked up by the scanning workflow.

And what do the notifications look like?

At the core, we wanted the notification payload to easily be consumed by any upstream service.  While we only integrated and demonstrated Email, SMS, and Slack, the JSON payload could very easily be consumed by a web service so custom applications could ingest and act on this information.
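A sketch of what assembling such a payload might look like; the field names are illustrative, not the project’s actual schema:

```python
import json

def build_notification(bucket, key, item_hash, presigned_url):
    """Assemble the SNS message body sent to all subscribers.
    Field names are illustrative."""
    return json.dumps({
        "bucket": bucket,
        "key": key,
        "md5": item_hash,
        "url": presigned_url,
    })
```

Because the body is plain JSON, the same message works for the email, SMS, and Slack subscribers, or for any custom web service that subscribes later.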

The basic email notification generates a message like the following:

The message includes a pre-signed URL to the S3 object for verification; this pre-signed URL expires, for obvious reasons.

If Slack is configured, the Slack message looks like the following (with the same pre-signed URL structure):

Parting thoughts…

I felt compelled to call out how great an experience the hackathon was.  The day was long and the team was pretty much working nonstop up to the last 30 minutes before judging.  But to think that something you contributed to could positively influence change for something so critical as this is truly rewarding. Lastly, this was me working on this post at re:invent.

Posted in General

We Saw. We Hacked. We Conquered.

One of our customers, Ellie Mae, asked us to join them in the Non-Profit Hackathon at AWS re:Invent 2017, and we jumped at the chance. With Foghorn making up half the team, we were excited to spend 15 straight hours with Ellie Mae designing, coding, and presenting our solution to help Thorn combat predatory behavior.

In order to end up with a functioning demo in 13.5 hours, we had to run a tight ship, so we treated this like any mission critical customer project. After selecting the problem statement we chose to tackle, we dug in first on requirements before diving into solution mode.  Then we began to architect our solution.  Since we had 6 people, we needed 6 work streams to best work in parallel, and this definitely influenced our architecture.

With 5 Lambda functions and a CloudFormation template to build, we could all work in parallel until integration time.

Between the Ellie Mae engineers and the Foghorn FogOps team, we had some serious rock stars with no real anchors, and after deciding on a rough architecture, we started blasting out code like it was going out of style. The day was hectic, but not stressful. As the guy with the CloudFormation piece, I had my hands full as issues popped up here and there and we needed more components, parameters, and environment variables for our serverless solution. I laughed at the fact that I basically just typed CF code as fast as I could for about 12 hours.

We finished our individual pieces before dinner and we were integration testing ahead of schedule. Amazingly, only a few minor bugs (mostly passing payload data between functions) needed to be fixed. Plenty of time to add some extra features (Slack integration, etc.) as well as test our demo end-to-end several dozen times. By the time the beer break came we were comfortable with the presentation, the demo, and the committed code, and we took the last 45 minutes of hacking time to enjoy the free beer and pat ourselves on the back for finishing what we designed.

Then it got fun.

So we make it to the finals, and our fully functioning solution really impresses the judges.  We take home the gold, and I’m really proud of what we accomplished.  Funny, I took the whole thing like a game, but knowing that Thorn is going to put what we built into action at some point is really rewarding.  I’m looking forward to the next one!

I left out what it was that we actually built.  Why? I’m gonna let Ryan Fackett cover that one in his next post :).  I will, however, share a pic of the prize for winning. Pretty cool, huh?

Posted in General

Updates to VPC-in-a-Box

In case you aren’t familiar with the offering, VPC-in-a-Box is Foghorn’s best practice VPC design for Amazon Web Services; customized for client workloads, and delivered as re-usable, versionable code.  We’ve been delivering and iterating this service since 2014, and have improved the offering over the years.  We first added reverse proxy with a slick autoscaling squid proxy option.  Later we added S3 endpoints, cross region VPN connectivity, and NAT Gateways to replace our custom HA NAT server configuration.  For the first several years the offering was available exclusively via CloudFormation templates.

This year we’ve extended the offering to Terraform, and included VPC-in-a-Box code as a free module for FogOps customers.  In addition, we’ve added tons of configurability around supernet and subnet sizing, all available via parameters. No custom coding needed, which means that clients who elect to standardize on our VPC module can rely on Foghorn to maintain the code, and benefit from future enhancements to the module.

The goal of this offering is to eliminate the ongoing management of one more piece of infrastructure that is not a business differentiator for our clients.  In order to make sure the offering lives up to this goal, Foghorn is continually tweaking the configuration, ensuring two main things:

  1. The design represents best practice for the current features available at Amazon Web Services.  As new features are introduced, we constantly evaluate the features, determine whether we should change our recommendations on best practice, and modify if necessary.  Since our modules are versioned, customers can upgrade at their leisure.
  2. The code is compliant with the most recent version and features of the Infrastructure as Code tool of choice.  As CloudFormation features are released, we update our code. Likewise with Terraform, we ensure that our modules are fully tested with every new release.

My favorite new feature is Terraform Workspaces.  Stay tuned for a post in the near future, where I’ll walk through how we are using workspaces to help DevOps and SRE teams to leverage a single code base to manage multiple environments, ensuring that staging looks like production, and DR looks like both of them!

Ryan’s favorite feature is the ability to simply set the count of NAT gateways desired.  If you select only 2 NAT gateways, but launch private subnets into 3 availability zones, all will get a route to a NAT gateway, preferring the NAT gateway in the same AZ if available.
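That selection behavior can be sketched as follows; this is an illustration of the routing preference described above, not the module’s actual implementation:

```python
def route_table_nat(subnet_azs, nat_azs):
    """Map each private-subnet AZ to the AZ of the NAT gateway its
    route targets: prefer a gateway in the same AZ, otherwise spread
    the remaining subnets across the available gateways."""
    mapping = {}
    for i, az in enumerate(subnet_azs):
        if az in nat_azs:
            mapping[az] = az          # same-AZ gateway available
        else:
            mapping[az] = nat_azs[i % len(nat_azs)]  # deterministic fallback
    return mapping
```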

Learn more about VPC in a box here.


Posted in General

Elegant CI/CD with ECS

As the industry moves toward containers as a deployment artifact, we need to modify our deployment pipelines. One of our clients, Blast Motion, is a great example of how this can be accomplished. Prior to containers, Blast leveraged AWS Elastic Beanstalk and integrated that with their TeamCity server for a simple but effective CI/CD pipeline and DevOps friendly platform. To minimize the amount of customization of Elastic Beanstalk, Blast is migrating from native Beanstalk .NET environments to .NET Core container based micro-services. As they continued to scale, a more elegant container based operations model was needed. Their existing CI/CD server made branch based deployments difficult across environments, which slowed development velocity and increased deployment complexity. In order to meet the new requirements, Foghorn helped design and build a new pipeline leveraging GitHub, Travis CI, Docker, and AWS’ ECR and ECS. The AWS infrastructure was all deployed via CloudFormation to quickly carve out new environments.

By leveraging managed services, there are no servers to manage, either to support the CI/CD pipeline or to run the application. A simple, low cost solution that supports Blast and their need for high availability and agility.

Posted in General

App Design drives easy CI / CD options

At Foghorn, we’ve helped design and develop highly complex DevOps solutions for customers. But sometimes, a simple workflow works very well.

This was the approach we took with ClearFactr.  ClearFactr is a cloud native alternative to Excel, specifically designed for time series data analysis.  Dean Zarras, founder of ClearFactr, architected the application with cloud best practices in mind, which often makes it possible to leverage an off-the-shelf DevOps tool like Elastic Beanstalk.  Integrating Elastic Beanstalk and GitHub with Jenkins is relatively straightforward, with many examples on the web.

The result? All the benefits of a CI / CD environment, with almost no infrastructure and very little integration code to manage.  This is one of the benefits to carefully architecting a cloud native application.

Posted in General

Terraform with Azure? Sure!

While it is possible to use a proprietary Azure resource manager template in JSON format to define your Azure infrastructure or an AWS JSON CloudFormation template to define AWS infrastructure, using a tool like Terraform allows your company to standardize on one coding language for all your cloud infrastructure.  Terraform supports many providers including AWS, Google Cloud, and Microsoft Azure.  You can define, document, and roll out infrastructure to multiple providers with one set of configuration files.  However, this post will focus on how you can quickly define your Azure infrastructure with Terraform utilizing sample code for a typical deployment.

Terraform Resource names are unique across all providers so each defined resource knows which cloud is the intended destination.  For example, if you already have terraform code for your AWS infrastructure, you could add cloud redundancy or disaster recovery by adding an Azure provider with some Azure resources.   You do not have to separate your resources for different providers since all resource names are unique.

Terraform supports both the new Azure resource manager API as well as the classic Azure service management API.  I would recommend utilizing the Azure resource manager provider since this is the method of the future for Azure.  Although Terraform does not support all Azure resources, I found that it supports enough to deploy the majority of base infrastructure.

Below is a sample Azure infrastructure configured with a web tier, application tier, data tier, an infrastructure subnet, a management subnet, as well as a VPN gateway providing access to the corporate network.  I was able to deploy all of this infrastructure besides the VPN gateway with Terraform resources.  The resources needed to deploy an Azure VPN gateway appear almost ready according to an open HashiCorp ticket.  By the time you read this, they may be available.  The resources will be called “azurerm_virtual_network_gateway” and “azurerm_virtual_network_gateway_connection.”


While not required, typically the first step to deploying your Azure Terraform infrastructure is to declare variables:

variable "subscription_id" {}
variable "client_id" {}
variable "client_secret" {}
variable "tenant_id" {}
variable "resource_group_id" {}
variable "resource_group_name" {}
variable "region" {
  default = "West US"
}
variable "VnetId" {}
variable "VnetName" {}
variable "EnvironmentTag" {}
variable "AdminUser" {}
variable "AdminPW" {}
variable "ServerName" {}

The next step is to supply the authentication required to address the Azure API by defining a Microsoft Azure provider, including the subscription id, client id, client secret, and tenant id as defined in the variables.

provider "azurerm" {
  subscription_id = "${var.subscription_id}"
  client_id       = "${var.client_id}"
  client_secret   = "${var.client_secret}"
  tenant_id       = "${var.tenant_id}"
}

After providing access, the Azure resource group is typically defined to contain the remaining resources.  An Azure resource group typically defines a set of infrastructure resources that will share the same lifecycle, which means they will be deployed, updated, and ultimately deleted at the same time.  Resources that do not share the same lifecycle should be deployed in a separate resource group.  The resource group name variable and the region variable are used to define the name and region for the Azure resources.

resource "azurerm_resource_group" "main" {
  name     = "${var.resource_group_name}"
  location = "${var.region}"
}

Once you have your resource group, you can define all of the resources for your base infrastructure.  A virtual network (VNet) is the base technology for your own isolated network in the cloud, so it should be defined first, as well as its subnets.

resource "azurerm_virtual_network" "vnet" {
  name                = "${var.VnetName}"
  address_space       = [""]
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  subnet {
    name           = "MgmtSubnet"
    address_prefix = ""
  }
  subnet {
    name           = "WebSubnet"
    address_prefix = ""
    security_group = "${azurerm_network_security_group.WebAccess.id}"
  }
  subnet {
    name           = "AppSubnet"
    address_prefix = ""
  }
  subnet {
    name           = "DataSubnet"
    address_prefix = ""
  }
  subnet {
    name           = "ADDSSubnet"
    address_prefix = ""
  }
  subnet {
    name           = "GatewaySubnet"
    address_prefix = ""
  }
}



Azure provides basic routing between subnets, to the Internet, and across a configured VPN gateway automatically with its default system routes.  User-defined routes are only necessary if you want to alter the default behavior to force traffic to a virtual appliance or force tunneling to the Internet across the VPN gateway.  Below is a sample user-defined route.

resource "azurerm_route_table" "ADDS_Subnet" {
  name                = "AD DS Routing"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  route {
    name           = "routeVG"
    address_prefix = ""
    next_hop_type  = "VirtualNetworkGateway"
  }

  tags {
    environment = "${var.EnvironmentTag}"
  }
}

Azure network security groups (NSG) contain a list of security rules that deny or allow traffic to resources in an Azure virtual network.  NSGs can be associated with subnets or network interfaces attached to VMs.  Inbound traffic is first examined by NSGs attached to the subnet and then by a NSG attached to the NIC.  Outbound traffic is examined by the NSGs in reverse order.  Here is a sample NSG resource:

resource "azurerm_network_security_group" "WebAccess" {
  name                = "Web Access"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  security_rule {
    name                       = "Allow80"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "80"
    source_address_prefix      = "Internet"
    destination_address_prefix = "*"
  }

  tags {
    environment = "${var.EnvironmentTag}"
  }
}

Microsoft Azure provides redundancy for multiple servers providing the same service with Availability Sets.  Azure hardware clusters are divided into update domains and fault domains representing update cycles and physical infrastructure including power and networking.  Azure will distribute virtual machines within an availability set across domains to provide availability and fault tolerance.  For the sample design above, we will need four availability sets.  Here is an example for the web tier:

resource "azurerm_availability_set" "WebTierAS" {
  name                = "Web Tier Availability Set"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  tags {
    environment = "${var.EnvironmentTag}"
  }
}

An Azure storage account provides storage within your Azure infrastructure.  Your storage account creates a unique namespace to store and retrieve your data.

resource "azurerm_storage_account" "StorIT" {
  name                = "storageaccount" # must be globally unique, lowercase letters and numbers only
  resource_group_name = "${azurerm_resource_group.main.name}"
  location            = "${var.region}"
  account_type        = "Standard_LRS"

  tags {
    environment = "${var.EnvironmentTag}"
  }
}

resource "azurerm_storage_container" "ContainIT" {
  name                  = "vhds"
  resource_group_name   = "${azurerm_resource_group.main.name}"
  storage_account_name  = "${azurerm_storage_account.StorIT.name}"
  container_access_type = "private"
}

Below is a sample virtual machine definition.  A network interface is defined and assigned to the virtual machine.  The virtual machine definition also specifies the VM size, the operating system with user information, and the storage account/container.

resource "azurerm_network_interface" "webserver1nic" {
  name                = "WebServer"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  ip_configuration {
    name                          = "webIP1"
    subnet_id                     = "${azurerm_subnet.WebSubnet.id}"
    private_ip_address_allocation = "dynamic"
  }
}

resource "azurerm_virtual_machine" "webserver1" {
  name                  = "webserver1"
  location              = "${var.region}"
  resource_group_name   = "${azurerm_resource_group.main.name}"
  network_interface_ids = ["${azurerm_network_interface.webserver1nic.id}"]
  vm_size               = "Standard_A1"

  storage_image_reference {
    publisher = "MicrosoftWindowsServer"
    offer     = "WindowsServer"
    sku       = "2012-R2-Datacenter"
    version   = "latest"
  }

  storage_os_disk {
    name          = "myOSdisk"
    vhd_uri       = "${azurerm_storage_account.StorIT.primary_blob_endpoint}${azurerm_storage_container.ContainIT.name}/myOSdisk.vhd"
    caching       = "ReadWrite"
    create_option = "FromImage"
  }

  os_profile {
    computer_name  = "${var.ServerName}"
    admin_username = "${var.AdminUser}"
    admin_password = "${var.AdminPW}"
  }

  tags {
    environment = "${var.EnvironmentTag}"
  }
}

Azure has three separate services for distributing traffic: the Azure Load Balancer, which works at layer 4; an Application Gateway, which operates at layer 7 and acts as a reverse proxy; and Traffic Manager, which works at the DNS level.  This example template is deploying the Azure Load Balancer.  There is an external load balancer for the web tier and internal load balancers for the application/data tiers.  Below is a sample of code for the web tier including a public IP, the load balancer, front-end configuration, a web balancing rule, a health check probe, and a backend web server address pool.

resource "azurerm_public_ip" "WebTierLBIP" {
  name                         = "PublicIPForLB"
  location                     = "${var.region}"
  resource_group_name          = "${azurerm_resource_group.main.name}"
  public_ip_address_allocation = "static"
}

resource "azurerm_lb" "WebTierLB" {
  name                = "WebTierLoadBalancer"
  location            = "${var.region}"
  resource_group_name = "${azurerm_resource_group.main.name}"

  frontend_ip_configuration {
    name                 = "WebTierIP"
    public_ip_address_id = "${azurerm_public_ip.WebTierLBIP.id}"
  }
}

resource "azurerm_lb_rule" "WebRule" {
  resource_group_name            = "${azurerm_resource_group.main.name}"
  loadbalancer_id                = "${azurerm_lb.WebTierLB.id}"
  name                           = "WebLBRule"
  protocol                       = "Tcp"
  frontend_port                  = 80
  backend_port                   = 80
  frontend_ip_configuration_name = "WebTierIP"
}

resource "azurerm_lb_probe" "WebCheck" {
  resource_group_name = "${azurerm_resource_group.main.name}"
  loadbalancer_id     = "${azurerm_lb.WebTierLB.id}"
  name                = "Port80Response"
  port                = 80
}

resource "azurerm_lb_backend_address_pool" "WebServers" {
  resource_group_name = "${azurerm_resource_group.main.name}"
  loadbalancer_id     = "${azurerm_lb.WebTierLB.id}"
  name                = "BackEndWebServers"
}

While not all Azure resources may be deployed with Terraform, many of the important ones are available, including several not covered here.  There are definitely enough to build your base infrastructure in Azure, with or without base infrastructure for other providers as well.


The power of infrastructure-as-code really comes to life with a tool such as Terraform, especially with the potential of deploying infrastructure to multiple providers.  Utilizing Terraform code similar to what I have shown in this post, you can quickly deploy an Azure resource group with a virtual network, route tables, network security groups, storage accounts, availability sets, virtual machines, and load balancers.  If you already have Terraform code for another provider, deploying Azure resources as shown in this post will allow you to quickly build a multi-cloud infrastructure.


Posted in General

Automating Cloud Passage firewall management

Halo, by CloudPassage, is a great tool to ensure compliance across a hybrid infrastructure. When it comes to security, some organizations have made incredible strides to streamline and automate. But for those organizations who may not employ fully automated security, there are often issues managing user access within services outside of LDAP (Lightweight Directory Access Protocol). Manually configuring firewalls is tedious, not to mention inefficient. Thankfully, CloudPassage has an API, surfaced through its Python SDK, that alleviates this issue.

The Python SDK allows you to easily write a script and bake Halo right into the plumbing of your automation system. Essentially this makes it possible to do almost anything you need to do with Halo without having to use your browser. And as we mentioned before, it’s such a pain to manually configure firewall rules.

That being said, leaving things open to the world and trusting SSH or RDP to keep the bad guys out isn’t an option. We need firewall orchestration, but no one wants to give themselves carpal tunnel spending day after day configuring their workloads. Time is money, right? With Halo you have the capability to dynamically authorize specific users, based on their IP, to connect to your servers through GhostPorts, and that’s a great way to enable granular security compliance and auditing.

Let me explain. Each policy under which these rules are defined can have a GhostPorts user added: a user who, once registered under your account, can be given access to specific ports on workloads, secured by multi-factor authentication.

I wanted to create a tool that allows me to create these firewall policies, with GhostPorts users, using only the Halo API. This was made easier with the Python SDK, which allows us to write logic against Halo functionality without dealing with the minutiae of remembering URLs or handling authentication with Halo.

One of the main design ideas behind the Halo Python SDK is to have a single object that holds everything the SDK needs to know in order to interact with Halo, and to pass it into different parts of the SDK. An analogy would be using one car key to unlock the doors, open the trunk, open the glove box, or start the car.

The purpose of the script is to add a number of users to various “groups” of matching policies. A user like “bill@example.com” can be added to every policy that has the word “qa” in it, or only to policies that exactly match “qa”, by setting the wildcard setting to true or false, with any number of specified services. Settings are loaded via a groups.yaml in the same path as the script, although this could fairly easily be modified to take a command line argument specifying whatever YAML file location you like. Each enabled group with these rules is looped through for each specified user name, and then for each matching service name, which is verified as an existing service and applied. This allows multiple users that need to be added to multiple policies with multiple services to be added quickly and safely (rules can be added as inactive by default, on a per-group basis or for all groups) on a large scale, and they can also be set to inactive and removed from all policies as well. Below you’ll see an example of the YAML file that contains these values, and after that I’ll give a brief overview of the script style and common classes used once these parameters are passed to the script:

  enabled: group1
  usernames: foo@bar.com, bar@foo.com
  # How many to subtract from the last position

  name: Prod content hosts
  chain: INPUT
  active: False
  source: None
  destination: None
  action: ACCEPT
  services: ssh, https
  log: False
  wildcard: False

  name: foo
  chain: INPUT
  active: False
  services: cp-ssh
  action: ACCEPT
  log: False
  wildcard: True
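
As a hedged illustration of the wildcard behavior described above (the function and policy names are hypothetical, not the actual script's API), the matching logic amounts to a substring-or-exact filter:

```python
# Hypothetical sketch of the wildcard matching described above; the real
# script's function names may differ.

def matching_policies(policies, target, wildcard):
    """Return policy names matching `target`.

    wildcard=True  -> any policy whose name contains `target`
    wildcard=False -> only exact name matches
    """
    if wildcard:
        return [name for name in policies if target in name]
    return [name for name in policies if name == target]

policies = ["qa-web", "qa-db", "prod-web", "qa"]
print(matching_policies(policies, "qa", wildcard=True))   # ['qa-web', 'qa-db', 'qa']
print(matching_policies(policies, "qa", wildcard=False))  # ['qa']
```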


There are a few classes to take note of, specifically within firewall_policy.py[5] and http_helper.py[6]. These classes, “FirewallPolicy,” “FirewallRule,” and “HttpHelper,” are all called through the HaloSession instantiation, which obtains the required token by passing along your API key and secret, ideally within its own class function that can be called for use later on. HttpHelper is needed in order to parse the v2 users API as mentioned in the API Guide, and is not currently included within the SDK because it is subject to change.

Everything within the Halo portal has an id associated with it, from users to services to policies to rules, and each can be obtained through the “list_all” function. From there it mostly boils down to a standard fizzbuzz-style problem: create lists of filtered rules and policies based on names, and if a rule URL is in a policy, build a dictionary of that policy's rules, and so on.
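
To make the id bookkeeping concrete, here is a hedged sketch (the data shape is illustrative, not the SDK's exact output) of filtering a list_all-style result down to the policy ids you care about:

```python
# Illustrative only: a trimmed-down shape of what a list_all-style call
# might return, filtered down to a {policy_name: policy_id} mapping.

policies = [
    {"id": "p1", "name": "qa"},
    {"id": "p2", "name": "qa-web"},
    {"id": "p3", "name": "prod"},
]

# Keep only policies whose name contains "qa"
qa_policy_ids = {p["name"]: p["id"] for p in policies if "qa" in p["name"]}
print(qa_policy_ids)  # {'qa': 'p1', 'qa-web': 'p2'}
```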

With all this in mind, eventually you’ll have a dictionary of policy ids and required json values to pass along that looks something like this:

for pol, pos in self.pol_positions.items():
    for service in self.list_of_services:
        if service["name"] == self.service:
            policy_json = {
                'firewall_rule': {
                    'chain': self.chain,
                    'firewall_source': {'id': self.user_id, 'type': 'User'},
                    'active': self.active,
                    'firewall_service': service["id"],
                    'connection_states': self.states,
                    'action': self.action,
                    'log': self.log,
                    'log_prefix': self.log_prefix,
                    'comment': self.comment,
                    'position': pos,
                }
            }

and you are ready to start updating firewall policies through Python! To learn more about this process, and to use it in your own environment, visit the GitHub page. All in all, the Python SDK makes managing firewalls within Halo a breeze. While your script runs, you are free to go about your business managing workloads, rather than being slowed down by manual configuration.

Posted in General

Battle of the PaaS. AWS vs Google Cloud Platform

In the wake of Google’s Next ’17 event and the AWS San Francisco Summit 2017, I wanted to continue the AWS vs GCP comparison series.  In this part though, I wanted to focus on each public cloud vendor’s Platform as a Service (PaaS).  In basic terms, AWS’ Elastic Beanstalk and Google’s App Engine enable development teams to deploy their application without the need to understand how to build or scale the underlying infrastructure.  The promise is great, and for most use cases, that promise delivers.  More advanced applications requiring more complex integrations and customizations may require users to leave the PaaS in favor of a different platform.  But let’s get back to Elastic Beanstalk vs. App Engine.

Application Support

The first component to compare is what platforms each PaaS supports:

Elastic Beanstalk          App Engine
Java                       Java
.NET / C#                  .NET / C#
PHP                        PHP
Node.js                    Node.js
Python                     Python
Ruby                       Ruby
Go                         Go
That was easy, they both support pretty much the same languages.  What about custom environments?

Elastic Beanstalk          App Engine
Preconfigured Docker       Custom (dockerfile)
Single Container Docker    Multiple Services
Multi Container Docker
Custom (AMI)
While I think for customers using containers it would make more sense to compare ECS vs. GKE, that comparison is coming in a future post.  The takeaway from the PaaS custom environments is that Elastic Beanstalk supports the use of a custom AMI created with a tool Foghorn uses often, Packer.  This provides some flexibility, as customers who are comfortable using Elastic Beanstalk can branch out of the standard application support listed above (if needed).  Take care though: Elastic Beanstalk with custom AMIs can increase build times and introduce more complications, like prepping the AMI for use by Elastic Beanstalk.

Enough with the basics and how they are similar, let’s turn our focus to how they are different.


I am not talking about how fast your app will run, that introduces a multitude of possibilities that have nothing to do with the platform you are running on.  I am specifically focusing on how fast the PaaS can build and scale.  The first part in understanding the speed of Elastic Beanstalk is to understand what the platform is built on.  In basic terms, Elastic Beanstalk uses CloudFormation, Auto Scaling Groups, custom code to tie those components to the application stack, a user interface and CLI, and lastly a deployment agent to push your code.  The fact that Elastic Beanstalk uses these components means that it is bound to their characteristics.  Some changes that you make may require a new EC2 instance, which means your seemingly simple change will actually:

  • Update CloudFormation
  • Create new Launch Configuration
  • Update Auto Scaling Group
  • Terminate Instance(s)
  • Provision Instance(s)
  • Run Elastic Beanstalk Code
  • Deploy your application

You could easily be waiting 5-10 minutes for this to happen.  This is a critical aspect of Elastic Beanstalk to understand: some changes trigger the full lifecycle, which can take significant time.  Ok, not as long as ordering hardware for the data center.

App Engine on the other hand does not visibly have the same dependency structure on other services.  In addition, while you have a user interface, it is really a dashboard, not a full configuration interface.  Interacting with App Engine is more like using the Elastic Beanstalk CLI.  You define your application in an App Engine app.yaml file in your source code.  For example, here is the Go hello world file:

runtime: go
api_version: go1

handlers:
- url: /.*
  script: _go_app

This is about the absolute minimum to use the default settings, which is likely fine as you are getting started.  And this is also the appropriate time to mention one of the best features of App Engine.  You can terminate your instances to reduce costs, and as soon as you are ready to use the application again, App Engine will provision an instance and fire up your application on the first request.  While in Elastic Beanstalk you could create scheduled auto scaling to reduce your environment to zero instances and back up to one before your day starts, the on-demand activation of App Engine is quite nice.  Not to mention we are talking about seconds (or perhaps even less than a second) for App Engine vs. minutes (or perhaps as many as 10 minutes) for Elastic Beanstalk.
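
For reference, the scheduled scale-to-zero workaround for Elastic Beanstalk mentioned above can be sketched in an .ebextensions config; this is a hedged example (action names, cron expressions, and values are illustrative) using the aws:autoscaling:scheduledactions namespace:

```yaml
# Hypothetical .ebextensions/scheduled-scaling.config: scale to zero
# instances overnight and back to one before the day starts.
# Action names and cron expressions are illustrative.
option_settings:
  - namespace: aws:autoscaling:scheduledactions
    resource_name: ScaleDownEvening
    option_name: MinSize
    value: '0'
  - namespace: aws:autoscaling:scheduledactions
    resource_name: ScaleDownEvening
    option_name: MaxSize
    value: '0'
  - namespace: aws:autoscaling:scheduledactions
    resource_name: ScaleDownEvening
    option_name: Recurrence
    value: '0 2 * * *'
  - namespace: aws:autoscaling:scheduledactions
    resource_name: ScaleUpMorning
    option_name: MinSize
    value: '1'
  - namespace: aws:autoscaling:scheduledactions
    resource_name: ScaleUpMorning
    option_name: MaxSize
    value: '1'
  - namespace: aws:autoscaling:scheduledactions
    resource_name: ScaleUpMorning
    option_name: Recurrence
    value: '0 13 * * *'
```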

Configuration

The App Engine app.yaml is akin to the Elastic Beanstalk CLI eb create command where you can define everything in a single action.

$ eb create dev-vpc --vpc.id vpc-0ce8dd99 --vpc.elbsubnets subnet-b356d7c6,subnet-02f74b0c --vpc.ec2subnets subnet-0bb7f0cd,subnet-3b6697c1 --vpc.securitygroup sg-70cff265
Creating application version archive "app-160312_014309".
Uploading test/app-160312_014309.zip to S3. This may take a while.
Upload Complete.
Environment details for: dev-vpc
  Application name: test
  Region: us-east-1
  Deployed Version: app-160312_014309
  Environment ID: e-pqkcip3mns
  Platform: 64bit Amazon Linux 2015.09 v2.0.8 running Java 8
  Tier: WebServer-Standard
  Updated: 2016-03-12 01:43:14.057000+00:00
Printing Status:

And like the App Engine app.yaml, you can also provide configuration information for your Elastic Beanstalk environment using .ebextensions:

|-- .ebextensions
|   |-- environmentvariables.config
|   `-- healthcheckurl.config
|-- .elasticbeanstalk
|   `-- config.yml
|-- index.php
`-- styles.css

What AWS has provided is a rich Console GUI to enable users to build, clone and manage (most) settings.  While the goal in both platforms is to meet the developer at their source control repository, this extensive GUI may be a welcome feature for some users.

Traffic Splitting

Another cool feature that App Engine supports that is not really replicated in Elastic Beanstalk is their traffic splitting.  Whereas Elastic Beanstalk does support the idea of a Blue/Green deployment, the end result is a DNS cutover (or rollback).  Traffic splitting inside App Engine allows you to specify a percentage of traffic to go to a different version of the service.  This is closer to a canary style deployment and may be a big advantage for some applications.  This deployment style can be replicated in Elastic Beanstalk by using multiple environments and Route53 weighted DNS entries.
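
Conceptually, both App Engine's traffic splitting and a Route53 weighted-DNS workaround boil down to weighted selection; a minimal sketch (version names and weights are illustrative, not either platform's API):

```python
# Conceptual sketch of weighted (canary-style) traffic splitting.
# Version names and weights are illustrative.

def pick_version(splits, roll):
    """Pick a version given {version: weight} and a roll in [0, 1)."""
    cumulative = 0.0
    for version, weight in splits.items():
        cumulative += weight
        if roll < cumulative:
            return version
    return version  # fall through on floating-point rounding

splits = {"v1": 0.9, "v2": 0.1}    # 90% stable, 10% canary
print(pick_version(splits, 0.50))  # lands in v1's share -> v1
print(pick_version(splits, 0.95))  # lands in v2's share -> v2
```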

Real-World Use

What about these PaaS’ in the real world?  Personally I prefer the simplicity and fast changes of App Engine.  I have found the unpredictable deploy times within Elastic Beanstalk tedious in daily use.  That said, if your use case requires few environment changes and little to no configuration management, Elastic Beanstalk is probably a really easy way to get your app stood up on AWS.  I also like that App Engine can basically be shut down by terminating your running instances (for development and testing use cases).  You can’t really “delete” Elastic Beanstalk unless you want to remove the entire application environment.

I think Elastic Beanstalk is a solid PaaS.  For users who want an intuitive GUI that allows for easy configuration, Elastic Beanstalk wins.  That said, I personally have spent significant time getting the desired results out of it when an app requires complex configuration not available in the GUI.  There is a certain degree of mystery (though the same can be said for App Engine) that in some circumstances becomes a hindrance to progress.  I would prefer to use my own code to control the underlying resources (Launch Configurations, Auto Scaling Groups, RDS, Security Groups, etc.), where I have a clear understanding of all the code running and, perhaps more importantly, the fastest infrastructure deployment model to get changes out quickly and effectively.

Next up in the GCP vs AWS match up is something I am particularly interested in sharing.  How GCP project level boundaries can solve a challenging security paradigm in AWS.

Posted in AWS, Cloud Management, GCP, Public Cloud

AWS and GCP, Account vs Project Boundaries


One of the most interesting differences between GCP and AWS is how each vendor recommends you isolate the blast radius of functional teams.  AWS will tell you that in all likelihood, you will need at least two accounts, but possibly more.  Referring to their white paper on the subject, you can see the questions they pose to decide if you need multiple accounts:

  • Does the business require administrative isolation between workloads?
  • Does the business require limited visibility and discoverability of workloads?
  • Does the business require isolation to minimize blast radius?
  • Does the business require strong isolation of recovery and/or auditing data?

While on the GCP side of things, best practice is to use Projects to provide a functional grouping of resources.  In fact, much of the content I intended to focus on with this blog post was recently provided by Google in this excellent writeup.  I strongly recommend anyone reading this blog post to also read that writeup, which was referenced earlier this month via this blog post by Google.  So who got it right?

The truth is, except for a few specific areas, in my opinion the two systems deliver equal functionality.  Accurate cost allocation can be done in either system.  Projects and AWS Accounts are globally, not regionally, bound; they can serve many users in many regions with many services.  So on and so forth.  There are two key areas, however, where I think Google has a distinct advantage:

Networking

The biggest reason why a GCP Project is such an effective model is at the network level.  Within AWS, the VPC is isolated to a region and an account.  In order to connect other AWS VPCs within the same region, you can use peering, but that requires a well-planned approach to how a fully meshed intra-region multi-account / multi-VPC network would look and scale without any IP overlap.  And all of this assumes that the customer is ok with potentially 100s of accounts and 100s of networks (or make that 100s x 2, since we should isolate production in its own account).  Some customers simply do not want to manage all these accounts and networks.  This is especially true when those networks must connect back to a corporate location or data center.

GCP on the other hand does not require that each Project have its own network.  In fact, you could create one large development network within GCP (global or regional) and then create unique development Projects to isolate resources for each development team.  The fact that all these resources are in the same network does not diminish the security that the Project boundary creates.  This makes things like shared services and security auditing tools much easier to orchestrate since they can have cross-Project access.  Not to mention the network does not require a complicated meshed structure.  The network could be created based on routing principles rather than development teams, simplifying the use for both the network administrators and the developers using the network.

I can’t overstate how significant the Project feature is within GCP.  In my opinion, this is a much cleaner solution to isolating access than anything AWS currently provides.  Having had numerous conversations with AWS customers about the difficulties of providing resource-based access control within the same account and network, it is excellent to have such a simple solution on GCP.

User Access

The second reason I think Google got it right on Projects is how you assign user access.  You can simply create a Google Group for a GCP Project.  Add users to the Google Group (within your organization, or maybe even outside your organization like 3rd party contractors or consultants).  By contrast, there is no equivalent user grouping structure on the AWS side.  You could use an Identity account to consolidate user management and then use cross-account access with roles.  This would enable you to manage which users can access which accounts, but this is significantly more complicated than using a Google Group.
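
For context, the cross-account role pattern mentioned above rests on a trust policy in each workload account that allows principals from the identity account to assume the role; a minimal sketch (the account ID is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```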

Stay tuned for how to manage a large-scale deployment of projects or AWS accounts in a future blog post.



Posted in AWS, Cloud, Cloud Management, GCP, Public Cloud