New APIs expand the use for AWS Tags

Amazon recently announced new tagging permission features that make tags considerably more useful. Although these just came out, I already see a few areas where we can simplify automation scripts. More importantly, because access to tags can be limited by key, we can reserve certain keys for central functions like cost allocation and monitoring, while still letting individual teams use tags for other purposes without the risk of production-required tags being modified.

Resource-level permissions are still not at the point where complete isolation of resources for different teams can be implemented in a single account, but this is a huge step forward in giving developers the access they need without the risk of breaking production automation.
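To make the idea of reserving tag keys concrete, here is a minimal sketch of an IAM policy document built in Python. It assumes the `aws:TagKeys` condition key on the EC2 tagging actions; the key names `CostCenter` and `Monitoring` are hypothetical placeholders, not keys named by AWS or used in any particular account.

```python
import json

# Hypothetical reserved tag keys; substitute whatever keys your central
# cost-allocation and monitoring tooling depends on.
RESERVED_TAG_KEYS = ["CostCenter", "Monitoring"]


def reserved_tag_policy(reserved_keys):
    """Build an IAM policy document that denies changing reserved tag keys.

    The Deny statement matches any EC2 tagging request whose set of tag
    keys (aws:TagKeys) includes at least one reserved key.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyReservedTagKeys",
                "Effect": "Deny",
                "Action": ["ec2:CreateTags", "ec2:DeleteTags"],
                "Resource": "*",
                "Condition": {
                    "ForAnyValue:StringEquals": {"aws:TagKeys": list(reserved_keys)}
                },
            }
        ],
    }


policy = reserved_tag_policy(RESERVED_TAG_KEYS)
print(json.dumps(policy, indent=2))
```

A policy like this would sit alongside whatever Allow statements a team already has, so developers keep tagging freedom for every key except the reserved ones. Treat it as a starting point to test in a sandbox account rather than a finished control.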

How will you use these new features? Reply below!

Posted in AWS, Cloud, Cloud Management

HIPAA on AWS the Easy Way

Many of our customers are running workloads that are subject to HIPAA regulations. Running these on AWS is definitely doable, but there are some catches. Foghorn has made it super easy for our customers to run HIPAA compliant workloads on AWS. Here’s how.

What is a BAA?

If you are not familiar with HIPAA, the regulations require a Business Associate Agreement to be executed with each of your partners who may have access to Protected Health Information.  From the Health Information Privacy page on BAA:

‘A “business associate” is a person or entity, other than a member of the workforce of a covered entity, who performs functions or activities on behalf of, or provides certain services to, a covered entity that involve access by the business associate to protected health information.  A “business associate” also is a subcontractor that creates, receives, maintains, or transmits protected health information on behalf of another business associate.  The HIPAA Rules generally require that covered entities and business associates enter into contracts with their business associates to ensure that the business associates will appropriately safeguard protected health information. ‘

AWS HIPAA Rules and Regs

If you are handling PHI today, you already know that any vendor that you share PHI with is required to sign a BAA.  Amazon has made this process pretty straightforward, in that they offer a BAA that they will happily sign for all customers storing and processing PHI on AWS.  But the devil is in the details.  You can read more at the AWS HIPAA compliance page here.  The important quote:

“Customers may use any AWS service in an account designated as a HIPAA account, but they should only process, store and transmit PHI in the HIPAA-eligible services defined in the BAA.”

So are you protected by the BAA?

The BAA that Amazon signs covers only a few of the AWS services, and requires that you use those services in specific architectural configurations.  If you break those conventions, the BAA is nullified.  Worse, it is nullified for your entire account, not just for data handled by the non-compliant components.

An easy example would be if your team had a compliant architecture for production, but a non-compliant infrastructure for staging.  This may have been your configuration to save on costs, and in order to maintain compliance you scrub staging data of PHI before uploading.  Let’s say that an engineer mistakenly uploaded non-scrubbed data to the non-compliant environment.  You just invalidated your BAA, even for your production environment!

In addition, any technical consultants, subcontractors, and managed services companies that you use also need to sign a BAA. This process can be time-consuming and costly from a legal perspective.

The Easy Way

Foghorn is both a cloud services and a cloud engineering provider. Because you get all of your AWS as well as your engineering and managed services from us, you can sign a single BAA with Foghorn. All of the AWS gotchas still apply, but Foghorn is deeply experienced in architecting and managing HIPAA compliant environments. When you partner with Foghorn, we make sure your PHI is safe and your company is protected from accidentally invalidating your AWS BAA. There are a few ways we accomplish this:

  1. All Foghorn employees undergo HIPAA training.  We make sure our employees understand the what, the how and the why of HIPAA to avoid any simple errors.
  2. All Foghorn customer HIPAA accounts are tagged.  We know which accounts are HIPAA, and which aren’t, without a doubt.  That makes tracking and auditing easier.
  3. We segregate your PHI workloads from non PHI workloads when possible, to make sure we can focus the restrictive HIPAA based policies only where required. This saves cost and maintains agility on the rest of your workloads.
  4. We design the HIPAA infrastructure with belt and suspenders.  We make sure your architecture is compliant with Amazon’s BAA conditions, and add multiple layers of assurance.
  5. We advise and guide on the responsibilities that AWS does not take care of.  This includes scanning, penetration testing, change processes, incident response, etc.
  6. We set up real-time audit monitoring for key controls, so that if someone changes something in your account that may lead to compliance issues, your team is notified immediately.

Call us today for more info on how we can help you meet HIPAA compliance while retaining your agility.

Posted in AWS, Cloud, Health Care, Public Cloud

Take our DevOps Code… Please!

At Foghorn, we’ve been writing modular DevOps code for many years. Over time, we have developed a set of modules that represent about 80% of our customers’ needs, and leveraged those modules to accelerate the timeline of DevOps projects. This includes infrastructure as code (network, servers, IAM rules, security rules, etc.), deployment code, and operations code. Generally we charge a fixed fee for projects which leverage FogOps IP in place of our standard hourly rates. This has been a great model to help get new customers and new initiatives up and running quickly at a relatively low cost. Once delivered, our customers either ask us to evolve and iterate on their code, or they take complete ownership.

Recently we’ve spent a considerable amount of time to modularize our code and offer it to our customers directly.  During this process, we felt that defining a pricing model was holding us back from releasing more value more quickly to our customers.  We wanted the ability to churn out modules fast, without coming up with pricing. We also wanted the ability to maintain and update our modules.  We came to the conclusion that the best way we could serve our customers is to give this code away, for free, to the community of customers who have selected Foghorn as their Public Cloud and DevOps provider.  We are excited to help our customers grow quickly, and hope to see great uptake in the use of our code.

We discussed open-sourcing the modules completely, but ran into a few snags. If our goal was mass adoption of our project, we would simply open source it. But our goal is not to create another public repository of community recipes. The value that our modules bring is that they represent Foghorn’s prescriptive opinion on best practice design and implementation. The contributors for this project will be limited to Foghorn engineers and our clients’ engineers who share the same philosophies around building and running infrastructure and applications as code.

Our clients who choose to use our modules can benefit in a few ways:

  1. Our clients chose Foghorn because they appreciate and align with Foghorn’s principles for running mission critical workloads. Leveraging our modules is an easy way to adopt those principles.
  2. Even if our clients know all of our principles by heart, and have the staff capabilities to implement them as code, they can save a great deal of time and energy by using our modules as starting points for their custom DevOps code.
  3. Many of our customers simply don’t have the bandwidth to do this, and so they pay us to do it for them.  By leveraging our modules, we finish faster, and they pay less.
  4. Faster is not only cheaper. It’s faster too!  Our customers can accelerate their mission critical initiatives, making themselves more competitive, and positively impacting the top line of their business.
  5. We hate getting woken up at night. We always err on the side of stability and reliability with our designs. Our clients’ engineering teams usually appreciate this.

So how do you sign up?  Simple.  Give us a ring.


Posted in Cloud, Cloud Management, Public Cloud

DevOps Muscle for the Crossfit Games


As we approach the Crossfit games once again, I thought I’d share some detail on how Foghorn helped Crossfit update their DevOps processes for the 2016 games. Crossfit has been growing rapidly, and had challenges in 2015 scaling their leaderboards. We worked together to prep for the 2016 games, and had some great success. As usual, we leveraged HashiCorp tools to get the job done. If you are interested in reading more, we’ve dropped a case study on our main site.

Posted in AWS, Cloud, Cloud Management, Public Cloud

Business Risk? Or Assurance

With the announcement that Snapchat has gone ‘all in’ on Google Cloud Platform, Snap has incorporated this plan into their financial filings as an additional ‘business risk’.

“Any disruption of or interference with our use of the Google Cloud operation would negatively affect our operations and seriously harm our business…”

My immediate reaction was really to question whether this is a business risk, or a business assurance? Certainly Snap is now dependent on Google’s ability to scale and manage a massive infrastructure, and so the disclosure is appropriate. But as a prospective investor, I’d feel that a great potential risk to the business, loss of availability of the Snapchat service, has been greatly reduced.

If there were a bookmaker taking odds on the likelihood of various companies making technical and/or operational missteps that cause an outage, Google would not be the company I’d bet on. Quite the opposite, I think they’ve proven over the last 15 years or so that they’re pretty good at running large infrastructure.

As this reality begins to sink in with investors, partners, customers, and business leaders, cloud adoption will accelerate beyond current predictions.


Posted in GCP, Public Cloud

Terraform beats CloudFormation to the Punch with Inspector Support


Cloud Neutral DevOps

HashiCorp makes some of our favorite DevOps tools.  Along with being feature rich, stable, and well designed, they are cloud neutral. This allows DevOps teams to become experts with a single tool without having to get locked in to a single cloud vendor.  Some cloud neutral tools try to completely abstract the cloud provider and the services available.  This forces the user to only use the ‘lowest common denominator’ of services available from all supported providers.  With Terraform, HashiCorp has not fallen into this trap.  They embrace the rich set of services available from each provider, with different services supported for different clouds. This allows us to put the right workload in the right cloud, without the need to leverage multiple tools, or build multiple deployment pipelines.

Terraform Supports AWS Inspector

You would expect, however, that Terraform would trail the cloud providers’ proprietary tools in supporting new cloud products and features. But HashiCorp is amazingly quick to support features. A great example is v0.8.5, which now supports AWS’ Inspector service. As of the publishing of this post, AWS’ own CloudFormation tool still does not have support for Inspector. Pretty amazing for a small company offering an open source product!

Posted in AWS, Cloud, Cloud Management, Public Cloud

Who’s Managing your Cloud?

After designing, building and managing hundreds of environments, sometimes we get a little too deep in the weeds with our blogs. So I thought I’d share this article, which gives a high level perspective on how companies benefit from working with a cloud managed services provider.

The article covers performance, scalability, security and compliance.  I’d add some additional benefits, like:

Cost Optimization:  Your provider knows where additional cloud spend will help, vs flushing money for little benefit.

Agility: Sure, the cloud enables agility, but it doesn’t guarantee it.  Your provider should be full of DevOps ninjas, who can put in place the pieces that don’t come ‘out of the box’ with IaaS.

Manageability: It’s so easy to string together IaaS components that give you the functionality you need. It’s also easy to do so in a manner which creates a management nightmare.  Especially if you’ve never done it before.  Your provider should lead you down the path to an infrastructure that can scale easily without additional management overhead.

And the winner is...

There are lots of great choices out there, although I’m pretty biased toward FogOps, where our motto is “Live by Code”. The meaning? Everything we do to manage your site is done with code, leaving you with a self-healing, auto-scaling environment that leverages continuous deployment to make your life easy.

Oh yeah, and it works on AWS, Azure, and Google Cloud.

Posted in AWS, Azure, Cloud, Cloud Management, GCP, Public Cloud

Disney goes Hybrid; Shares Challenges


Ian Murphy recently wrote a great article on Disney’s journey to the Hybrid Cloud. The lightning talk, given by Blake White, highlighted the issues that many enterprise companies face when adopting some of the latest technologies, like Kubernetes and AWS, and integrating them with their existing on-prem infrastructure. Although these technologies are well suited for integration, often the heavy lifting has to be done by the enterprise. Many open source projects are very robust, but their focus is not on enabling integration with existing infrastructure.

The perfect example given in the talk can be found when Blake explains that in order to get the integration that Disney required, they had to build their own bespoke Kubernetes cluster provisioning tool.

Despite these challenges, Disney is forging ahead – a good sign that the value they are receiving makes overcoming the challenges a worthy endeavor. Lesson to learn?  Things worth doing are hard. Don’t let that stop you!


Posted in Cloud, Cloud Management, General, Private Cloud, Public Cloud

Crunching HIPAA data just got cheaper

With the recent AWS announcement, AWS customers can now leverage spot instances to crunch their HIPAA big data workloads. This can help decrease the compute costs of these jobs by up to 90%, making EC2 a cost-effective option for crunching large amounts of data that include Protected Health Information.

Amazon’s BAA with its HIPAA compliant customers requires that all EC2 instances that process PHI run in dedicated tenancy mode. Until now, spot instances were not available in dedicated tenancy mode, leaving this cost-effective option unavailable for processing PHI.

Spot instance pricing is Amazon’s method of selling excess capacity that can be pre-empted if needed. Spot pricing is market based, and often falls well below even the steepest discounts afforded with long-term commitments. Since the nodes can be pre-empted, spot instances are not suitable for many types of workloads, but most cluster compute technology is designed to tolerate node losses, making spot instances a great way to save money on short-lived tasks that require high compute power.

I took a quick peek in the AWS interface, and didn’t see any option to leverage dedicated spot instances in AWS’ managed Hadoop framework, EMR.  Hopefully we will see that soon!

Posted in Amazon Web Services, AWS, Cloud, Health Care, Public Cloud

Pay AWS Less for your Dev and Test Workloads

24×7 environments are handy, but are they required for Dev and Test?

I’m going to assume your development team is not leveraging development environments 24 hours a day, 7 days a week. That is to say, I’m assuming you don’t have 3+ teams on shift throughout the world. I’m also going to assume you aren’t building and destroying development environments as part of your continuous deployment pipeline (more on that later). Lastly, I’m going to assume that development does not need to mirror production (that’s what Test is for). AWS provides numerous ways to tune your setup with cost savings in mind, and the ones below focus on dev and test behavior that may have gone overlooked. So with all that in mind, let’s cut costs.

Scheduled development servers

First off, let’s simply make our development server layer mirror our actual development schedule(s).  I am talking about the stateless tier, not the database.  Let’s create time-based scaling triggers (natively supported in Auto Scaling Groups, Elastic Beanstalk, OpsWorks, or just use Lambda!).  Scale-In to 0 instances in service 1 hour after development teams stop working.  Scale-Out to 1 instance in service 1 hour before development teams start working.  Even a hardcore development team working from 6am to midnight six days a week still leaves money on the table.  Let’s see how much exactly, assuming 5 development teams, each using a single c4.large development environment:

[Figure: Scheduled Dev Environment Cost]
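The arithmetic behind the schedule can be sketched without any price data, since the saving is a fraction of whatever the instance costs per hour. The function names below are illustrative only, and the one-hour buffer on each side of the working day comes from the Scale-In and Scale-Out rules described above.

```python
HOURS_PER_WEEK = 24 * 7  # 168 hours in an always-on week


def scheduled_hours_per_week(start_hour, end_hour, days_per_week, buffer_hours=1):
    """Hours an environment runs per week on a schedule.

    start_hour/end_hour use a 24-hour clock (midnight as the end of the
    day is 24). buffer_hours is added before the start and after the end,
    matching the one-hour scale-out/scale-in margin described above.
    """
    daily = (end_hour - start_hour) + 2 * buffer_hours
    return min(daily, 24) * days_per_week


def weekly_savings_fraction(scheduled_hours):
    """Fraction of the 24x7 bill avoided by running only scheduled_hours."""
    return 1 - scheduled_hours / HOURS_PER_WEEK


# The "hardcore" team from the text: 6am to midnight, six days a week.
hours = scheduled_hours_per_week(start_hour=6, end_hour=24, days_per_week=6)
savings = weekly_savings_fraction(hours)
print(f"{hours} of {HOURS_PER_WEEK} hours -> {savings:.1%} saved per environment")
```

Under these assumptions the hardcore team runs 120 of 168 hours, so roughly 29% of the always-on compute bill goes away for each of the five environments. A team on a more typical schedule would save considerably more.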

Parking development servers

Let’s say we have already implemented scheduled development servers as outlined above, or maybe we haven’t yet but we are using Auto Scaling Groups. In either case, we can simply park an unused development environment if we know no one is actively contributing code changes. Let’s take the following scenario. We have a major push happening on 2 of our products. Development on the other 3 has been halted so resources can be borrowed. We will be suspending any work for two consecutive sprints (let’s say we use two week sprints). That’s 3 development environments with no changes for a month. We can simply set our desired capacity in the auto scaling group to 0, or if we set up time-based scheduling, simply remove the Scale-Out rule.
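As a rough sketch of what parking looks like in code, the helpers below build the request parameters for the two options just described. The group and action names are hypothetical, and the boto3 calls are left as comments so the example runs without AWS credentials; the parameter names follow the Auto Scaling `update_auto_scaling_group` and `delete_scheduled_action` operations.

```python
def park_request(group_name):
    """Keyword arguments that scale an Auto Scaling group down to zero.

    MinSize is set to 0 as well, since desired capacity cannot go below
    the group's minimum size.
    """
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": 0,
        "DesiredCapacity": 0,
    }


def remove_scale_out_request(group_name, action_name):
    """Keyword arguments that delete a scheduled Scale-Out action."""
    return {
        "AutoScalingGroupName": group_name,
        "ScheduledActionName": action_name,
    }


# Hypothetical names for the three halted products' dev environments.
parked_groups = ["dev-product-c", "dev-product-d", "dev-product-e"]
park_requests = [park_request(name) for name in parked_groups]

# With boto3 installed and credentials configured, each could be sent as:
#   import boto3
#   autoscaling = boto3.client("autoscaling")
#   autoscaling.update_auto_scaling_group(**request)
#   autoscaling.delete_scheduled_action(
#       **remove_scale_out_request(name, "scale-out"))
for request in park_requests:
    print(request)
```

Unparking is the same call with the original minimum and desired capacity restored, so it is worth recording those values before setting them to zero.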

Resource utilization, or lack thereof

It’s easy to get complacent with an instance type. The development team has grown to love the fast provisioning and rock solid compute & network reliability of the c4. Furthermore, since you use c4.2xlarge instances in production, the c4.large is a logical downgrade for development. But what if your development environment didn’t really require the processor performance and network stability? What if instead you simply needed any 2-core, ~4 GB memory server? A t2.medium might well do the job. It lacks the sustained compute capability, but that is rarely needed in dev. Assuming 5 development environments, you can save even more.


Resource configuration, being thoughtful with your decision

Let’s say your production database has strict IOPS requirements. This database was created before AWS released General Purpose SSD (GP2) EBS storage. As a result, you configured Provisioned IOPS to meet your requirements. Since test needed to match production, the same Provisioned IOPS were brought over to the test database. Furthermore, in an effort to mirror production in test, the test database was also made Multi-AZ RDS. These decisions were innocent at the time, but have a definite impact on hourly costs. Since our databases are running 24×7, there is an immediate opportunity to provide the same performance level at a reduced availability and cost. Here is how that would play out, assuming we reduced our test DB to a single availability zone with gp2-based storage:



It is critical to test against a duplicate of production before deploying to production, but let’s first assume you aren’t doing blue/green deployments (more on that later).  Test only needs to mirror production during testing.  And just like scheduled development servers, test environments should be running during automated and QA test schedules.  Going even one step further, the mirroring of production in test need only occur during performance tests (or similar tests where resources impact meaningful results).  Non performance related testing can occur on a production-like environment, similar in layers but resourced like development and run on a schedule matching QA and automation testing.  Then when true performance testing occurs, the test environment can be modified to production-like resources.

Test should mirror production, sometimes

You have a test environment in one of two states. State 1 is the QA and Automation validation and regression testing setup. State 2 is the production mirror performance testing setup. Let’s compare 24×7 vs Scheduled and Non Performance Testing vs Performance Testing. Production uses Elastic Load Balancing, 15 c4.2xlarge application servers, a 3-node m4.xlarge Redis cluster and an r3.xlarge MySQL database. For ease of scheduled states, the database will be production specification running 24×7, but we will not be using the AWS RDS Multi-AZ feature like we do in production. This environment is about $6,000 / month for a truly cloned production setup. But we don’t need that level of scale unless we are performance testing. By scaling in and applying our 8am-6pm scheduling, we can reduce the costs dramatically. When it’s time to do performance testing, we scale up only for the tests, let’s say twice a week for 4 hours. Sparing you the math, we end up at around $1,040 / month. As you can see, combining all of these techniques can save a fortune.
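For readers who do want the math, the shape of the calculation can be sketched as follows. The hourly rates are placeholders, not real AWS prices, so the result will not match the figures quoted above; a five-day week for the 8am-6pm schedule is also an assumption.

```python
WEEKS_PER_MONTH = 52 / 12


def monthly_test_cost(db_rate, scaled_in_rate, full_scale_rate,
                      scheduled_hours_per_week, perf_hours_per_week):
    """Estimate the monthly cost of a test environment with two states.

    db_rate:          hourly cost of the always-on database
    scaled_in_rate:   hourly cost of the small QA/regression footprint
    full_scale_rate:  hourly cost of the production-sized app and cache tiers
    The performance-test hours are assumed to fall inside the schedule, so
    they replace scaled-in hours rather than add to them.
    """
    qa_hours = scheduled_hours_per_week - perf_hours_per_week
    weekly = (db_rate * 24 * 7
              + scaled_in_rate * qa_hours
              + full_scale_rate * perf_hours_per_week)
    return weekly * WEEKS_PER_MONTH


# Placeholder hourly rates, NOT real AWS prices.
estimate = monthly_test_cost(
    db_rate=0.50, scaled_in_rate=1.00, full_scale_rate=8.00,
    scheduled_hours_per_week=50,   # 8am-6pm, assuming a five-day week
    perf_hours_per_week=8,         # two four-hour performance runs
)
print(f"${estimate:,.2f} per month")
```

Plugging in current on-demand rates for your own region and instance mix gives a comparable estimate; the point of the structure is that the expensive production-sized tier is only billed for the handful of hours it is actually under test.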

Do I even need a dedicated test environment?

Back to blue/green deployments. This is not a blog post about the what or why or how of blue/green deployments. But let’s say your environment supports that kind of deployment. Why run a test environment that is a mirror of production 24 hours a day (or even on a schedule) when you can simply build the environment, run through the testing, perform the auto scaling group swap (for example), wait a reasonable amount of time to support rollback, and finally terminate the previous production environment? In this case, a test environment that was running 24×7, or even 96 hours a week, can be reduced to the time it takes to build, test and support rollback. If this is automated (more on that later), your scheduled test environment running 60 hours a week could potentially be reduced to a production clone running for more like 8 hours a week. In this world, let’s assume our workflow is very simple, measured in days. Day 1, we build and test. Day 2, we cut over blue/green and leave both running. Day 3, we leave both running for one more day of rollback. Our 60 hour scheduled test environment is now 24 hours per week, with the added benefit that we are testing against production specifications while at the same time saving money.

Why automate the build as part of testing?

Whether you are doing blue/green deployments or not, there is a justification for building the test environment from nothing each time you use it. While the only cost saving in doing so is reduced run time (you build, test, and destroy, or build, test, and cut over), the benefits go way beyond that. This workflow validates far more than an updated application deployment. You are also testing your configuration management, your infrastructure code and potentially the same workflow you would use for disaster recovery. Building a server using configuration management to create a test environment and never doing so again is stopping short of the power of configuration management. This “permanent” environment you created introduces false assurance that you can recreate this setup any time. Since application code is being pushed to servers with application code already on them, unknown dependencies get introduced into the application. That dependency structure doesn’t stop at the application either. By not building your servers from nothing, even the configuration management code may be presuming a given state. Automating test environments from scratch leads to automating build/deploy, which leads to well exercised infrastructure code.

Posted in Amazon Web Services, AWS, Cloud, Cost Optimization, Public Cloud