Penny pinching with AWS - ways to save money in the cloud

August 17 - Chris Close

So you’ve been looking at different cloud providers - AWS, Azure, maybe even Google Cloud. You’ve settled on AWS for various reasons but you want to reduce your costs. You might be setting up some EC2 instances using a t2.micro or similar and think you’re getting a good deal, but there are other options out there.

For the purpose of this blog, we’re going to assume you want to run a service, or services, that can run on ephemeral servers. This means we can’t rely on any local storage on the server. This is a common choice, and the analogy I like is to treat servers as cattle, not pets.

We’re also going to assume some base knowledge of EC2 concepts such as Auto-Scaling Groups and Launch Configurations.

Different ways to save the $$$

Option 1 - Manually configuring Auto Scaling Groups (ASGs) to turn things on/off or up/down on a schedule

This option is relatively straightforward, especially for services you don’t rely on outside of business hours. These could be test instances which are underutilised in the evening, or a build agent farm which sees little use outside of work hours.

To implement this, view your auto-scaling group (ASG) in EC2, and add some schedule rules. You can enter "cron" style rules in here, and you might find an online solution like crontab.guru handy.

This schedule scales the service up from 1 instance to 2 instances at 8am. When it comes to 6pm each day, it will scale back from 2 to 1 instance. You could scale down to 0 instances if the service is definitely not needed out of hours.
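If you would rather script the schedule than click through the console, the same two rules could be sketched with boto3 roughly like this. The group name "my-service-asg" and the action names are made up for illustration, and I'm assuming the group's minimum and maximum sizes already allow both counts. Note that the cron expressions are evaluated in UTC unless you tell AWS otherwise, so adjust the hours for your own time zone.

```python
def business_hours_schedule(asg_name, peak=2, off_peak=1):
    """Build two scheduled actions: scale up at 8am, back down at 6pm."""
    def action(name, cron, count):
        return {
            "AutoScalingGroupName": asg_name,
            "ScheduledActionName": name,   # hypothetical name, pick your own
            "Recurrence": cron,            # cron syntax, UTC by default
            "DesiredCapacity": count,      # must sit within the group's min/max
        }
    return [
        action("scale-up-morning", "0 8 * * *", peak),
        action("scale-down-evening", "0 18 * * *", off_peak),
    ]


def apply_schedule(actions):
    """Send the actions to AWS. Needs boto3 and credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above runs without AWS access
    client = boto3.client("autoscaling")
    for params in actions:
        client.put_scheduled_update_group_action(**params)


for params in business_hours_schedule("my-service-asg"):
    print(params["ScheduledActionName"], params["Recurrence"], params["DesiredCapacity"])
```

Passing off_peak=0 gives you the "nothing running out of hours" variant, as long as the group's minimum size is 0 as well.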

An alternative would be to create a script which runs against all your ASGs and sets the desired and maximum instance counts down during "off-peak" and up again during "peak".
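A minimal sketch of such a script, assuming an 8am to 6pm peak window and the same peak/off-peak counts for every group (both are assumptions for illustration, a real script would probably read them from tags or config). The sizing logic is plain Python; only resize_all_groups talks to AWS.

```python
from datetime import time

PEAK_START = time(8, 0)   # assumed peak window: 08:00 to 18:00
PEAK_END = time(18, 0)


def is_peak(now):
    """True while the given wall-clock time falls inside the peak window."""
    return PEAK_START <= now < PEAK_END


def target_sizes(now, peak_count, off_peak_count):
    """Desired and maximum instance counts for the given time of day."""
    count = peak_count if is_peak(now) else off_peak_count
    return {"DesiredCapacity": count, "MaxSize": count}


def resize_all_groups(now, peak_count=2, off_peak_count=1):
    """Apply the sizes to every ASG in the account. Needs boto3 and credentials."""
    import boto3  # imported lazily so the helpers above run without AWS access
    client = boto3.client("autoscaling")
    sizes = target_sizes(now, peak_count, off_peak_count)
    paginator = client.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            client.update_auto_scaling_group(
                AutoScalingGroupName=group["AutoScalingGroupName"],
                # Lower the minimum too if needed, otherwise MaxSize < MinSize is rejected.
                MinSize=min(group["MinSize"], sizes["MaxSize"]),
                **sizes,
            )


print(target_sizes(time(9, 30), 2, 1))
print(target_sizes(time(22, 0), 2, 1))
```

You would run something like this from cron or a scheduled job twice a day, passing in the current local time.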

Option 2 - Using Spot Instances

One of the great money saving features of AWS is Spot Instances.

What is a spot instance?

A spot instance is what other clouds (e.g. Google Cloud) term a "preemptible" instance. You pay a cheaper price per hour for the instance, but at the risk that it will be switched off when demand for that particular instance type rises.

The pricing for a spot instance is based on:

  • The instance type

  • The availability zone

You can sometimes find that the price in one availability zone differs drastically compared to another. The reason for this is simple supply and demand - there may be higher demand for an instance in the particular availability zone.

How much will I pay and how much can I save?

A quick and easy way to find this out is through the "Pricing History" button in the AWS console under the EC2 Management Console -> Instances -> Spot Instances screen:

While something like a t1.micro might be selected by default, you’ll find that other instance types are compatible with the t2 instances you might already use, and also give you heaps more RAM and CPU.

e.g. m3.medium:

These instances give you a decent amount of memory (3.75GB) and a whole vCPU for only 1.2c/hour. (A quick and easy site for comparing instance types is www.ec2instances.info.) This more than compares with, say, a t2.small, which only has 2GB of RAM and a single bursting vCPU, and costs 3.2c/hour.
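To put those two hourly prices into perspective, here is the arithmetic as a quick sketch. The 730 hours per month figure is my assumption (8760 hours in a year divided by 12), and spot prices move around, so treat the result as a rough guide only.

```python
HOURS_PER_MONTH = 730  # assumption: 8760 hours per year / 12 months


def monthly_cost(hourly_price_usd, instances=1):
    """Cost in USD of running the given number of instances for a month."""
    return round(hourly_price_usd * HOURS_PER_MONTH * instances, 2)


def saving_percent(current_price, cheaper_price):
    """How much cheaper the second hourly price is, as a percentage."""
    return round((current_price - cheaper_price) / current_price * 100, 1)


M3_MEDIUM = 0.012  # 1.2c/hour, from the pricing history above
T2_SMALL = 0.032   # 3.2c/hour

print(monthly_cost(M3_MEDIUM), monthly_cost(T2_SMALL))
print(saving_percent(T2_SMALL, M3_MEDIUM))
```

With these figures the m3.medium works out to roughly $8.76 a month against $23.36 for the t2.small, a saving of about 62.5% per instance.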

Those WTF moments!

Really strange things can happen, like this:

As you can see, the On-Demand price for the i3.large instance is $0.1870/hour. However, for some reason, the spot price for this instance in ap-southeast-2a is ten times that. I wonder if someone has made a mathematical error when setting their bid! This is why you should never bid above the On-Demand price: you could pay more than the instance is normally worth!

Either way, there is plenty of capacity in ap-southeast-2b and ap-southeast-2c, and plenty of capacity in comparable spot instance types.
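As a sketch of that rule of thumb: ignore any zone where spot costs more than On-Demand, pick the cheapest of what is left, and never let your bid exceed the On-Demand price. The ap-southeast-2a figure is the "ten times On-Demand" case described above; the prices for the other two zones are hypothetical placeholders.

```python
ON_DEMAND_I3_LARGE = 0.1870  # USD/hour, the On-Demand price quoted above


def sane_zones(spot_prices, on_demand_price):
    """Keep only the zones where spot is actually cheaper than On-Demand."""
    return {zone: price for zone, price in spot_prices.items() if price < on_demand_price}


def cheapest_zone(spot_prices, on_demand_price):
    """Cheapest zone that beats On-Demand, or None if there isn't one."""
    candidates = sane_zones(spot_prices, on_demand_price)
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def capped_bid(bid, on_demand_price):
    """Never bid more than the instance costs On-Demand."""
    return min(bid, on_demand_price)


prices = {
    "ap-southeast-2a": 1.8700,  # ten times On-Demand, as described above
    "ap-southeast-2b": 0.0600,  # hypothetical
    "ap-southeast-2c": 0.0550,  # hypothetical
}

print(cheapest_zone(prices, ON_DEMAND_I3_LARGE))
print(capped_bid(0.50, ON_DEMAND_I3_LARGE))
```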

There has to be an easier way to find my bargain!

While some people (present company included) like to hunt through graphs, yes, there is an easier way.

You can use the Spot Bid Advisor (https://aws.amazon.com/ec2/spot/bid-advisor/) to find a suitable instance type. I recommend setting the "Bid Price" to 25% of On-Demand to find the best deals. There is also the “Spot Fleet” option, which I will go into under Option 3, but I wouldn’t want to spoil the ending.

How can I get my manually created Auto-Scaling Group to use Spot Instances?

This is actually pretty damn easy. You may have created the Launch Configuration (LC) and Auto-Scaling Group (ASG) yourself, so you just go in and copy the existing LC into a new one:

First you may need to change your instance type. Hit "Edit instance type" and choose your desired new instance type if required.

And then configure the details for the launch configuration on the next screen in the wizard, or by selecting "Edit details" from the copy screen.

On this new screen select "Request Spot Instances" and set a maximum price.

Once you’ve created that launch configuration it’s a case of simply editing the Auto-Scaling Group for your service and updating it to the new Launch Configuration.

You’ll need to dispose of the existing instances using your favourite server "termination" method in order to ensure you get new instances. I recommend manually scaling up the service to double the required number of instances and ensuring they are operating correctly. Then you can scale them down again to the normal level - the instances from the old LC will drop off.
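If you prefer scripting to the wizard, copying the launch configuration with a spot price and pointing the group at it could look roughly like this with boto3. The names and the AMI ID below are placeholders, and a real copy would also carry over your key pair, security groups, user data and so on from the existing LC.

```python
def spot_launch_configuration(name, image_id, instance_type, max_price_usd):
    """Parameters for a launch configuration that requests Spot Instances."""
    return {
        "LaunchConfigurationName": name,
        "ImageId": image_id,
        "InstanceType": instance_type,
        "SpotPrice": f"{max_price_usd:.4f}",  # the API takes the price as a string
    }


def switch_group(asg_name, lc_params):
    """Create the new LC and attach it to the ASG. Needs boto3 and credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access
    client = boto3.client("autoscaling")
    client.create_launch_configuration(**lc_params)
    client.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        LaunchConfigurationName=lc_params["LaunchConfigurationName"],
    )


# Placeholder values for illustration only.
params = spot_launch_configuration("my-service-spot", "ami-00000000", "m3.medium", 0.03)
print(params)
```

Existing instances keep running on the old configuration until they are replaced, which is why the scale-up-then-down step above is still needed.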

What about an Elastic Beanstalk or ECS cluster that was somehow automatically created for me by a wizard?

Your instances are likely managed by CloudFormation. It’s generally not a great idea to edit the LC manually for these, as the changes won’t be persisted next time you edit the CloudFormation stack.

Newer ECS clusters will have support for Spot Fleets - if this is the case you would have seen these options when creating the cluster. See Option 3 for information on how a Spot Fleet works, and how to create/edit these CloudFormation templates to use Spot Fleets.

How can I see that this is working?

Visit the "Spot Instances" link again in the EC2 Management Console. You’ll see your spot instances in there.

Handy tip - if you have an instance selected and hit "Pricing history" the graph will show you pricing history for the specific instance.

How can I see if I am being outbid?

Look at the "Spot Instances" screen in the EC2 Management Console again. If you see a “Status” of “price-too-low”, you are most likely being a bit too frugal. You’ll need to update the LC with a higher price, or change instance types/availability zones.
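The same check can be scripted. This sketch filters the response of boto3's describe_spot_instance_requests call for requests whose status code is "price-too-low". The sample response is hand-written and trimmed down to the fields used here, and its request IDs are made up.

```python
def outbid_requests(response):
    """IDs of the spot requests that are not being fulfilled because the bid is too low."""
    return [
        request["SpotInstanceRequestId"]
        for request in response.get("SpotInstanceRequests", [])
        if request.get("Status", {}).get("Code") == "price-too-low"
    ]


def fetch_outbid_requests():
    """Ask AWS for the current spot requests. Needs boto3 and credentials."""
    import boto3  # imported lazily so the filter above runs without AWS access
    ec2 = boto3.client("ec2")
    return outbid_requests(ec2.describe_spot_instance_requests())


# Hand-written sample, trimmed to the fields the filter reads.
sample = {
    "SpotInstanceRequests": [
        {"SpotInstanceRequestId": "sir-example1", "Status": {"Code": "fulfilled"}},
        {"SpotInstanceRequestId": "sir-example2", "Status": {"Code": "price-too-low"}},
    ]
}

print(outbid_requests(sample))
```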

Option 3 - Using a Spot Fleet

(Assuming you are familiar with Spot instances already - see above)

Protecting yourself from being outbid

So we all know the jitters - you’ve found a bargain on eBay and you’re the top bidder. Then you get outbid by someone with a 0 feedback score. Now you will never complete your Malibu Stacy collection! If only you could have bid on that similar item at the same time, even if it did have a bit of chewing gum residue stuck in its hair.

In the land of eBay, people have been using "sniping" tools (e.g. JBidWatcher) to automate their bids across multiple items and ensure they get the best deal. What can we do in the land of AWS instances to automate this?

Presenting the "Army of the Damned" … err I mean “Spot Fleet”

(http://futurama.wikia.com/wiki/Army_of_the_Damned)

From Amazon’s own page on Spot Fleets:

A Spot fleet is a collection, or fleet, of Spot instances. The Spot fleet attempts to launch the number of Spot instances that are required to meet the target capacity that you specified in the Spot fleet request. The Spot fleet also attempts to maintain its target capacity fleet if your Spot instances are interrupted due to a change in Spot prices or available capacity.

Sounds great, doesn’t it? Well, as it turns out, it works pretty well, but compared to a regular ASG it needs a little bit of work to get going for a pre-existing service.

Ideally, we’d be using CloudFormation for management of our infrastructure. It’s not as daunting as it seems especially when there are pre-existing templates for things we may need.

Finding a Spot Fleet configuration to use

There is a handy tool in the EC2 Management Console under "Spot Requests" -> “Spot Advisor”.

Select your memory + vCPU requirements and you’ll get some instance types back along with an estimated total price:

You can continue to configure your fleet from here but this will give you basic EC2 instances. Ideally, these instances will be actually doing some work for us!
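For reference, a fleet request made through the API rather than the console boils down to a config like the one sketched here. The role ARN, account ID and AMI ID are placeholders, and a real launch specification would normally also carry subnets, security groups and a key pair.

```python
def spot_fleet_config(fleet_role_arn, image_id, instance_types,
                      target_capacity, max_price_usd, strategy="diversified"):
    """Build the SpotFleetRequestConfig for a fleet spread over several instance types."""
    if strategy not in ("lowestPrice", "diversified"):
        raise ValueError(f"unsupported allocation strategy: {strategy}")
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        "SpotPrice": str(max_price_usd),      # maximum price per instance hour, as a string
        "AllocationStrategy": strategy,
        "LaunchSpecifications": [
            {"ImageId": image_id, "InstanceType": instance_type}
            for instance_type in instance_types
        ],
    }


def request_fleet(config):
    """Submit the fleet request. Needs boto3 and credentials, so it is not called here."""
    import boto3  # imported lazily so the builder above runs without AWS access
    return boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)


# Placeholder role ARN and AMI ID for illustration only.
config = spot_fleet_config(
    "arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role",
    "ami-00000000",
    ["m3.medium", "i3.large"],
    target_capacity=2,
    max_price_usd=0.03,
)
print(config["AllocationStrategy"], len(config["LaunchSpecifications"]))
```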

Creating a Spot Fleet for an ECS cluster

Now that we know what instance types we should be using, let’s use them to run some Docker stuff using ECS.

It’s great that Amazon have streamlined this process - to create an ECS Cluster with a Spot Fleet, all you need to do is:

  • Go to "Services" -> “EC2 Container Service”

  • Create a cluster by hitting "Create Cluster". Note: If you have not previously set up a cluster you may be presented with a “Get Started” button to launch a wizard instead. This wizard does not include options for a spot fleet. You should cancel the wizard and create the cluster using the "Create Cluster" button.

  • Enter values for your spot fleet. You’ll need to already have some instance sizes in mind, and you can enter between 1 and 6 instance types into the wizard:

  • And then continue to create the cluster as normal.

Your "Spot Requests" page in EC2 should include a fleet request and underneath the fleet you will see the individual instances:

Updating an existing ECS cluster’s Spot Fleet configuration

You can update these values through CloudFormation. Go to Services -> CloudFormation, select the stack that belongs to your cluster, and then enter values for the following fields:

EcsInstanceType: e.g. m3.medium,i3.large comma separated values

IamSpotFleetRoleName: aws-ec2-spot-fleet-role

SpotAllocationStrategy: either lowestPrice or diversified

SpotPrice: e.g. 0.03 (for 3c/hour)

UseSpot: true <- DO NOT FORGET THIS ONE!
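If you would rather script the stack update, the fields above map onto CloudFormation parameters roughly as follows. This is a sketch that assumes your cluster's template uses exactly these parameter names. Any other parameter in the template has to be passed with UsePreviousValue, otherwise it falls back to its default, so the keep_previous names in the example are hypothetical stand-ins for whatever else your template declares.

```python
def spot_parameters(instance_types, spot_price_usd, strategy="diversified",
                    fleet_role="aws-ec2-spot-fleet-role", keep_previous=()):
    """Build the Parameters list for a CloudFormation stack update."""
    values = {
        "EcsInstanceType": ",".join(instance_types),  # comma separated values
        "IamSpotFleetRoleName": fleet_role,
        "SpotAllocationStrategy": strategy,           # lowestPrice or diversified
        "SpotPrice": str(spot_price_usd),
        "UseSpot": "true",                            # do not forget this one
    }
    params = [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]
    # Everything else keeps the value it already has in the stack.
    params += [{"ParameterKey": k, "UsePreviousValue": True} for k in keep_previous]
    return params


def update_cluster_stack(stack_name, parameters):
    """Run the update against the existing template. Needs boto3 and credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access
    boto3.client("cloudformation").update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        Parameters=parameters,
    )


# "KeyName" and "VpcId" are hypothetical names of other template parameters.
parameters = spot_parameters(["m3.medium", "i3.large"], 0.03, keep_previous=["KeyName", "VpcId"])
for parameter in parameters:
    print(parameter)
```

Depending on what the template creates, the update call may also need a Capabilities argument - check the error message if CloudFormation rejects it.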

Why can’t I add more instance types???

There is a pretty big GOTCHA here! The template that was originally used to create the cluster determines how many instance types the fleet can handle. So if you create the cluster initially with 6 different instance types, you need to update it with 6!

Likewise, if you created the cluster recently with the wizard and did not enter spot instances, but rather On-Demand - and now want to transition to a Spot Fleet, you will only be able to denote a single instance type for the cluster.

You can find out how many instance types the template is configured for from the "Template" tab in CloudFormation.

But what if you are attached to your cluster and not keen on recreating it, and simply want to add or remove instance types? You can copy a template from a different cluster which has the right number of instance types into your existing cluster. If you don’t have an existing cluster to copy from, just make a new cluster with the right parameters. Copy the contents of the "Template" tab from the new stack, then run the Update Stack wizard for the original cluster and paste the template into the editor.

Hang on, I selected "Diversified"

… but I am only getting instances in one AZ!

This is a pretty strange event, but it generally means you need to update your fleet to include more instance types. While the Spot Fleet mechanism will generally try to keep things spread across AZs, it will put stuff into the same AZ if there is no other option.
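To see why more instance types help, it is useful to think of the fleet as choosing between "pools", where a pool is one instance type in one AZ. The toy model below is my own simplification and not AWS's actual algorithm: it spreads the target capacity evenly over every pool whose spot price is at or below your bid. All the prices are hypothetical.

```python
def eligible_pools(spot_prices, max_bid):
    """Pools (instance type, zone) whose current spot price is within the bid."""
    return sorted(pool for pool, price in spot_prices.items() if price <= max_bid)


def spread(target_capacity, pools):
    """Toy 'diversified' allocation: share the capacity evenly across the pools."""
    if not pools:
        return {}
    base, extra = divmod(target_capacity, len(pools))
    return {pool: base + (1 if i < extra else 0) for i, pool in enumerate(pools)}


def zones_used(allocation):
    """The set of availability zones that ended up with at least one instance."""
    return {zone for (_, zone), count in allocation.items() if count > 0}


# Hypothetical prices: with one instance type, only one zone is within a 3c bid.
one_type = {
    ("m3.medium", "ap-southeast-2a"): 0.090,
    ("m3.medium", "ap-southeast-2b"): 0.012,
    ("m3.medium", "ap-southeast-2c"): 0.080,
}
# Adding a second instance type opens up a pool in another zone.
two_types = dict(one_type)
two_types[("m4.large", "ap-southeast-2c")] = 0.025

print(zones_used(spread(4, eligible_pools(one_type, 0.03))))
print(zones_used(spread(4, eligible_pools(two_types, 0.03))))
```

In this model the single-type fleet has only one affordable pool, so everything lands in one AZ even with "diversified" selected. The second instance type gives the fleet somewhere else to go.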

Option 4 - Reserved instances

While you can save money by committing to a certain pool of instances up-front, this doesn’t give you much flexibility in the future. The savings are nothing to sneeze at, but you could potentially do better with spot instances instead.

Option 5 - Auto-scaling based on demand

If your service is responding to user demand, you might be able to set up some rules to scale the service up and down based on that demand. It can also be used in conjunction with Option 2 relatively easily - the Auto Scaling Group you select simply has to be configured for Spot Instances and have a Spot Price set.
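One way to express such a rule is a target tracking policy, which asks the ASG to keep a metric near a target value. The sketch below builds the parameters for boto3's put_scaling_policy call. The policy name and the 50% average CPU target are assumptions for illustration, and the right metric and target depend entirely on your service.

```python
def cpu_target_tracking_policy(asg_name, target_cpu_percent=50.0):
    """Parameters for a policy that keeps the group's average CPU near the target."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "keep-average-cpu-near-target",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_cpu_percent,
        },
    }


def apply_policy(policy):
    """Attach the policy to the group. Needs boto3 and credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access
    boto3.client("autoscaling").put_scaling_policy(**policy)


policy = cpu_target_tracking_policy("my-service-asg")
print(policy["PolicyType"], policy["TargetTrackingConfiguration"]["TargetValue"])
```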

If you are trying to save money for your CI pipeline, e.g. Buildkite or Bamboo, you should investigate whether you can auto-scale the agents based on demand.

For Buildkite you can use their "Elastic CI for AWS": https://github.com/buildkite/elastic-ci-stack-for-aws

For Bamboo you can use "Elastic Bamboo": https://confluence.atlassian.com/bamboo/about-elastic-bamboo-289277118.html

Option 6 - Go Serverless

There is a big trend towards "serverless" architectures. Among the options are AWS Lambda, Azure Functions and Google Cloud Functions.

These can all be orchestrated with a tool called… "Serverless"! While AWS Lambda is the first fully supported service, more are starting to appear now. These are listed here:

https://serverless.com/framework/docs/providers/

An in-depth look at serverless functions would be best saved for another blog post :).

So what now?

Give those AWS Spot Fleets a try if you can - it might save you quite a few $! You can also combine Spot Instances with a schedule or auto-scaling in order to maximise your savings.

As always - test this in development first!

Be sure to have a good think about whether you would use Spot Fleets in production. It might not be appropriate based on your appetite for risk. You might prefer to have a couple of T2 instances sitting around waiting in case your spot fleet gets outbid as well. (Especially for critical applications!)

Consider too, pushing off some stuff into some serverless functions (e.g. AWS Lambda) - let someone else deal with the scaling, turning off, turning on, etc.

What about Azure? Is there a Spot Fleet for Azure?

Currently the answer for that is… "yes-ish". Microsoft have announced “Low-priority VMs” that can be run as part of Azure Batch, and they are currently in “preview”. https://azure.microsoft.com/en-gb/blog/announcing-public-preview-of-azure-batch-low-priority-vms/

As far as having it generally available for non-batch workloads, we’ll have to wait and see.