EBS Best Practices and Performance Tuning

One of my ex-colleagues used to be a race car driver. He probably still is, and what he told me once has stuck in my mind ever since. "Do you know where a sports car starts? Oh no, not the engine. It's the tires, man. You see, tires are the only thing that holds you to the ground. And that's where it all starts and then everything else follows."

Similarly, EBS is where it all starts on AWS. All modern AMIs have their root volumes backed by EBS, meaning an EC2 instance's root device is an Amazon EBS volume created from an EBS snapshot. In addition to that, any time your EC2 instances require persistent, disk-like storage, EBS is the answer.

But before we continue, let's clarify some basic terms first.

What is EBS?

EBS stands for Elastic Block Store, a managed AWS service providing persistent block storage volumes of variable size (up to 16TB). Over the years, the underlying storage device has moved from magnetic to SSD, but EBS is not about any specific device type. It's about providing EC2 instances with highly available and durable storage volumes. An important aspect of EBS volumes is the ability to take point-in-time incremental snapshots, which are stored reliably on Amazon S3. Both EBS volumes and snapshots can be transparently encrypted with no visible performance impact.

What is block storage?

Unlike the file storage we use daily, block storage is "bare-metal" storage - an array of unformatted, evenly sized data blocks providing the best performance without the overhead of a file system. Normally, however, EBS volumes are formatted with a file system and mounted at a specific location such as "/data", so they behave as if they were a regular drive.

What is an EBS-backed EC2 instance?

When you launch an EC2 instance, its root volume contains an OS image used to boot the instance. Where does this image come from?

A quick history lesson: before EBS was introduced, an OS image was copied from S3 to the instance store - disks physically attached to the host. Such EC2 instances are called "Instance Store-backed Instances", however confusing that may sound. It's important to remember that the instance store is non-persistent, temporary storage, also referred to as ephemeral. Correspondingly, instance store-backed instances can only be rebooted or terminated, but they can't be stopped.

Nowadays, most EC2 instances use EBS volumes for the root device, created from an AMI pointing to an EBS snapshot. These instances are called "Amazon EBS-backed Instances". Not only do they launch faster, but since EBS volumes are persistent, instances can be stopped and started without losing data - on boot, the same EBS volume is re-attached.

EBS Root Device Type

To continue our exploration of EBS volumes and performance tuning, let's turn to the "Amazon EBS Deep Dive" session from AWS re:Invent 2014 by Dougal Ballantyne.

What do we learn from this session?

  • There are three types of EBS volumes available:

    • "gp2" or General Purpose (SSD). Default volume type for most scenarios, like boot volumes, file servers, dev and test environments, small to medium DB workloads. Maximum throughput - 128MB/s, maximum performance - up to 3000 IOPS (I/O operations per second).
    • "io1" or Provisioned IOPS (SSD). Provide the highest performance for workloads such as large DBs requiring consistent performance up to 4000 IOPS. Maximum throughput - 128MB/s.
    • "standard" or Magnetic. Cheap and less performant option. Can be used for cold storage with infrequent data access. Maximum throughput - 40-90 MB/s, average performance 100 IOPS, burstable to several hundred IOPS.
  • Both SSD-backed types count their I/O operations in 256KB blocks and provide single-digit read/write latency (the time elapsed between I/O submission and its completion, measured in milliseconds) with five nines of availability. 99.999% availability corresponds to about 315 seconds of downtime each year (0.001% of 31536000 seconds). In reality, the EBS team has observed 99.9996% availability, making SSD-backed EBS volumes unavailable for only 120-150 seconds a year.

  • General Purpose (SSD) type provides a baseline performance of 3 IOPS per GB with 99% performance consistency. So a 100GB volume comes with 300 IOPS guaranteed baseline performance 99% of the time. Maximum baseline performance is 3000 IOPS for 1TB volumes.

This type comes with a burstable I/O credit bucket (image borrowed from the original blog post by Jeff Barr), which starts with 5.4 million I/O credits and accumulates an additional 3 credits per GB of volume size every second.

Credits can be spent at a rate of up to 3000 IOPS, meaning that even the smallest possible volume with a full bucket can sustain 3000 IOPS for about 30 minutes, until no credits are left and it drops back to its baseline of 3 IOPS per GB. For a 500GB volume it takes 60 minutes of sustained bursting to completely drain the bucket. 1TB volumes never drain their buckets, and in reality only 0.3% of all "gp2" volumes have exhausted their I/O buckets. That's why this volume type is now the default for all boot volumes - they burst nicely during OS boot and service initialization, drastically reducing the time it takes an EC2 instance to get ready.
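The bucket arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch built from the numbers quoted in this post (5.4 million initial credits, 3000 IOPS burst rate, 3 credits per GB per second). The function names and the `refill_while_bursting` switch are made up for illustration and are not part of any AWS API.

```python
# Figures quoted in this post for "gp2" volumes.
BUCKET_CREDITS = 5_400_000  # initial I/O credits in a full bucket
BURST_IOPS = 3000           # maximum rate at which credits can be spent
IOPS_PER_GB = 3             # baseline performance and refill rate per GB


def baseline_iops(size_gb):
    """Baseline IOPS for a volume, capped at the burst rate."""
    return min(IOPS_PER_GB * size_gb, BURST_IOPS)


def minutes_to_drain(size_gb, refill_while_bursting=True):
    """Minutes of sustained 3000 IOPS before a full bucket is empty.

    Returns None when the bucket refills as fast as it drains.
    """
    refill = baseline_iops(size_gb) if refill_while_bursting else 0
    net_drain_per_second = BURST_IOPS - refill
    if net_drain_per_second <= 0:
        return None
    return BUCKET_CREDITS / net_drain_per_second / 60


if __name__ == "__main__":
    for size in (1, 100, 500, 1000):
        print(size, "GB:", baseline_iops(size), "IOPS baseline,",
              minutes_to_drain(size), "minutes to drain")
```

Note that the "30 minutes" figure for the smallest volume corresponds to ignoring the refill (`refill_while_bursting=False`); with the refill counted, a tiny volume lasts only marginally longer.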

  • Provisioned IOPS (SSD) type represents the high end of consistent performance, up to 4000 IOPS, making it the ideal type for critical DB workloads that cannot tolerate any I/O stalls. Overall performance consistency is 99.9%.
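As a quick sanity check of the availability figures quoted in the list above, here is a small sketch that converts an availability percentage into seconds of downtime per year. The helper name is hypothetical, and a 365-day year is assumed, as in the text.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, as in the text


def downtime_seconds_per_year(availability_percent):
    """Seconds per year a volume may be unavailable at a given availability."""
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


if __name__ == "__main__":
    for availability in (99.999, 99.9996):
        seconds = downtime_seconds_per_year(availability)
        print(f"{availability}% -> about {seconds:.0f} seconds per year")
```

The observed 99.9996% works out to roughly two minutes a year, which sits inside the 120-150 second range mentioned above.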

EBS Cheat Sheet

EBS Performance Tuning

Whenever you find yourself tuning EBS performance, always keep the overall architecture in mind:

EBS Architecture

Here are some of the best practices for tuning the performance of your EBS volumes.

  • Match the EC2 instance type and its networking, RAM and CPU resources with the EBS volume type and size. A mismatch such as attaching large EBS volumes to "t2.small" instances is not a good idea.

  • The network link between an EC2 instance and its EBS volumes can be a limiting factor. Use EC2 instances with the "EBS Optimized" flag to create a separate, dedicated network channel between EC2 and its EBS volumes with a consistent level of performance and throughput of up to 2000Mbps. Without the EBS Optimized channel, your EC2 instances send their EBS I/O over the same link as regular network traffic, making the two contend for the precious bandwidth and considerably reducing EBS throughput on noisy networks.

  • The EBS volume itself may not have enough capacity to accept all incoming I/O requests. If this is the case, use Provisioned IOPS EBS volumes for guaranteed and consistent performance.

  • SSD-backed EBS volumes count incoming requests in chunks of 256KB so, if possible, fine-tune your applications' block I/O size and watch out for unnecessarily large readaheads sending extra requests to your volumes.

  • Know your limits - even with the best possible EBS Optimized EC2 type and an EBS volume with Provisioned IOPS, your application's block I/O size makes it either I/O-bound or throughput-bound. Sending too many small (well below 256KB) I/O requests can hit the limit of 4000 IOPS, while sending too many heavy (close to 256KB) I/O requests will hit the maximum throughput limit of 128MB/s (depending on the instance type). Sending 4000 32KB requests or 1000 128KB requests per second respects both limits at once.

So smaller I/O applications tend to be I/O-bound, larger I/O applications tend to be throughput-bound.
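To make the I/O-bound versus throughput-bound distinction concrete, here is a minimal sketch that, for a given block I/O size, reports how many requests per second fit under both caps and which cap is reached first. The function and parameter names are illustrative only. The defaults are the 4000 IOPS and 128MB/s limits quoted above, and 1MB is treated as 1000KB (an assumption) so that the examples from the text land exactly on the limits.

```python
def achievable_iops(io_size_kb, max_iops=4000, max_mb_s=128, kb_per_mb=1000):
    """Return (requests per second, limiting factor) for a block I/O size.

    kb_per_mb=1000 is an assumption chosen to match the round numbers
    used in the text.
    """
    # How many requests of this size fit into the throughput cap.
    throughput_cap_iops = max_mb_s * kb_per_mb / io_size_kb
    if throughput_cap_iops < max_iops:
        return throughput_cap_iops, "throughput-bound"
    return float(max_iops), "I/O-bound"


if __name__ == "__main__":
    for size_kb in (4, 16, 32, 128, 256):
        iops, limit = achievable_iops(size_kb)
        print(f"{size_kb:>3}KB requests: {iops:.0f} per second ({limit})")
```

With these defaults the crossover sits at 32KB: anything smaller runs into the IOPS cap first, anything larger runs into the throughput cap. Passing the larger limits announced at re:Invent 2014 as `max_iops` and `max_mb_s` moves the crossover accordingly.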

  • Use the EXT4 or XFS file system. EXT4 should be your default choice for standard environments, but in environments with a high volume of parallel requests and multi-GB files you may prefer XFS, sometimes referred to as The Enterprise File System.

  • Utilize EBS CloudWatch metrics, especially "VolumeQueueLength". This metric shows the number of "in flight" I/O operations between an EC2 instance and its volume at any given point in time. IOPS, queue depth and I/O latency are interdependent, and latency escalates as queue depth grows larger. An example graph like the one below plots queue depth against latency and IOPS, visualizing its negative impact on both.

EBS Queue Depth

The optimal queue depth for low latency and high IOPS is typically between 4 and 8, about 1 queue depth item per 500 IOPS. EBS-optimized EC2 instances, however, tend to see consistent latency.
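The rule of thumb above is easy to turn into a tiny calculator. The sketch below uses the "1 queue depth item per 500 IOPS" figure from this post. The second helper applies Little's law (items in flight = arrival rate x time in system), which is a general queueing result brought in here for illustration and not something taken from the re:Invent session. Both function names are hypothetical.

```python
def target_queue_depth(iops, iops_per_queue_item=500):
    """Suggested queue depth using the 1-per-500-IOPS rule of thumb."""
    return max(1, round(iops / iops_per_queue_item))


def implied_latency_ms(queue_depth, iops):
    """Average per-request latency implied by Little's law, in milliseconds."""
    return queue_depth / iops * 1000


if __name__ == "__main__":
    for iops in (2000, 3000, 4000):
        depth = target_queue_depth(iops)
        print(f"{iops} IOPS -> queue depth {depth}, "
              f"about {implied_latency_ms(depth, iops):.1f} ms per request")
```

Read the other way around, the same relation hints at why latency escalates: once a volume's IOPS are maxed out, every extra in-flight request only adds waiting time.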

AWS re:Invent 2014 Announcements

Introduced at AWS re:Invent were larger and faster EBS volumes:

  • Maximum volume size - 16TB (up from 1TB)
  • General Purpose (SSD) - maximum 10000 IOPS (up from 3000), maximum throughput - 160MB/s (up from 128MB/s)
  • Provisioned IOPS (SSD) - maximum 20000 IOPS (up from 4000), maximum throughput - 320MB/s (up from 128MB/s)


This is a tremendous upgrade, considering that SSD-backed EBS volumes were introduced just half a year ago, in June 2014. In fact, if you watch the older session "Maximizing EC2 and Elastic Block Store Disk Performance" from AWS re:Invent 2013 or AWS Summit, you'll notice that recent SSD-related EBS advances have made some of yesterday's optimization techniques outdated. All the more reason to follow EBS announcements and documentation to stay up to date.

Additional best practices

  • Take snapshots of your EBS volumes. Snapshots are incremental, trivial to use, and make it easy to back up and restore your data. As a rule of thumb, a new snapshot should be taken for every 20 GB of new data accumulated on a volume. Once taken, EBS snapshots can be restored to any type of EBS volume, cloned into any AZ in the same region and efficiently copied across AWS regions.

  • Encrypt your EBS volumes. Encrypting new volumes is a trivial operation which is completely transparent to your applications. An upcoming integration of AWS Key Management Service with EBS Encryption makes this option even more appealing.

See also AWS IAM Best Practices and AWS EC2 Performance Tuning.


Copyright © 2017 MinOps, Inc.