AWS Compute Blog

Distributed Deep Learning Made Easy

by Daniele Stroppa | on 10 NOV 2016 | in Amazon EC2 | Permalink | Comments

This is a guest post from my colleagues Naveen Swamy and Joseph Spisak.

———————————

Machine learning is a field of computer science that enables computers to learn without being explicitly programmed. It focuses on algorithms that can learn from and make predictions on data.

Most recently, one branch of machine learning, called deep learning, has been deployed successfully in production with higher accuracy than traditional techniques, enabling capabilities such as speech recognition, image recognition, and video analytics. This higher accuracy comes, however, at the cost of significantly higher compute requirements for training these deep models.

One of the major reasons for this rebirth and rapid progress is the availability and democratization of cloud-scale computing. Training state-of-the-art deep neural networks can be time-consuming, with larger networks like ResidualNet taking several days to weeks to train, even on the latest GPU hardware. Because of this, a scale-out approach is required.

Accelerating training time has multiple benefits, including:

Enabling faster iterative research, allowing scientists to push the state of the art faster in domains such as computer vision or speech recognition.
Reducing the time-to-market for intelligent applications, allowing AI applications that consume trained, deep learning models to access newer models faster.
Absorbing new data faster, helping to keep deep learning models current.

AWS CloudFormation, which creates and configures Amazon Web Services resources with a template, simplifies the process of setting up a distributed deep learning cluster. The CloudFormation Deep Learning template uses the Amazon Deep Learning AMI (supporting MXNet, TensorFlow, Caffe, Theano, Torch, and CNTK frameworks) to launch a cluster of Amazon EC2 instances and other AWS resources needed to perform distributed deep learning. CloudFormation creates all resources in the customer account.

EC2 Cluster Architecture

Resources created by the Deep Learning template

The Deep Learning template creates a stack that contains the following resources:

A VPC in the customer account.
The requested number of worker instances in an Auto Scaling group within the VPC. These worker instances are launched in a private subnet.
A master instance in a separate Auto Scaling group that acts as a proxy to enable connectivity to the cluster via SSH. CloudFormation places this instance within the VPC and connects it to both the public and private subnets. This instance has both public IP addresses and DNS.
A security group that allows external SSH access to the master instance.
Two security groups that open ports on the private subnet for communication between the master and workers.
An IAM role that allows users to access and query Auto Scaling groups and the private IP addresses of the EC2 instances.
A NAT gateway used by the instances within the VPC to talk to the outside world.

The startup script enables SSH forwarding on all hosts. Enabling SSH is essential because frameworks such as MXNet makes use of SSH for communication between master and worker instances during distributed training. The startup script queries the private IP addresses of all the hosts in the stack, appends the IP address and worker alias to /etc/hosts, and writes the list of worker aliases to /opt/deeplearning/workers.

The startup script sets up the following environment variables:

$DEEPLEARNING_WORKERS_PATH: The file path that contains the list of workers
$DEEPLEARNING_WORKERS_COUNT: The total number of workers
$DEEPLEARNING_WORKER_GPU_COUNT: The number of GPUs on the instance

Launch a CloudFormation Stack

Note: To scale to the desired number of instances beyond the default limit, file a support request.

Download the Deep Learning template from the MXNet GitHub repo.
Open the CloudFormation console, and then choose Create New Stack.
Choose Choose File to upload the template, and then choose Next:
For Stack name, enter a descriptive stack name.
Choose a GPU InstanceType, such as a P2.16xlarge.
For KeyName, choose an EC2 key pair.
For SSHLocation, choose a valid CIDR IP address range to allow SSH access to the master instance and stack.
For Worker Count, type a value. The stack provisions the worker count + 1, with the additional instance acting as the master. The master also participates in the training/evaluation. Choose Next.
(Optional) Under Tags, type values for Key and Value. This allows you to assign metadata to your resources.
(Optional) Under Permissions, you can choose the IAM role that CloudFormation uses to create the stack. Choose Next.
Under Capabilities, select the checkbox to agree to allow CloudFormation to create an IAM role. An IAM role is required for correctly setting up a stack.
To create the CloudFormation stack, choose Create
To see the status of your stack, choose Events. If stack creation fails, for example, because of an access issue or an unsupported number of workers, troubleshoot the issue. For information about troubleshooting the creation of stacks, see Troubleshooting AWS CloudFormation. The event log records the reason for failure.

Log in to the master instance.

SSH agent forwarding securely connects the instances within the VPC that is connected to the private subnet. The idea is based on Securely Connect to Linux Instances Running in a Private Amazon VPC.

Find the public DNS/IP of the master.

The CloudFormation stack output contains the Auto Scaling group in which the master instance is launched. Note the Auto Scaling group ID for MasterAutoScalingGroup.

Open the Amazon EC2 console.
In the navigation pane, under Auto Scaling, choose Auto Scaling Groups.
On the Auto Scaling page, search for the group ID and select it.
On the Instances tab, find the instance ID of the master instance.
Choose the instance to find the public DNS/IP address used for login.

Enable SSH agent forwarding.

This enables communication with all instances in the private subnet. Using the DNS/IP address from Step 1, modify the SSH configuration to include these lines:


Host IP/DNS-from-above  
ForwardAgent yes

Run MXNet distributed training.

The following example shows how to run MNIST with data parallelism. Note the use of the DEEPLEARNING_* environment variables:


#terminate all running Python processes across workers 
while read -u 10 host; do ssh $host "pkill -f python" ; done 10<$DEEPLEARNING_WORKERS_PATH  

#navigate to the mnist image-classification example directory  
cd ~/src/mxnet/example/image-classification  

#run the MNIST distributed training example  
../../tools/launch.py -n $DEEPLEARNING_WORKERS_COUNT -H $DEEPLEARNING_WORKERS_PATH python train_mnist.py --gpus $(seq -s , 0 1 $(($DEEPLEARNING_WORKER_GPU_COUNT - 1))) --network lenet --kv-store dist_sync

These steps are only a subset. For more information about running distributed training, see Run MXNet on Multiple Devices.

FAQ

1. How do I change the IP addresses that are allowed to SSH to the master instance?

The CloudFormation stack output contains the security group that controls the inbound IP addresses for SSH access to the master instance. Use this security group to change your inbound IP addresses.

2. When an instance is replaced, are the IP addresses of the instances updated?

No. You must update IP addresses manually.

3. Does the master instance participate in training/validation?

Yes. Because most deep learning tasks involve GPUs, the master instance acts both as a proxy and as a distributed training/validation instance.

4. Why are the instances in an Auto Scaling group?

Auto Scaling group maintains the number of desired instances by launching a new instance if an existing instance fails. There are two Auto Scaling groups: one for the master and one for the workers in the private subnet. Because only the master instance has a public endpoint to access the hosts in the stack, if the master instance becomes unavailable, you can terminate it and the associated Auto Scaling group automatically launches a new master instance with a new public endpoint.

5. When a new worker instance is added or an existing instance replaced, does CloudFormation update the IP addresses on the master instance?

No, this template does not have the capability to automatically update the IP address of the replacement instance.

Build Serverless Applications in AWS Mobile Hub with New Cloud Logic and User Sign-in Features

by Vyom Nagrani | on 09 NOV 2016 | in Amazon API Gateway, AWS Lambda | Permalink | Comments

Last month, we showed you how to power a mobile back end using a serverless stack, with your business logic in AWS Lambda and the resulting cloud APIs exposed to your app through Amazon API Gateway. This pattern enables you to create and test mobile cloud APIs backed by business logic functions you develop, all without managing servers or paying for unused capacity. Further, you can share your business logic across your iOS and Android apps.

Today, AWS Mobile Hub is announcing a new Cloud Logic feature that makes it much easier for mobile app developers to implement this pattern, integrate their mobile apps with the resulting cloud APIs, and connect the business logic functions to a range of AWS services or on-premises enterprise resources. The feature automatically applies access control to the cloud APIs in API Gateway, making it easy to limit access to app users who have authenticated with any of the user sign-in options in Mobile Hub, including two new options that are also launching today:

Fully managed email- and password-based app sign-in
SAML-based app sign-in

In this post, we show how you can build a secure mobile back end in just a few minutes using a serverless stack.

Get started with AWS Mobile Hub

We launched Mobile Hub last year to simplify the process of building, testing, and monitoring mobile applications that use one or more AWS services. Use the integrated Mobile Hub console to choose the features you want to include in your app.

With Mobile Hub, you don’t have to be an AWS expert to begin using its powerful back-end features in your app. Mobile Hub then provisions and configures the necessary AWS services on your behalf and creates a working quickstart app for you. This includes IAM access control policies created to save you the effort of provisioning security policies for resources such as Amazon DynamoDB tables and associating those resources with Amazon Cognito.

Get started with Mobile Hub by navigating to it in the AWS console and choosing your features.

serverless-apps-mobile-hub-console

New user sign-in options

We are happy to announce that we now support two new user sign-in options that help you authenticate your app users and provide secure access to control to AWS resources.

The Email and Password option lets you easily provision a fully managed user directory for your app in Amazon Cognito, with sign-in parameters that you configure. The SAML Federation option enables you to authenticate app users using existing credentials in your SAML-enabled identity provider, such as Active Directory Federation Service (ADFS). Mobile Hub also provides ready-to-use app flows for sign-up, sign-in, and password recovery codes that you can add to your own app.

Navigate to the User Sign-in tile in Mobile Hub to get started and choose your sign-in providers.

serverless-apps-mobile-hub-sign-in

Read more about the user sign-in feature in this blog and in the Mobile Hub documentation.

Enhanced Cloud Logic

We have enhanced the Cloud Logic feature (the right-hand tile in the top row of the above Mobile Hub screenshot), and you can now easily spin up a serverless stack. This enables you to create and test mobile cloud APIs connected to business logic functions that you develop. Previously, you could use Mobile Hub to integrate existing Lambda functions with your mobile app. With the enhanced Cloud Logic feature, you can now easily create Lambda functions, as well as API Gateway endpoints that you invoke from your mobile apps.

The feature automatically applies access control to the resulting REST APIs in API Gateway, making it easy to limit access to users who have authenticated with any of the user sign-in capabilities in Mobile Hub. Mobile Hub also allows you to test your APIs within your project and set up the permissions that your Lambda function needs for connecting to software resources behind a VPC (e.g., business applications or databases), within AWS or on-premises. Finally, you can integrate your mobile app with your cloud APIs using either the quickstart app (as an example) or the mobile app SDK; both are custom-generated to match your APIs. Here’s how it comes together:

serverless-apps-mobile-hub-enhanced-cloud-logic

Create an API

After you have chosen a sign-in provider, choose Configure more features. Navigate to Cloud Logic in your project and choose Create a new API. You can choose to limit access to your Cloud Logic API to only signed-in app users:

serverless-apps-mobile-hub-cloud-logic-api

Under the covers, this creates an IAM role for the API that limits access to authenticated, or signed-in, users.

serverless-apps-mobile-hub-cloud-logic-auth

Quickstart app

The resulting quickstart app generated by Mobile Hub allows you to test your APIs and learn how to develop a mobile UX that invokes your APIs:

serverless-apps-mobile-hub-quickstart-app

Multi-stage rollouts

To make it easy to deploy and test your Lambda function quickly, Mobile Hub provisions both your API and the Lambda function in a Development stage, for instance, https://<yoururl>/Development. This is mapped to a Lambda alias of the same name, Development. Lambda functions are versioned, and this alias is always points to the latest version of the Lambda function. This way, changes you make to your Lambda function are immediately reflected when you invoke the corresponding API in API Gateway.

When you are ready to deploy to production, you can create more stages in API Gateway, such as Production. This gives you an endpoint such as https://<yoururl>/Production. Then, create an alias of the same name in Lambda but point this alias to a specific version of your Lambda function (instead of $LATEST). This way, your Production endpoint always points to a known version of your Lambda function.

Summary

In this post, we demonstrated how to use Mobile Hub to create a secure serverless back end for your mobile app in minutes using three new features – enhanced Cloud Logic, email and password-based app sign-in, and SAML-based app sign-in. While it was just a few steps for the developer, Mobile Hub performed several underlying steps automatically–provisioning back-end resources, generating a sample app, and configuring IAM roles and sign-in providers–so you can focus your time on the unique value in your app. Get started today with AWS Mobile Hub.

Real World AWS Scalability

by Stefano Buliani | on 07 NOV 2016 | in Amazon EC2 | Permalink | Comments

This is a guest post from Linda Hedges, Principal SA, High Performance Computing.

—–
One question we often hear is, “How well will my application scale on AWS?” For high performance computing (HPC) workloads that cross multiple nodes, the cluster network is at the heart of scalability concerns.

AWS uses advanced Ethernet networking technology, which, like all things AWS, is designed for scale, security, high availability, and low cost. This network is exceptional and continues to benefit from Amazon’s rapid pace of development. Again and again, customers find that the most demanding applications run very well on AWS!

Many have speculated that highly coupled workloads require a name-brand network fabric to achieve good performance. For most applications, this is simply not the case. As with all clusters, the devil is in the details and some applications benefit from cluster tuning.

This post discusses the scalability of a representative, real-world application and provides a few performance tips for achieving excellent application performance using STAR-CCM+ as an example. For more HPC-specific information, see High Performance Computing.

Computational fluid dynamics at TLG Aerospace

TLG Aerospace, a Seattle-based aerospace engineering services company, runs most of their STAR-CCM+ computational fluid dynamics (CFD) cases on AWS. For a detailed case study describing TLG Aerospace’s experience and the results they achieved, see TLG Aerospace.

This post uses one of their CFD cases as an example to understand AWS scalability. By leveraging Amazon EC2 Spot Instances, which allow customers to purchase unused capacity at significantly reduced rates, TLG Aerospace consistently achieves an 80% cost savings compared to their previous cloud and on-premises HPC cluster options. TLG Aerospace experiences solid value, terrific scale-up, and nearly limitless case throughput—all with no queue wait!

Scale-up

HPC applications such as CFD depend heavily on the application’s ability to scale compute tasks efficiently in parallel across multiple compute resources. Parallel performance is often evaluated by determining an application’s scale-up. Scale-up is a function of the number of processors used and is defined as the time it takes to complete a run on one processor, divided by the time it takes to complete the same run on the number of processors used for the parallel run.

As an example, consider an application with a time to completion, or turn-around time of 32 hours when run on one processor. If the same application runs in one hour when run on 32 processors, then the scale-up is 32 hours of time on 1 processor / 1 hour time on 32 processors, or equal to 32 for 32 processes. Scaling is considered to be excellent when the scale-up is close to or equal to the number of processors on which the application is run.

If the same application took 8 hours to complete on 32 processors, it would have a scale-up of only 4: 32 (time on one processor) / 8 (time to complete on 32 processors). A scale-up of 4 on 32 processors is considered to be poor.

Strong scaling vs. weak scaling

In addition to characterizing the scale-up of an application, scalability can be further characterized as “strong” or “weak”. Note that the term “weak”, as used here, does not mean inadequate or bad but is a technical term facilitating the description of the type of scaling that is sought.

Strong scaling offers a traditional view of application scaling, where a problem size is fixed and spread over an increasing number of processors. As more processors are added to the calculation, good strong scaling means that the time to complete the calculation decreases proportionally with increasing processor count.

In comparison, weak scaling does not fix the problem size used in the evaluation, but purposely increases the problem size as the number of processors also increases. The ratio of the problem size to the number of processors on which the case is run is held constant. For a CFD calculation, problem size most often refers to the size of the grid or mesh for a similar configuration.

An application demonstrates good weak scaling when the time to complete the calculation remains constant as the ratio of compute effort to the number of processors is held constant. Weak scaling offers insight into how an application behaves with varying case size.

Scale-up as a function of increasing processor count is shown in Figure 1 for the STAR-CCM+ case data provided by TLG Aerospace. This is a demonstration of “strong” scalability. The blue line shows what ideal or perfect scalability looks like. The purple triangles show the actual scale-up for the case as a function of increasing processor count. Excellent scaling is seen to well over 400 processors for this modest-sized 16M cell case, as evidenced by the closeness of these two curves. This example was run on Amazon EC2 c3.8xlarge instances, each an Intel E5-2680, providing either 16 cores or 32 Hyper-Threading processors using Intel Hyper-Threading Technology (HTT).

Figure 1: Strong Scaling Demonstrated for a 16M Cell STARCCM+ CFD Calculation

Threads vs. cores

AWS customers can choose to run their applications on either threads or cores. For an application like STAR-CCM+, excellent linear scaling can be seen when using either threads or cores, though we always recommend testing specific cases and applications.

For this example, threads were chosen as the processing basis. Running on threads offered a few percentage points in performance improvement when compared to running the same case on cores. Note that the number of available cores is equal to half of the number of available threads.

Processor counts

The scalability of real-world problems is directly related to the ratio of the compute effort per-core to the time required to exchange data across the network. The number of grid cells or mesh size of a CFD case provides a strong indication of how much computational effort is required for a solution. Thus, larger cases scale to even greater processor counts than for the modest sized case discussed here.

STAR-CCM+ has been shown to demonstrate exceptional “weak” scaling on AWS. That’s not shown here, though weak scaling is reflected in Figure 2 by plotting the cells per processor on the horizontal axis. The purple line in Figure 2 shows scale-up as a function of grid cells per processor. The vertical axis for scale-up is on the left-hand side of the graph as indicated by the purple arrow. The green line in Figure 2 shows efficiency as a function of grid cells per processor. The vertical axis for efficiency is shown on the right side of the graph and is indicated with a green arrow. Efficiency is defined as the scale-up divided by the number of processors used in the calculation.

Figure 2: Scale-up and Efficiency as a Function of Cells per Processor

Weak scaling is evidenced by considering the number of grid cells per processor as a measure of compute effort. Holding the grid cells per processor constant while increasing total case size demonstrates weak scaling. Weak scaling is not shown here, because only one CFD case is used.

Efficiency

Fewer grid cells per processor means reduced computational effort per processor. Maintaining efficiency while reducing cells per processor demonstrates the excellent strong scalability of STAR-CCM+ on AWS.

Efficiency remains at about 100% between approximately 250,000 grid cells per thread (or processor) and 100,000 grid cells per thread. Efficiency starts to fall off at about 100,000 grid cells per thread. An efficiency of at least 80% is maintained until 25,000 grid cells per thread. Decreasing grid cells per processor leads to decreased efficiency because the total computational effort per processor is reduced. Note that the perceived ability to achieve more than 100% efficiency (here, at about 150,000 cells per thread) is common in scaling studies, is case-specific, and often related to smaller effects such as timing variation and memory caching.

Turn-around time and cost

Plots of scale-up and efficiency offer an understanding about how a case or application scales. The bottom line, though, is that what really matters to most HPC users is case turn-around time and cost. A plot of turn-around time versus CPU cost for this case is shown in Figure 3. As the number of threads are increased, the total turn-around time decreases. But as the number of threads increases, the inefficiency also increases, which leads to increased costs. The cost shown is based on a typical Spot price for the c3.8xlarge and only includes the computational costs. Small costs are also incurred for data storage. Note that the Spot market price varies from day to day.

Figure 3: Cost for per Run Based on Spot Pricing ($0.35 per hour for c3.8xlarge) as a function of Turn-around Time

Minimum cost and turn-around time were achieved with approximately 100,000 cells per thread. Many users choose a cell count per thread to achieve the lowest possible cost. Others may choose a cell count per thread to achieve the fastest turn-around time.

If a run is desired in 1/3rd the time of the lowest price point, it can be achieved with approximately 25,000 cells per thread. (Note that many users run STAR-CCM+ with significantly fewer cells per thread than this.) While this increases the compute cost, other concerns—such as license costs or schedules—can be overriding factors. For this 16M cell case, the added inefficiency results in an increase in run price from $3 to $4 for computing. Many find the reduced turn-around time well worth the price of the additional instances.

Cluster tuning tips

As with any cluster, good performance requires attention to the details of the cluster setup. While AWS allows for the quick set up and take down of clusters, performance is affected by many of the specifics in that setup. This post provides some examples.

Placement groups

On AWS, a placement group is a grouping of instances within a single Availability Zone that allow for low latency between the instances. Placement groups are recommended for all applications where low latency is a requirement. A placement group was used to achieve the best performance from STAR-CCM+. For more information, see Placement Groups in the Amazon EC2 User Guide for Linux Instances.

Amazon Linux OS

Amazon Linux is a version of Linux maintained by Amazon. The distribution is designed to provide a stable, secure, and highly performant environment. Amazon Linux is optimized to run on AWS and offers excellent performance for running HPC applications. For the case presented here, the operating system used was Amazon Linux. Other Linux distributions are also performant. However, we strongly recommend that for Linux HPC applications, you use a minimum of the version 3.10 Linux kernel, to be sure of using the latest Xen libraries. For more information, see Amazon Linux AMI.

Amazon EBS storage

Amazon Elastic Block Store (Amazon EBS) is a persistent, block-level storage device often used for cluster storage on AWS. EBS provides reliable block-level storage volumes that can be attached (and removed) from an Amazon EC2 instance. A standard EBS General Purpose SSD (gp2) volume is all that is required to meet the needs of STAR-CCM+, and was used for this post. Other HPC applications may require faster I/O to prevent data writes from being a bottleneck to turn-around speed but also, many HPC applications only require the less expensive throughput optimized EBS volumes. For these applications, other storage options exist. For more information, see Storage.

Intel Hyper-Threading Technology (HTT)

As mentioned previously, STAR-CCM+, like many other CFD solvers, runs well on both threads and cores. HTT can improve the performance of some MPI applications depending on the application, case, and size of the workload allocated to each thread; it may also slow performance. The one-size-fits-all nature of the static cluster compute environments means that most HPC clusters disable HTT.

Generally, computationally intensive workloads run best on cores while those that are I/O bound run best on threads. Again, a few percentage points increase in performance was discovered for this case, by running with threads. If there is no time to evaluate the effect of HTT on case performance, then we recommend that HTT be disabled. When disabled, it is important to bind the core to designated CPU, also known as processor or CPU affinity. It almost universally improves performance over unpinned cores for computationally intensive workloads.

Time Stamp Counter

Occasionally, an application includes frequent time measurement in the code; perhaps this is done for performance tuning. Under these circumstances, performance can be improved by setting the clock source to the TSC (Time Stamp Counter). This tuning was not required for this application but is mentioned here for completeness.

Summary

When you evaluate an application, we recommend using a meaningful, real world use case. A case that is too large or small won’t reflect the performance and scalability achievable in everyday operation. The only way you’ll know positively how an application will perform on AWS is to try it!

AWS offers solid strong scaling and exceptional weak scaling. Excellent performance can be achieved on AWS for most applications. In addition to low cost and quick turn-around time, important considerations for HPC also include throughput and availability. AWS offers nearly limitless throughput, security, cost-savings, and high-availability making queues a “thing of the past”. A long queue wait makes for a long case turn-around time, regardless of the scale.

If you have questions or suggestions, please comment below.

Running Swift Web Applications with Amazon ECS

by Chris Barclay | on 07 NOV 2016 | in Amazon ECS | Permalink | Comments

This is a guest post from Asif Khan about how to run Swift applications on Amazon ECS.

—–

Swift is a general-purpose programming language built using a modern approach to safety, performance, and software design patterns. A goal for Swift is to be the best language for uses ranging from systems programming, to mobile and desktop applications, scaling up to cloud services. As a developer, I am thrilled with the possibility of a homogeneous application stack and being able to leverage the benefits of Swift both on the client and server side. My code becomes more concise, and is more tightly integrated to the iOS environment.

In this post, I provide a walkthrough on building a web application using Swift and deploying it to Amazon ECS with an Ubuntu Linux image and Amazon ECR.

Overview of container deployment

Swift provides an Ubuntu version of the compiler that you can use. You still need a web server, a container strategy, and an automated cluster management with automatic scaling for traffic peaks.

There are some decisions to make in your approach to deploy services to the cloud:

HTTP server
Choose a HTTP server which supports Swift. I found Vapor to be the easiest. Vapor is a type-safe web framework for Swift 3.0 that works on iOS, MACOS, and Ubuntu. It is very simple and easy to deploy a Swift application. Vapor comes with a CLI that will help you create new Vapor applications, generate Xcode projects and build them, as well as deploy your applications to Heroku or Docker. Another Swift webserver is Perfect. In this post, I use Vapor as I found it easier to get started with.

Tip: Join the Vapor slack group; it is super helpful. I got answers on a long weekend which was super cool.

Container model
Docker is an open-source technology that that allows you to build, run, test, and deploy distributed applications inside software containers. It allows you to package a piece of software in a standardized unit for software development, containing everything the software needs to run: code, runtime, system tools, system libraries, etc. Docker enables you to quickly, reliably, and consistently deploy applications regardless of environment.
In this post, you’ll use Docker, but if you prefer Heroku, Vapor is compatible with Heroku too.
Image repository
After you choose Docker as the container deployment unit, you need to store your Docker image in a repository to automate the deployment at scale. Amazon ECR is a fully-managed Docker registry and you can employ AWS IAM policies to secure your repositories.
Cluster management solution
Amazon ECS is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.

With ECS, it is very easy to adopt containers as a building block for your applications (distributed or otherwise) by skipping the need for you to install, operate, and scale your own cluster infrastructure. Using Docker container within ECS provides flexibility to schedule long-running applications, services, and batch processes. ECS maintains application availability and allows you to scale containers.

To put it all together, you have your Swift web application running in a HTTP server (Vapor), deployed on containers (Docker) with images are stored on a secure repository (ECR) with automated cluster management (ECS) to scale horizontally.

Prepare an AWS Account

If you don’t already have an AWS account, create one at http://aws.amazon.com by following the on-screen instructions.
Use the region selector in the navigation bar to choose the AWS Region where you want to deploy Swift web applications on AWS.
Create a key pair in your preferred region.

Walkthrough

The following steps are required to set up your first web application written in Swift and deploy it to ECS:

Download and launch an instance of the AWS CloudFormation template. The CloudFormation template installs Swift, Vapor, Docker, and the AWS CLI.
SSH into the instance.
Download the vapor example code
Test the Vapor web application locally.
Enhance the Vapor example code to include a new API.
Push your code to a code repository
Create a Docker image of your code.
Push your image to Amazon ECR.
Deploy your Swift web application to Amazon ECS.

Detailed steps

Download the CloudFormation template and spin up an EC2 instance. The CloudFormation has Swift , Vapor, Docker, and git installed and configured. To launch an instance, launch the CloudFormation template from here.
SSH into your instance:
```
ssh –i ec2-user@<bastion host ip>
```
Download the Vapor example code – this code helps deploy the example you are using for your web application:
```
git clone https://github.com/awslabs/ecs-swift-sample-app.git
```
Test the Vapor application locally:
1. Build a Vapor project:
```
cd ~/ecs-swift-sample-app/example \
vapor build
```
2. Run the Vapor project:
```
vapor run serve --port=8080
```
3. Validate that server is running (in a new terminal window):
```
ssh -i ubuntu@ curl localhost:8080
```
Enhance the Vapor code:
1. Follow the guide to add a new route to the sample application: https://Vapor.readme.io/docs/hello-world
2. Test your web application locally:
```
vapor run serve --port=8080
curl http://localhost/hello.
```
Commit your changes and push this change to your GitHub repository:
```
git add –all
git commit –m
git push
```

Build a new Docker image with your code:

docker build -t swift-on-ecs \
--build-arg SWIFT_VERSION=DEVELOPMENT-SNAPSHOT-2016-06-06-a \
--build-arg REPO_CLONE_URL=https://github.com/awslabs/ecs-swift-sample-app.git/ \
~/ ecs-swift-sample-app/example

Upload to ECR: Create an ECR repository and push the image following the steps in Getting Started with Amazon ECR.
Create a ECS cluster and run tasks following the steps in Getting Started with Amazon ECS:
1. Be sure to use the full registry/repository:tag naming for your ECR images when creating your task. For example, aws_account_id.dkr.ecr.us-east-1.amazonaws.com/my-web-app:latest.
2. Ensure that you have port forwarding 8080 set up.
You can now go to the container, get the public IP address, and try to access it to see the result.
1. Open your running task and get the URL:
2. Open the public URL in a browser:

Your first Swift web application is now running.

At this point, you can use ECS with Auto Scaling to scale your services and also monitor them using CloudWatch metrics and events.

Conclusion

If you want to leverage the benefits of Swift, you can use Vapor as the web container with Amazon ECS and Amazon ECR to deploy Swift web applications at scale and delegate the cluster management to Amazon ECS.

There are many interesting things you could do with Swift beyond this post. To learn more about Swift, see the additional Swift libraries and read the Swift documentation.

If you have questions or suggestions, please comment below.

Ad Hoc Big Data Processing Made Simple with Serverless MapReduce

by Bryan Liston | on 04 NOV 2016 | in AWS Lambda | Permalink | Comments

Sunil Mallya
Solutions Architect

Big data processing solutions have been using AWS Lambda more lately; customers have been creating solutions such as building metadata indexes for Amazon S3 using Lambda and Amazon DynamoDB and stream processing of data in S3. In this post, I discuss a serverless architecture to run MapReduce jobs using Lambda and S3.

Big data AWS services

Lambda launched in 2015, enabling customers to execute code on demand without any dedicated infrastructure. Since then, customers have successfully used Lambda for various use cases like event-driven processing for data ingested into S3 or Amazon Kinesis, web API backends, and producer/consumer architectures, among others. Lambda has emerged as a key component in building serverless architectures for these architecture paradigms.

S3 integrates directly into other AWS services, such as providing an easy export facility into Amazon Redshift and Amazon Elasticsearch Service, and providing an underlying file system (EMRFS) for Amazon EMR, making it an ideal data lake in the AWS ecosystem.

Amazon Redshift, EMR and the Hadoop ecosystem offer comprehensive solutions for big data processing. While EMR makes it easy, fast and cost-effective to run Hadoop clusters, it still requires provisioning servers, as well as knowledge of Hadoop and its components.

Serverless MapReduce overview

I wanted to extend some of these solutions and present a serverless architecture along with a customizable framework that enables customers to run ad hoc map reduce jobs. Apart from the benefit of not having to manage any servers, this framework was significantly cheaper and, in some cases, faster than existing solutions when running the well-known big data processing Amplab benchmark.

This framework allows you to process big data with no operational effort and no prior Hadoop knowledge. While Hadoop is the most popular big data processing framework today, it does have a steep learning curve given the number of components and their inherent complexity. By minimizing the number of components in the framework and abstracting the infrastructure management, it simplifies data processing for developers or data scientists. They can focus on modeling the data processing problem and deriving value from the data stored in S3.

In addition to reduced complexity, this solution is much cheaper than existing solutions for ad hoc MapReduce workloads. Given that the solution is serverless, customers pay only when the MapReduce job is executed. The cost for the solution is the aggregate usage cost of Lambda and S3. Given that both the services are on-demand, you can compute the cost per query, a feature that's unique to the solution. This is extremely useful as you can now budget your data processing needs precisely to the query.

For customers who want enhanced security, they can process the data in a VPC by configuring the Lambda functions to access resources in a VPC and creating a VPC endpoint for S3. There are no major performance implications of running multiple variants of the same job or different jobs on the same dataset concurrently. The Lambda function resources are independent, thus allowing for fast iterations and development. This technology is really exciting for our customers, as it enables a truly pay-for-what-you-use cost effective and secure model for big data processing.

Reference architecture for serverless MapReduce

The goals are to:

Abstract infrastructure management
Get close to a "zero" setup time
Provide a pay-per-execution model for every job
Be cheaper than other data processing solutions
Enable multiple executions on the same dataset

The architecture is composed of three main Lambda functions:

Mapper
Reducer
Coordinator

The MapReduce computing paradigm requires maintaining state, so architecturally you require a coordinator or a scheduler to connect the map phase of the processing to the reduce phase. In this architecture, you use a Lambda function as a coordinator to make this completely serverless. The coordinator maintains all its state information by persisting the state data in S3.

Execution workflow:

The workflow starts with the invocation of the driver script that reads in the config file, which includes the mapper and reducer function file paths and the S3 bucket or folder path.
The driver function finds the relevant objects in S3 to process by listing the keys and matching by prefix in the S3 bucket. The keys are aggregated to created batches, which are then passed to the mapper. The batch size is determined by a simple heuristic that tries to achieve maximum parallelism while optimizing the data fit based on the mapper memory size.
Mapper, reducer, and coordinator Lambda functions are created and the code is uploaded.
A S3 folder called job folder is created as a temporary workspace for the current job and is configured as an event source for the coordinator.
The mappers write their outputs to the job folder; after all the mappers finish, the coordinator uses the aforementioned logic to create batches and invoke the reducers.
The coordinator is notified through the S3 event source mapping when each of the reducers end and continues to create subsequent stages of reducers until a single reduced output is created.

The following diagram shows the overall architecture:

Getting started with serverless MapReduce

In this post, I show you how to build and deploy your first serverless MapReduce job with this framework for data stored in S3. The code used in this post, along with detailed instructions about how to set up and run the application, is available in the awslabs lambda-refarch-mapreduce GitHub repo.

Use the dataset generated by Intel's Hadoop benchmark tools and data sampled from the Common Crawl document corpus also used by the Amplab benchmark.

For the first job, you compute an aggregation (i.e., sum of ad revenue for every source IP address) on the dataset of size 25.4GB and 155 million individual rows.

Dataset:

s3://big-data-benchmark/pavlo/text/1node/

Schema in CSV:

Each row of the Uservisits dataset is composed of the following:

sourceIP VARCHAR(116)
destURL VARCHAR(100)
visitDate DATE
adRevenue FLOAT
userAgent VARCHAR(256)
countryCode CHAR(3)
languageCode CHAR(6)
searchWord VARCHAR(32)
duration INT

SQL representation of the intended operation:

SELECT SUBSTR(sourceIP, 1, 8), SUM(adRevenue) FROM uservisits GROUP BY SUBSTR(sourceIP, 1, 8)

Prerequisites

You need the following prerequisites for working with this serverless MapReduce framework:

AWS account
S3 bucket
Lambda execution role
IAM credentials with access to create Lambda functions and list S3 buckets

Limits

In its current incarnation, this framework is best suited for ad hoc processing of data in S3. For sustained usage and time sensitive workloads, Amazon Redshift or EMR may be better suited. The framework has the following limits:

By default, each account has a concurrent Lambda execution limit of 100. This is a soft limit and can be increased by opening up a limit increase support ticket.
Lambda currently has a maximum container size limit of 1536 MB.

Components

The application has the following components:

Driver script and config file
Mapper Lambda function
Reducer Lambda function
Coordinator Lambda function

Driver script and config file

The driver script creates and configures the mapper, reducer, and coordinator Lambda functions in your account based on the config file. Here is an example configuration file that the driver script uses.

  sourceBucket: "big-data-benchmark",
  jobBucket: "my-job-bucket",
  prefix: "pavlo/text/1node/uservisits/",
  region: "us-east-1",
  lambdaMemory: 1536,
  concurrentLambdas: 100,
  mapper: {
        name: "mapper.py",
        handler: "mapper.handler",
        zip: "mapper.zip"
    },
  reducer:{
        name: "reducer.py",
        handler: "reducer.handler",
        zip: "reducer.zip"
    },
  reducerCoordinator:{
        name: "reducerCoordinator.py",
        handler: "reducerCoordinator.handler",
        zip: "reducerCor.zip"
    }

Mapper Lambda function

This is where you perform your map logic; the data is streamed line by line into your mapper function. In this function, you map individual records to a value or as an optimization; also, you need to perform the first reduce step especially when computing aggregations. This is efficient given that you may be reading from multiple input sources in the mapper. In this example, the mapper maps the sourceIP on to adRevenue and stores in a dictionary a running total of the adRevenue of every sourceIP.

    # Download and process all keys

    for key in src_keys:
        response = s3_client.get_object(Bucket=src_bucket, Key=key)
        contents = response['Body'].read()
        for line in contents.split('\n')[:-1]:
            line_count +=1
            try:
                data = line.split(',')
                srcIp = data[0][:8]
                if srcIp not in totals:
                    totals[srcIp] = 0
                totals[srcIp] += float(data[3])
            except Exception, e:
                print e

Reducer Lambda function

The mapper and reducer functions look identical structurally, but perform different operations on the data. The reducer keeps a dictionary of aggregate sums of adRevenue of every sourceIP. The reducer is also responsible for the termination of the job. When the final reducer runs, it then reduces the intermediate results into a single file stored in the job bucket.

    # Download and process all mapper output

    for key in reducer_keys:
        response = s3_client.get_object(Bucket=job_bucket, Key=key)
        contents = response['Body'].read()
        try:
            for srcIp, val in json.loads(contents).iteritems():
                line_count +=1
                if srcIp not in results:
                    results[srcIp] = 0
                results[srcIp] += float(val)
        except Exception, e:
            print e

Coordinator Lambda function

The coordinator function keeps track of the job state with the job bucket as a Lambda event source. It is invoked every time a mapper or reducer finishes. After all the mappers finish, it then invokes the reducers to compute the final output.

The pseudo code for the coordinator looks like the following:

If all mappers are done:
    If currently in the reduce phase:
            number_of_reducers = compute_number_of_reducers(previous_reducer_step_outputs)
Else:
    number_of_reducers = compute_number_of_reducers(mapper_outputs)
    If  number_of_reducers == 1:
       Invoke single reducer and write the results to S3
Job done;
    Else create event source for reducer
Else:
    Return

The coordinator doesn't need wait for all the mappers to finish in order to start the reducers, but for simplicity, the first version of the framework chooses to wait for all mappers.

Running the job

The driver function is the interface to start the job. It reads the job details like the mapper and reducer code source to create the Lambda functions in your account for the serverless MapReduce job.

Execute the following command:

$ python driver.py

Intermediate outputs from the mappers and reducers are stored in the specified job bucket and the final result is stored as JobBucket/JobID/result. The contents of the job bucket look like the following:

smallya$ aws s3 ls s3://JobBucket/py-bl-1node-2 --recursive --human-readable --summarize

2016-09-26 15:01:17   69 Bytes py-bl-1node-2/jobdata
2016-09-26 15:02:04   74 Bytes py-bl-1node-2/reducerstate.1
2016-09-26 15:03:21   51.6 MiB py-bl-1node-2/result
2016-09-26 15:01:46   18.8 MiB py-bl-1node-2/task/
….

smallya$ head –n 3 result

67.23.87,5.874290244999999
30.94.22,96.25011190570001
25.77.91,14.262780186000002

Cost Analysis

The cost for this job is 2.49 cents, which processed over 25 GB of data and took less than 2 minutes. The majority of the cost can be attributed to the Lambda components and the mapper function in particular, given that the map phase takes the longest and is the most resource-intensive.

A component cost breakdown for the example above is plotted in the following chart.

The cost model is shown to scale almost linearly when the same job is run for a bigger dataset (126.8 GB, 775 million rows) for the Amplab benchmark (more details in the awslabs lambda-refarch-mapreduce GitHub repo), costing around 11 cents and executing in 3.5 minutes.

Summary

In this post, I showed you how to build a simple MapReduce task using a serverless framework for data stored in S3. If you have ad hoc data processing workloads, the cost effectiveness, speed, and price-per-query model makes it very suitable.

The code has been made available in the awslabs lambda-refarch-mapreduce GitHub repo . You can modify or extend this framework to build more complex applications for your data processing needs.

If you have questions or suggestions, please comment below.

Service Discovery for Amazon ECS Using DNS

by Chris Barclay | on 20 OCT 2016 | in Amazon ECS | Permalink | Comments

My colleagues Peter Dalbhanjan and Javier Ros sent a nice guest post that describes DNS-based service discovery for Amazon ECS.

——

Containers are generating a lot of interest due to benefits such as portability and speed of deployment. Containers are also a good fit for microservices as they offer a thin, modular, self-contained environment that enables rapid innovation in the lifecycle of an application.

One requirement for the microservices design pattern is a reliable framework to describe the relationship between different microservices. For example, if a portal application needs to communicate with a weather service to provide data, it needs to know how to connect with that service. This underlying framework is defined as service discovery.

Service discovery offers features such as the following:

application reachability (where is my application, how can I reach it)
health checks (is this application healthy)
updates (how do I know when a new application comes online)
metadata for application configuration such as environment variables

There are several third-party approaches such as Consul.io, Weave, Netflix Eureka but running a third-party tool to manage service discovery presents its own set of challenges and increases operational complexity.

Recently, we proposed a reference architecture for ELB-based service discovery that uses Amazon CloudWatch Events and AWS Lambda to register the service in Amazon Route 53 and uses Elastic Load Balancing functionality to perform health checks and manage request routing. An ELB-based service discovery solution works well for most services, but some services do not need a load balancer.

This post focuses on using a DNS-based approach for service discovery that leverages Amazon Route 53 private hosted zones for the remainder of use cases that don’t require a load balancer. By running a simple agent (ecssd_agent.go) on Amazon ECS container instances, customers do not need to worry about the management and maintenance of service discovery component. The agent receives all Docker events natively and registers the service into Route 53 private hosted zones.

As containers start and stop, the agent updates Route 53 DNS records. This solution also works if customers use their own third-party schedulers on top of ECS, such as Apache Mesos or Marathon. The agent is written in Go, has a minimal footprint, and is available on GitHub under the Apache license. We encourage customers to modify the code as needed, and provide feedback.

Architecture

In this architecture, there are two ECS container instances running in an ECS cluster with ecssd_agent.go running in the background. This agent is started automatically using upstart configured with the ecssd_agent.conf startup script. The agent listens to Docker events natively; it registers the service name and each task’s metadata, such as IP address and ports info, into the Route 53 private hosted zone. The agent also deregisters the service and its SRV record as soon as the Docker container is stopped, so it detects failures as fast as they happen.

Clients can access other applications via environment variables defined in the container configuration. Each service that you want available for service discovery requires an environment variable in the task definition. The name of the variable should be SERVICE_<port>_NAME, where is the port where your service is going to listen inside the container, and the value is the name of the microservice.

Below is the task definition of the Calc service:

"CalcDefinition": {
            "Type" : "AWS::ECS::TaskDefinition",
            "DependsOn": "DockerBuildWaitCondition",
            "Properties" : {
                "ContainerDefinitions": [
                    {
                        "Name": "calc-service",
                        "Image": { "Fn::Join" : ["", [
                            { "Ref" : "AWS::AccountId" }, ".dkr.ecr.", { "Ref": "AWS::Region" }, ".amazonaws.com/calc-demo-service:latest"
                        ]]},
                        "Cpu": "100",
                        "Memory": "100",
                        "PortMappings": [
                            {
                                "ContainerPort": 8081
                            }
                        ],
                        "Essential": true,
                        "Environment": [
                            {
                                "Name": "CALC_USERNAME",
                                "Value": "admin"
                            },
                            {
                                "Name": "CALC_PASSWORD",
                                "Value": "password"
                            },
                            {
                                "Name": "SERVICE_8081_NAME",
                                "Value": "calc"
                            }
                        ]
                    }
                ]
            }
        },

Specifying SERVICE_8081_NAME under Environment variables registers calc-service as a Route 53 SRV record. If you run a DNS query with the service name (calc-service), Route 53 responds with the IP address and port number associated with the SRV record. If a record has more than one value, Route 53 responds with a different response based on the built-in, round-robin algorithm.

You can specify the port in the portMappings property of your task definition. In this property, you can specify the port for the container (the port where the application is listening) and the port for the host. We recommend leaving the host port to be assigned dynamically so that you can launch more than one task of the same type per server.

To respond to container instance failures and unhealthy containers, customers can use the Lambda function (lambda_health_check.py) to remove records from Route 53. You can schedule the function to run every five minutes.

If there is an EC2 instance failure, the Lambda function recognizes the failure the next time it runs and performs a cleanup of the associated Route 53 records. For health checks, the function reads all the SRV records in Route 53 and performs HTTP Get against those containers (http://serverip:port/health). If the response code is different from 200, then it stops the ECS task and removes the associated Route 53 record. This is just an example of how to perform health checks; customers can extend this capability to perform custom health checks as needed.

This is a simple seamless implementation of service discovery for Docker containers in AWS. Customers can run any number of services without managing the complexity of running service discovery component.

Template implementation

You could leverage the CloudFormation template to build the infrastructure and visualize the solution in action. Here are the details of the CloudFormation template.

Service_Discovery_Using_DNS_Base.template

Creates a VPC with two subnets, route tables, Internet gateway, and security groups
Creates IAM roles
Creates an ECS cluster
Creates an Auto Scaling group and launches the configuration for the ECS cluster
Creates an Amazon Route 53 private hosted zone
Installs ecssd_agent on the container instances
Creates ECR repositories for the microservices applications (Portal, Time, Calc)
Builds the microservices applications (Portal, Time, Calc) and push the images to ECR
Registers task definitions and runs the microservices applications

CloudFormation builds the microservices application containers using an EC2 instance with a ‘docker builder’ tag. This EC2 instance downloads the microservices application code from the GitHub repo, builds the applications, and pushes the images to their corresponding ECR repositories. The permission required to push ECR images is granted by the AmazonEC2ContainerRegistryPowerUser IAM policy. For updating Route 53 records, all the EC2 instances need to have the AmazonRoute53FullAccess IAM policy.

The microservices application is made of three containers: Portal, Time, and Calculator apps. After CloudFormation completes the deployment, you can choose Outputs, PortalURL to see some examples of service discovery:

Portal: A front-end web service to the other two microservices applications. Review the source code for portal (portal.go) to see how it references other two microservice endpoints and use DNS for communication.
Time: A simple time service that returns the current date and time in the required format. Specify the format using the go standard. For example, you can write “2 Jan”, “15:04”, or “Jan -> 15”. You can use any combination of time in a string format. Enter “15:04 Jan 2” and choose Add to receive an appropriate response.
Calc: A simple calculator that offers addition, subtraction, and multiplication. Enter (2+6)*3 and receive a response with calculated results.

To take this further, stop either a Time or Calc container. You will see that the Route 53 record associated with the container gets deleted as soon as the container stops. Similarly, when the ECS service kicks in a new container, a new Route53 record is created automatically.

Cleanup

To clean up, delete the ECR repositories and Route 53 private hosted zone first. After this, deleting the CloudFormation stack deletes all the components involved in the implementation.
To keep the infrastructure in place for further testing, you can just delete the EC2 builder (with the ‘docker builder’ tag) as it is only responsible for creating and pushing Docker images to ECR.

Conclusion

Service discovery is a key component of a microservice architecture. By installing a simple agent on EC2 container instances, customers can take advantage of running service discovery via Route 53 DNS with less hassle and a worry-free implementation. You don’t need to maintain additional infrastructure or worry about added costs for running a service discovery solution.

If you have questions or suggestions, please comment below.

Fleet Management Made Easy with Auto Scaling

by Chris Barclay | on 20 OCT 2016 | Permalink | Comments

If your application runs on Amazon EC2 instances, then you have what’s referred to as a ‘fleet’. This is true even if your fleet is just a single instance. Automating how your fleet is managed can have big pay-offs, both for operational efficiency and for maintaining the availability of the application that it serves. You can automate the management of your fleet with Auto Scaling, and the best part is how easy it is to set up!

There are three main functions that Auto Scaling performs to automate fleet management for EC2 instances:

Monitoring the health of running instances
Automatically replacing impaired instances
Balancing capacity across Availability Zones

In this post, we describe how Auto Scaling performs each of these functions, provide an example of how easy it is to get started, and outline how to learn more about Auto Scaling.

Monitoring the health of running instances

Auto Scaling monitors the health of all instances that are placed within an Auto Scaling group. Auto Scaling performs EC2 health checks at regular intervals, and if the instance is connected to an Elastic Load Balancing load balancer, it can also perform ELB health checks. Auto Scaling ensures that your application is able to receive traffic and that the instances themselves are working properly. When Auto Scaling detects a failed health check, it can replace the instance automatically.

Automatically replacing impaired instances

When an impaired instance fails a health check, Auto Scaling automatically terminates it and replaces it with a new one. If you’re using an Elastic Load Balancing load balancer, Auto Scaling gracefully detaches the impaired instance from the load balancer before provisioning a new one and attaches it back to the load balancer. This is all done automatically, so you don’t need to respond manually when an instance needs replacing.

Balancing capacity across Availability Zones

Balancing resources across Availability Zones is a best practice for well-architected applications, as this greatly increases aggregate system availability. Auto Scaling automatically balances EC2 instances across zones when you configure multiple zones in your Auto Scaling group settings. Auto Scaling always launches new instances such that they are balanced between zones as evenly as possible across the entire fleet. What’s more, Auto Scaling only launches into Availability Zones in which there is available capacity for the requested instance type.

Getting started is easy!

The easiest way to get started with Auto Scaling is to build a fleet from existing instances. The AWS Management Console provides a simple workflow to do this: right-click on a running instance and choose Instance Settings, Attach to Auto Scaling Group.

You can then opt to attach the instance to a new Auto Scaling group. Your instance is now being automatically monitored for health and will be replaced if it becomes impaired. If you configure additional zones and add more instances, they will be spread evenly across Availability Zones to make your fleet more resilient to unexpected failures.

Diving deeper

While this example is a good starting point, you may want to dive deeper into how Auto Scaling can automate the management of your EC2 instances.

The first thing to explore is how to automate software deployments. AWS Elastic Beanstalk is a popular and easy-to-use solution that works well for web applications. AWS CodeDeploy is a good solution for fine-grained control over the deployment process. If your application is based on containers, then Amazon EC2 Container Service (Amazon ECS) is something to consider. You may also want to look into AWS Partner solutions such as Ansible and Puppet. One common strategy for deploying software across a production fleet without incurring downtime is blue/green deployments, to which Auto Scaling is particularly well-suited.

These solutions are all enhanced by the core fleet management capabilities in Auto Scaling. You can also use the API or CLI to roll your own automation solution based on Auto Scaling. The following learning path will help you to explore the service in more detail.

Launch configurations
Lifecycle hooks
Fleet size
Automatic process control
Scheduled scaling

Launch configurations

Launch configurations are the key to how Auto Scaling launches instances. Whenever an Auto Scaling group launches a new instance, it uses the currently associated launch configuration as a template for the launch. In the example above, Auto Scaling automatically created a launch configuration by deriving it from the attached instance. In many cases, however, you create your own launch configuration. For example, if your software environment is baked into an Amazon Machine Image (AMI), then your launch configuration points to the version that you want Auto Scaling to deploy onto new instances.

Lifecycle hooks

Lifecycle hooks let you take action before an instance goes into service or before it gets terminated. This can be especially useful if you are not baking your software environment into an AMI. For example, launch hooks can perform software configuration on an instance to ensure that it’s fully prepared to handle traffic before Auto Scaling proceeds to connect it to your load balancer. One way to do this is by connecting the launch hook to an AWS Lambda function that invokes RunCommand on the instance.

Terminate hooks can be useful for collecting important data from an instance before it goes away. For example, you could use a terminate hook to preserve your fleet’s log files by copying them to an Amazon S3 bucket when instances go out of service.

Fleet size

You control the size of your fleet using the minimum, desired, and maximum capacity attributes of an Auto Scaling group. Auto Scaling automatically launches or terminates instances to keep the group at the desired capacity. As mentioned before, Auto Scaling uses the launch configuration as a template for launching new instances in order to meet the desired capacity, doing so such that they are balanced across configured Availability Zones.

Automatic process control

You can control the behavior of Auto Scaling’s automatic processes such as health checks, launches, and terminations. You may find the AZRebalance process of particular interest. By default, Auto Scaling automatically terminates instances from one zone and re-launches them into another if the instances in the fleet are not spread out in a balanced manner.
You may want to disable this behavior under certain conditions. For example, if you’re attaching existing instances to an Auto Scaling group, you may not want them terminated and re-launched right away if that is required to re-balance your zones. Note that Auto Scaling always replaces impaired instances with launches that are balanced across zones, regardless of this setting. You can also control how Auto Scaling performs health checks, launches, terminations, and more.

Scheduled scaling

Scheduled scaling is a simple tool for adjusting the size of your fleet on a schedule. For example, you can add more or fewer instances to your fleet at different times of the day to handle changing customer traffic patterns. A more advanced tool is dynamic scaling, which adjusts the size of your fleet based on Amazon CloudWatch metrics.

Summary

Auto Scaling can bestow important benefits to cloud applications by automating the management of fleets of EC2 instances. Auto Scaling makes it easy to monitor instance health, automatically replace impaired instances, and spread capacity across multiple Availability Zones.

If you already have a fleet of EC2 instances, then it’s easy to get started with Auto Scaling in just a few clicks. After your first Auto Scaling group is working to safeguard your existing fleet, you can follow the suggested learning path in this post. Over time, you can explore more features of Auto Scaling and further automate your software deployments and application scaling.

If you have questions or suggestions, please comment below.

Amazon ECS Service Auto Scaling Enables Rent-A-Center SAP Hybris Solution

by Chris Barclay | on 13 OCT 2016 | in Amazon ECS | Permalink | Comments

This is a guest post from Troy Washburn, Sr. DevOps Manager @ Rent-A-Center, Inc., and Ashay Chitnis, Flux7 architect.

—–

Rent-A-Center in their own words: Rent-A-Center owns and operates more than 3,000 rent-to-own retail stores for name-brand furniture, electronics, appliances and computers across the US, Canada, and Puerto Rico.

Rent-A-Center (RAC) wanted to roll out an ecommerce platform that would support the entire online shopping workflow using SAP’s Hybris platform. The goal was to implement a cloud-based solution with a cluster of Hybris servers which would cater to online web-based demand.

The challenge: to run the Hybris clusters in a microservices architecture. A microservices approach has several advantages including the ability for each service to scale up and down to meet fluctuating changes in demand independently. RAC also wanted to use Docker containers to package the application in a format that is easily portable and immutable. There were four types of containers necessary for the architecture. Each corresponded to a particular service:

1. Apache: Received requests from the external Elastic Load Balancing load balancer. Apache was used to set certain rewrite and proxy http rules.
2. Hybris: An external Tomcat was the frontend for the Hybris platform.
3. Solr Master: A product indexing service for quick lookup.
4. Solr Slave: Replication of master cache to directly serve product searches from Hybris.

To deploy the containers in a microservices architecture, RAC and AWS consultants at Flux7 started by launching Amazon ECS resources with AWS CloudFormation templates. Running containers on ECS requires the use of three primary resources: clusters, services, and task definitions. Each container refers to its task definition for the container properties, such as CPU and memory. And, each of the above services stored its container images in Amazon ECR repositories.

This post describes the architecture that we created and implemented.

Auto Scaling

At first glance, scaling on ECS can seem confusing. But the Flux7 philosophy is that complex systems only work when they are a combination of well-designed simple systems that break the problem down into smaller pieces. The key insight that helped us design our solution was understanding that there are two very different scaling operations happening. The first is the scaling up of individual tasks in each service and the second is the scaling up of the cluster of Amazon EC2 instances.

During implementation, Service Auto Scaling was released by the AWS team and so we researched how to implement task scaling into the existing solution. As we were implementing the solution through AWS CloudFormation, task scaling needed to be done the same way. However, the new scaling feature was not available for implementation through CloudFormation and so the natural course was to implement it using AWS Lambda–backed custom resources.

A corresponding Lambda function is implemented in Node.js 4.3, while automatic scaling happens by monitoring the CPUUtilization Amazon CloudWatch metric. The ECS policies below are registered with CloudWatch alarms that are triggered when specific thresholds are crossed. Similarly, by using the MemoryUtilization CloudWatch metric, ECS scaling can be made to scale in and out as well.

The Lambda function and CloudFormation custom resource JSON are available in the Flux7 GitHub repository: https://github.com/Flux7Labs/blog-code-samples/tree/master/2016-10-ecs-enables-rac-sap-hybris

Scaling ECS services and EC2 instances automatically

The key to understanding cluster scaling is to start by understanding the problem. We are no longer running a homogeneous workload in a simple environment. We have a cluster hosting a heterogeneous workload with different requirements and different demands on the system.

This clicked for us after we phrased the problem as, “Make sure the cluster has enough capacity to launch ‘x’ more instances of a task.” This led us to realize that we were no longer looking at an overall average resource utilization problem, but rather a discrete bin packing problem.

The problem is inherently more complex. (Anyone remember from algorithms class how the discrete Knapsack problem is NP-hard, but the continuous knapsack problem can easily be solved in polynomial time? Same thing.) So we have to check on each individual instance if a particular task can be scheduled on it, and if for any task we don’t cross the required capacity threshold, then we need to allocate more instance capacity.

To ensure that ECS scaling always has enough resources to scale out and has just enough resources after scaling in, it was necessary that the Auto Scaling group scales according to three criteria:

1. ECS task count in relation to the host EC2 instance count in a cluster
2. Memory reservation
3. CPU reservation

We implemented the first criteria for the Auto Scaling group. Instead of using the default scaling abilities, we set group scaling in and out using Lambda functions that were triggered periodically by a combination of the AWS::Lambda::Permission and an AWS::Events::Rule resources, as we wanted specific criteria for scaling.

The Lambda function is available in the Flux7 GitHub repository: https://github.com/Flux7Labs/blog-code-samples/tree/master/2016-10-ecs-enables-rac-sap-hybris

Future versions of this piece of code will incorporate the other two criteria along with the ability to use CloudWatch alarms to trigger scaling.

Conclusion

Using advanced ECS features like Service Auto Scaling in conjunction with Lambda to meet RAC’s business requirements, RAC and Flux7 were able to Dockerize SAP Hybris in production for the first time ever.

Further, ECS and CloudFormation give users the ability to implement robust solutions while still providing the ability to roll back in case of failures. With ECS as a backbone technology, RAC has been able to deploy a Hybris setup with automatic scaling, self-healing, one-click deployment, CI/CD, and PCI compliance consistent with the company’s latest technology guidelines and meeting the requirements of their newly-formed culture of DevOps and extreme agility.

If you have any questions or suggestions, please comment below.

Orchestrating GPU-Accelerated Workloads on Amazon ECS

by Chris Barclay | on 11 OCT 2016 | in Amazon ECS | Permalink | Comments

My colleagues Brandon Chavis, Chad Schmutzer, and Pierre Steckmeyer sent a nice guest post that describes how to run GPU workloads on Amazon ECS.

—

It’s interesting to note that many workloads on Amazon ECS fit into three primary categories that have obvious synergy with containers:

PaaS
Batch workloads
Long-running services

While these are the most common workloads, we also see ECS used for a wide variety of applications. One new and interesting class of workload is GPU-accelerated workloads or, more specifically, workloads that need to leverage large amounts of GPUs across many nodes.

In this post, we take a look at how ECS enables GPU workloads. For example, at Amazon.com, the Amazon Personalization team runs significant machine learning workloads that leverage many GPUs on ECS.

Amazon ECS overview

ECS is a highly scalable, high performance service for running containers on AWS. ECS provides customers with a state management engine that offloads the heavy lifting of running your own orchestration software, while providing a number of integrations with other services across AWS. For example, you can assign IAM roles to individual tasks, use Auto Scaling to scale your containers in response to load across your services, and use the new Application Load Balancer to distribute load across your application, while automatically checking new tasks into the load balancer when Auto Scaling actions occur.

Today, customers run a wide variety of applications on Amazon ECS, and you can read about some of these use cases here:

Edmunds
Remind
Coursera

ECS and GPU workloads

When you log into Amazon.com, it’s important that you see recommendations for a product you might actually be interested in. The Amazon Personalization team generates personalized product recommendations for Amazon customers. In order to do this, they need to digest very large data sets containing information about the hundreds of millions of products (and just as many customers) that Amazon.com has.

The only way to handle this work in a reasonable amount of time is to ensure it is distributed across a very large number of machines. Amazon Personalization uses machine-learning software that leverages GPUs to train neural networks, but it is challenging to orchestrate this work across a very large number of GPU cores.

To overcome this challenge, Amazon Personalization uses ECS to manage a cluster of Amazon EC2 GPU instances. The team uses P2 instances, which include NVIDIA Tesla K80 GPUs. The cluster of P2 instances functions as a single pool of resources—aggregating CPU, memory, and GPUs—onto which machine learning work can be scheduled.

In order to run this work on an ECS cluster, a Docker image configured with NVIDIA CUDA drivers, which allow the container to communicate with the GPU hardware, is built and stored in Amazon EC2 Container Registry (Amazon ECR).

An ECS task definition is used to point to the container image in ECR and specify configuration for the container at runtime, such as how much CPU and memory each container should use, the command to run inside the container, if a data volume should be mounted in the container, where the source data set lives in Amazon S3, and so on.

After ECS is asked to run a Task, the ECS scheduler finds a suitable place to run the containers by identifying an instance in the cluster with available resources. As shown in the following architecture diagram, ECS can place containers into the cluster of EC2 GPU instances (“GPU slaves” in the diagram):

Give GPUs on ECS a try

To make it easier to try using GPUs on ECS, we’ve built an AWS CloudFormation template to alleviate much of the heavy lifting. This demo architecture is built around DSSTNE, the open source, machine learning library that the Amazon Personalization team uses to actually generate recommendations. Go to the GitHub repository to see the CloudFormation template.

The template spins up an ECS cluster with a single EC2 GPU instance in an Auto Scaling group. You can adjust the desired group capacity to run a larger cluster, if you’d like.

The instance is configured with all of the necessary software that DSSTNE requires for interaction with the underlying GPU hardware, such as NVIDIA drivers. The template also installs some development tools and libraries, like GCC, HDF5, and Open MPI so that you can compile the DSSTNE library at boot time. It then builds a Docker container with the DSSTNE library packaged up and uploads it to ECR. It copies the URL of the resulting container image in ECR and builds an ECS task definition that points to the container.

After the CloudFormation template completes, view the Outputs tab to get an idea of where to look for your new resources.

Conclusion

In this post, we explained how you can use ECS on high GPU workloads, and shared the CloudFormation template that makes it easy to get started with ECS and DSSTNE.

Unfortunately, it would take far too much page space to explain the details of the machine learning specifics in this post, but you can read the Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE post on the AWS Big Data Blog to learn more about how DSSTNE interacts with Apache Spark, trains models, generates predictions, and other fun machine learning concepts.

If you have questions or suggestions, please comment below.

Powering Mobile Backend Services with AWS Lambda and Amazon API Gateway

by Bryan Liston | on 10 OCT 2016 | in AWS Lambda | Permalink | Comments

Daniel Austin
Solutions Architect

Asif Khan
Solutions Architect

Have you ever wanted to create a mobile REST API quickly and easily to make database calls and manipulate data sources? The Node.js and Amazon DynamoDB tutorial shows how to perform CRUD operations (Create, Read, Update, and Delete) easily on DynamoDB tables using Node.js.

In this post, I extend that concept by adding a REST API that calls an AWS Lambda function from Amazon API Gateway. The API allows you to perform the same operations on DynamoDB, from any HTTP-enabled device, such as a browser or mobile phone. The client device doesn't need to load any libraries, and with serverless architecture and API Gateway, you don't need to maintain any servers at all!

Walkthrough

In this post, I show you how to write a Lambda function so that it can handle all of the API calls in a single code function, and then add a RESTful API on top of it. API Gateway tells you what function was called.

The problem to solve: how to use API Gateway, AWS Lambda, and DynamoDB to simplify DynamoDB access? Our approach involves using a single Lambda function to provide a CRUD façade on DynamoDB. This required solving two additional problems:

Sending the date from API Gateway about which API method was called, along with POST information about the DynamoDB operations. This is solved by using a generic mapping template for each API call, sending all HTTP data to the Lambda function in a standard JSON format, including the path of the API call, i.e., '/movies/add-movie'.
Providing a generic means in Node.js to use multiple function calls and properly use callbacks to send the function results back to the API, again in a standard JSON format. This required writing a generic callback mechanism (a very simple one) that is invoked by each function, and gathers the data for the response.

This is a very cool and easy way to implement basic DynamoDB functions to HTTP(S) calls from API Gateway. It works from any browser or mobile device that understands HTTP.

Mobile developers can write backend code in Java, Node.js, or Python and deploy on Lambda.

In this post, I continue the demonstration with a sample mobile movies database backend, written in Node.js, using DynamoDB. The API is hosted on API Gateway.

Optionally, you can use AWS Mobile Hub to develop and test the mobile client app.

The steps to deploy a mobile backend in Lambda are:

Set up IAM users and roles to allow access to Lambda and DynamoDB.
Download the sample application and edit to include your configuration.
Create a table in DynamoDB using the console or the AWS CLI.Create a new Lambda function and upload the sample app.
Create endpoints in API Gateway
Test the API and Lambda function.

Set up IAM roles to allow access to Lambda and DynamoDB

To set up your API, you need an IAM user and role that has permissions to access DynamoDB and Lambda.

In the IAM console, choose Roles , Create role. Choose AWS Lambda from the list of service roles, then choose AmazonDynamoDBFullAccess and attach another policy, AWSLambdaFullAccess. You need to add this role to an IAM user: you can create a new user for this role, or use an existing one.

Download the sample application and edit to include your configuration

Now download the sample application and edit its configuration file.

The archive can be downloaded from GitHub: https://github.com/awslabs/aws-serverless-crud-sample

git clone https://github.com/awslabs/aws-serverless-crud-sample

After you download the archive, unzip it to an easily-found location and look for the file app_config.json. This file contains set up information for your Lambda function. Edit the file to include your access key ID and secret access key. If you created a new user in step 1, use those credentials. You also need to add your AWS region to the file – this is the region where you will create the DynamoDB table.

Create a table in DynamoDB using the console or the AWS CLI.

To create a table in DynamoDB, use the instructions in the Node.js and DynamoDB tutorial, in the Amazon DynamoDB Getting Started Guide. Next, run the file createMoviesTable.js from the downloaded code in the previous step. You could also use the AWS CLI with this input:

aws DynamoDB create-table  --cli-input-json file://movies-table.json  --region us-west-2  --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5

The file movies-table.json is in the code archive linked below. If you use the CLI, then the user must have sufficient permissions.

IMPORTANT : The table must be created before completing the rest of the walkthrough.

Create a new Lambda function and upload the sample app.

It can be a little tricky creating the archive for the Lambda function. Make sure you are not zipping the folder, but its contents. This is important; it should look like the following:

This creates a file called "Archive.zip", which is the file to be uploaded to Lambda.

In the Lambda console, choose Create a Lambda function and skip the Blueprints and Configure triggers sections. In the Configure function section, for Name , enter 'movies-db'. For Runtime , choose 'Node.js4.3'. For Code entry type, choose 'Upload a zip file'. Choose Upload and select the archive file that you created in the previous step. For Handler , choose 'movies-dynamodb.handler', which is the name of the JavaScript function inside the archive that will be called via Lambda from API Gateway. For Role , choose Choose an existing role and select the role that you created in the first step.

You can leave the other options unchanged and then review and create your Lambda function. You can test the function using the following bit of JSON (this mimics the data that will be sent to the Lambda function from API Gateway):

{
  "method": "POST",
  "body" : { "title": "Godzilla vs. Dynamo", "year": "2016", "info": "New from Acme Films, starring John Smith."},
  "headers": {
      },
  "queryParams": {
      },
  "pathParams": {
      },
  "resourcePath": "/add-movie"
}

Create endpoints in API Gateway

Now, you create five API methods in API Gateway. First, navigate to the API Gateway console and choose Create API , New API. Give the API a name, such as 'MoviesDP-API'.

In API Gateway, you first create resources and then the methods associated with those resources. The steps for each API call are basically the same:

Create the resource (/movies or /movies/add-movie).
Create a method for the resource – GET for /movies, POST for all others
Choose Integration request , Lambda and select the node-movies Lambda function created earlier. All the API calls use the same Lambda function.
Under Integration request , choose Body mapping templates , create a new template with type application/json , and copy the template file shown below into the form input.
Choose Save.

Use these steps for the following API resources and methods:

/movies – lists all movies in the DynamoDB table
/movies/add-movie – add an item to DynamoDB
/movies/delete-movie – deletes a movie from DynamoDB
/movies/findbytitleandyear – finds a movie with a specific title and year
/movies/update-movie – modifies and existing movie item in DynamoDB

This body mapping template is a standard JSON template for passing information from API Gateway to Lambda. It provides the calling Lambda function with all of the HTTP input data – including any path variables, query strings, and most importantly for this purpose, the resourcePath variable, which contains the information from API Gateway about which function was called.

Here's the template:

{
  "method": "$context.httpMethod",
  "body" : $input.json('$'),
  "headers": {
    #foreach($param in $input.params().header.keySet())
    "$param": "$util.escapeJavaScript($input.params().header.get($param))" #if($foreach.hasNext),#end
    #end
  },
  "queryParams": {
    #foreach($param in $input.params().querystring.keySet())
    "$param": "$util.escapeJavaScript($input.params().querystring.get($param))" #if($foreach.hasNext),#end
    #end
  },
  "pathParams": {
    #foreach($param in $input.params().path.keySet())
    "$param": "$util.escapeJavaScript($input.params().path.get($param))" #if($foreach.hasNext),#end
    #end
  },
  "resourcePath": "$context.resourcePath"
}

Notice the last line where the API Gateway variable $context.resourcePath is sent to the Lambda function as the JSON value of a field called (appropriately enough) resourcePath. This value is used by the Lambda function to perform the required action on the DynamoDB table.

(I originally found this template online, and modified it to add variables like resourcePath. Thanks to Kenn Brodhagen!)

As you create each API method, copy the requestTemplate.vel file from the original code archive and paste it into the Body mapping template field, using application/json as the type. Do this for each API call (using the same file). The file is the same one shown above, but it's easier to copy the file than cut and paste from a web page!

After the five API calls are created, you are almost done. You need to test your API, and then deploy it before it can be used from the public web.

Test your API and Lambda function

To test the API, add a movie to the DB. The API Gateway calls expect a JSON payload that is sent via POST to your Lambda function. Here's an example, it's the same one used above to test the Lambda function:

{ "title": "Love and Friendship", "year": "2016", "info": "New from Amazon Studios, starring Kate Beckinsale."}

To add this movie to your DB, test the /movies/add-movie method, as shown here:

Check logs on the test page of your Lambda function and in API Gateway

[picture10.png]

Conclusion

In this post, I demonstrated a quick way to get started on AWS Lambda for your mobile backend. You uploaded a movie database app which performed CRUD operations on a movies DB table in DynamoDB. The API was hosted on API Gateway for scale and manageability. With the above deployment, you can focus on building your API and mobile clients with serverless architecture. You can reuse the mapping template in future API Gateway projects.

Interested in reading more? See Access Resources in a VPC from Your Lambda Functions and

AWS Mobile Hub – Build, Test, and Monitor Mobile Applications.

If you have questions or suggestions, please comment below.

← Older posts

Oct	NOV	Dec
	11
2015	2016	2017