AWS Compute Blog
-
Amazon EC2 Container Service at AWS re:Invent

AWS re:Invent is just a few days away and the Amazon ECS team will be there. To talk to us about how you are using Amazon ECS or to find out more about how you can use Amazon ECS for your applications, drop by the Compute booth or the developer lounge.
There are also a number of Amazon ECS related talks next week. Come hear from AWS customers and the ECS team about how you can use Amazon ECS in production today.
In the Compute track
CMP302 ā Amazon EC2 Container Service: Distributed Applications at Scale (also being live streamed)
CMP406 ā Amazon ECS at Coursera: Powering a general-purpose near-line execution microservice, while defending against untrusted code (by Coursera)In the Devops track
DVO305 ā Turbocharge Your Continuous Deployment Pipeline with Containers
DVO308 ā Docker & ECS in Production: How We Migrated Our Infrastructure from Heroku to AWS (by Remind)
DVO313 ā Building Next-Generation Applications with Amazon ECS (by Meteor)
DVO317 ā From Local Docker Development to Production Deployments (by Docker)You can also drop by to watch a lightning talk on Amazon ECS and continuous delivery.
We look forward to seeing you in Las Vegas next week.
ā The Amazon ECS Team
-
Dynamic Scaling with EC2 Spot Fleet
Tipu Qureshi, AWS Senior Cloud Support EngineerThe RequestSpotFleet API allows you to launch and manage an entire fleet of EC2 Spot Instances with one request. A fleet is a collection of Spot Instances that are all working together as part of a distributed application and providing cost savings. With the ModifySpotFleetRequest API, itās possible to dynamically scale a Spot fleetās target capacity according to changing capacity requirements over time. Letās look at a batch processing application that is utilizing Spot fleet and Amazon SQS as an example. As discussed in our previous blog post on Additional CloudWatch Metrics for Amazon SQS and Amazon SNS, you can scale up when the ApproximateNumberOfMessagesVisible SQS metric starts to grow too large for one of your SQS queues, and scale down once it returns to a more normal value.
There are multiple ways to accomplish this dynamic scaling. As an example, a script can be scheduled (e.g. via cron) to get the value of the ApproximateNumberOfMessagesVisible SQS metric periodically and then scale the Spot fleet according to defined thresholds. The current size of the Spot fleet can be obtained using the DescribeSpotFleetRequests API and the scaling can be carried out by using the new ModifySpotFleetRequest API. A sample script written for NodeJS is available here, and following is a sample IAM policy for an IAM role that could be used on an EC2 instance for running the script:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1441252157702", "Action": [ "ec2:DescribeSpotFleetRequests", "ec2:ModifySpotFleetRequest", "cloudwatch:GetMetricStatistics" ], "Effect": "Allow", "Resource": "*" } ] }/By leveraging the IAM role on an EC2 instance, the script uses the AWS API methods described above to scale the Spot fleet dynamically. You can configure variables such as the Spot fleet request, SQS queue name, SQS metric thresholds and instance thresholds according to your applicationās needs. In the example configuration below we have set the minimum number of instances threshold (minCount) at 2 to ensure that the instance count for the spot fleet never goes below 2. This is to ensure that a new job is still processed immediately after an extended period with no batch jobs.
// Sample script for Dynamically scaling Spot Fleet // define configuration var config = { spotFleetRequest:'sfr-c8205d41-254b-4fa9-9843-be06585e5cda', //Spot Fleet Request Id queueName:'demojobqueue', //SQS queuename maxCount:100, //maximum number of instances minCount:2, //minimum number of instances stepCount:5, //increment of instances scaleUpThreshold:20, //CW metric threshold at which to scale up scaleDownThreshold:10, //CW metric threshold at which to scale down period:900, //period in seconds for CW region:'us-east-1' //AWS region }; // dependencies var AWS = require('aws-sdk'); var ec2 = new AWS.EC2({region: config.region, maxRetries: 5}); var cloudwatch = new AWS.CloudWatch({region: config.region, maxRetries: 5}); console.log ('Loading function'); main(); //main function function main() { //main function var now = new Date(); var startTime = new Date(now - (config.period * 1000)); console.log ('Timestamp: '+now); var cloudWatchParams = { StartTime: startTime, EndTime: now, MetricName: 'ApproximateNumberOfMessagesVisible', Namespace: 'AWS/SQS', Period: config.period, Statistics: ['Average'], Dimensions: [ { Name: 'QueueName', Value: config.queueName, }, ], Unit: 'Count' }; cloudwatch.getMetricStatistics(cloudWatchParams, function(err, data) { if (err) console.log(err, err.stack); // an error occurred else { //set Metric Variable var metricValue = data.Datapoints[0].Average; console.log ('Cloudwatch Metric Value is: '+ metricValue); var up = 1; var down = -1; // check if scaling is required if (metricValue = config.scaleDownThreshold) console.log ("metric not breached for scaling action"); else if (metricValue >= config.scaleUpThreshold) scale(up); //scaleup else scale(down); //scaledown } }); }; //defining scaling function function scale (direction) { //adjust stepCount depending upon whether we are scaling up or down config.stepCount = Math.abs(config.stepCount) * direction; //describe Spot Fleet Request Capacity console.log ('attempting to adjust capacity by: '+ config.stepCount); var describeParams = { DryRun: false, SpotFleetRequestIds: [ config.spotFleetRequest ] }; //get current fleet capacity ec2.describeSpotFleetRequests(describeParams, function(err, data) { if (err) { console.log('Unable to describeSpotFleetRequests: ' + err); // an error occurred return 'Unable to describeSpotFleetRequests'; } //set current capacity variable var currentCapacity = data.SpotFleetRequestConfigs[0].SpotFleetRequestConfig.TargetCapacity; console.log ('current capacity is: ' + currentCapacity); //set desired capacity variable var desiredCapacity = currentCapacity + config.stepCount; console.log ('desired capacity is: '+ desiredCapacity); //find out if the spot fleet is already modifying var fleetModifyState = data.SpotFleetRequestConfigs[0].SpotFleetRequestState; console.log ('current state of the the spot fleet is: ' + fleetModifyState); //only proceed forward if maxCount or minCount hasn't been reached //or spot fleet isn't being currently modified. if (fleetModifyState == 'modifying') console.log ('capacity already at min, max or fleet is currently being modified'); else if (desiredCapacity config.maxCount) console.log ('capacity already at max count'); else { console.log ('scaling'); var modifyParams = { SpotFleetRequestId: config.spotFleetRequest, TargetCapacity: desiredCapacity }; ec2.modifySpotFleetRequest(modifyParams, function(err, data) { if (err) { console.log('unable to modify spot fleet due to: ' + err); } else { console.log('successfully modified capacity to: ' + desiredCapacity); return 'success'; } }); } }); }You can modify this sample script to meet your applicationās requirements.
You could also leverage AWS Lambda for dynamically scaling your Spot fleet. As depicted in the diagram below, an AWS Lambda function can be scheduled (e.g using AWS datapipeline, cron or any form of scheduling) to get the ApproximateNumberOfMessagesVisible SQS metric for the SQS queue in a batch processing application. This Lambda function will check the current size of a Spot fleet using the DescribeSpotFleetRequests API, and then scale the Spot fleet using the ModifySpotFleetRequest API after also checking certain constraints such as the state or size of the Spot fleet similar to the script discussed above.

You could also use the sample IAM policy provided above to create an IAM role for the AWS Lambda function. A sample Lambda deployment package for dynamically scaling a Spot fleet based on the value of the ApproximateNumberOfMessagesVisible SQS metric can be found here. However, you could modify it to use any CloudWatch metric based on your use case. The sample script and Lambda function provided are only for reference and should be tested before using in a production environment.
-
AWS Lambda sessions at re:Invent 2015
Ajay Nair, Sr. Product Manager, AWS Lambda
If you will be attending re:Invent 2015 in Las Vegas next week, you will have many opportunities to learn more about building applications using AWS Lambda. The Lambda team will be presenting multiple sessions covering new features, as well as deep dives on using Lambda for data processing and backend workloads (Click any of the following links to learn more about a breakout session)- CMP301 ā AWS Lambda and the Serverless Cloud
- ARC308 ā The Serverless Company Using AWS Lambda: Streamlining Architecture with AWS (featuring PlayOn! Sports)
- MBL302 ā Building Scalable, Serverless Mobile and Internet of Things Back Ends (featuring EasyTen)
- BDT307 ā Zero Infrastructure, Real-Time Data Collection, and Analytics (featuring Zillow)
- DEV203 ā Using Amazon API Gateway with AWS Lambda to Build Secure and Scalable APIs
- GAM401 ā Build a Serverless Mobile Game with Amazon Cognito, Lambda, and DynamoDB
We will also be joined by customers presenting their own experiences with using Lambda in production systems.- ARC201 ā Microservices Architecture for Digital Platforms with AWS Lambda, Amazon CloudFront and Amazon DynamoDB ā Eugene Istrati ā CTO, Mitoc Group Inc
- CMP403 ā AWS Lambda: Simplifying Big Data Workloads ā Martin Holste ā Architect, FireEye, Inc.
- CMP407 ā Lambda as Cron: Scheduling Invocations in AWS Lambda ā Guy Davies ā Senior Systems Engineer, Sophos Ltd
- DVO209 ā JAWS: The Monstrously Scalable Serverless Framework ā AWS Lambda, Amazon API Gateway, and More! ā Austen Collins ā Founder, JAWS, Ryan Pendergast ā Developer, DoApp, Inc
Want to go hands on? Sign up for a workshop!- WRK202 ā Rapid Mobile App Development on AWS
- WRK302 ā Event-Driven Programming
- WRK305 ā Zombie Apocalypse Survival: Building Serverless Microservices (featuring Zombies!)
- WRK308 ā AWS + ASK: Teaching Amazon Echo New Skills
Want to chat with the experts and get your questions answered? Stop by at the Compute Booth and New Services booth on the display floor, or stop by at the developer lounge session below:
Didnāt register before the conference sold out? You can still sign up below to watch Live Streams of the keynotes and select breakout sessions. All sessions will be recorded and made available on YouTube after the conference. Also, all slide decks from the sessions will be made available on SlideShare.net after the conference.
See you in Vegas! -
How to create a custom scheduler for Amazon ECS
My colleague Daniele Stroppa sent a nice guest post that shows how to create a custom scheduler for Amazon ECS.
ā
Amazon EC2 Container Service (ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS takes care of two key functions required when running modern distributed applications: reliable state management and flexible scheduling. As Werner explained in a recent blog, ECS exposes the cluster state through a set of simple APIs that give you the details about all the instances in your cluster and all the containers running on those instances. In this post, we explain how to make use of the ECS API to create a custom scheduler.
Scheduling
A scheduler understands the needs and requirements of the systemāe.g., a container that needs 200 MB RAM and port 80āand tries to efficiently satisfy them. The scheduler then submits a request to the cluster state manager to acquire the required resource.
ECS provides optimistic concurrency control so multiple schedulers can be operating at the same time; the cluster manager can confirm that the resource is available and commit it to the scheduler. The scheduler can listen for events from the cluster manager and take action, such as maintaining the availability of your applications, or interact with other resources like Elastic Load Balancing load balancers.
ECS currently offers two schedulers to find the optimal instance placement based on your resource needs and availability requirements: a task scheduler and a service scheduler. Some customers may find that they have requirements that are not satisfied by one of the current schedulers. For example, if a customer wanted to register tasks with Route 53 instead of Elastic Load Balancing, a custom scheduler could create an SRV record when a task is scheduled and remove the record when the task stops.
In this blog we will show an example custom scheduler that starts tasks on the instance with the least number of running tasks to illustrate the process of creating a custom scheduler.
Implementing a custom scheduler
A custom scheduler makes use of the ECS List* and Describe* API operations to determine the current state of the cluster. It then selects one (or more) container instances according to the logic implemented in the scheduler and uses StartTask to start a task on the selected container instance. For more details about API operations, see the Amazon ECS API Reference.
As an example, letās say you want to start tasks on the instance with the least number of running tasks. Hereās how to implement a custom scheduler that implements this logic. Start by getting a list of all container instances in a cluster:
def getInstanceArns(clusterName): containerInstancesArns = [] # Get instances in the cluster response = ecs.list_container_instances(cluster=clusterName) containerInstancesArns.extend(response['containerInstanceArns']) # If there are more instances, keep retrieving them while response.get('nextToken', None) is not None: response = ecs.list_container_instances( cluster=clusterName, nextToken=response['nextToken'] ) containerInstancesArns.extend(response['containerInstanceArns']) return containerInstancesArnsFor each instance in the cluster, you can then find out how many tasks are in the RUNNING state and start a task on the instance with the least number of running tasks:
def startTask(clusterName, taskDefinition): startOn = [] # Describe all instances in the ECS cluster containerInstancesArns = getInstanceArns(clusterName) response = ecs.describe_container_instances( cluster=clusterName, containerInstances=containerInstancesArns ) containerInstances = response['containerInstances'] # Sort instances by number of running tasks sortedContainerInstances = sorted( containerInstances, key=lambda containerInstances: containerInstances['runningTasksCount'] ) # Get the instance with the least number of tasks startOn.append(sortedContainerInstances[0]['containerInstanceArn']) logging.info('Starting task on instance %s...', startOn) # Start a new task response = ecs.start_task( cluster=clusterName, taskDefinition=taskDefinition, containerInstances=startOn, startedBy='LeastTasksScheduler' )After you string all the pieces together, the custom scheduler looks like this:
#!/usr/bin/env python import boto3 import argparse import logging # Set up logger logging.getLogger(__name__) logging.basicConfig(format='%(asctime)s - %(levelname)s: %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p', level=logging.INFO) # Set up ECS boto client ecs = boto3.client('ecs') def getInstanceArns(clusterName): containerInstancesArns = [] # Get instances in the cluster response = ecs.list_container_instances(cluster=clusterName) containerInstancesArns.extend(response['containerInstanceArns']) # If there are more instances, keep retrieving them while response.get('nextToken', None) is not None: response = ecs.list_container_instances( cluster=clusterName, nextToken=response['nextToken'] ) containerInstancesArns.extend(response['containerInstanceArns']) return containerInstancesArns def startTask(clusterName, taskDefinition): startOn = [] # Describe all instances in the ECS cluster containerInstancesArns = getInstanceArns(clusterName) response = ecs.describe_container_instances( cluster=clusterName, containerInstances=containerInstancesArns ) containerInstances = response['containerInstances'] # Sort instances by number of running tasks sortedContainerInstances = sorted( containerInstances, key=lambda containerInstances: containerInstances['runningTasksCount'] ) # Get the instance with the least number of tasks startOn.append(sortedContainerInstances[0]['containerInstanceArn']) logging.info('Starting task on instance %s...', startOn) # Start a new task response = ecs.start_task( cluster=clusterName, taskDefinition=taskDefinition, containerInstances=startOn, startedBy='LeastTasksScheduler' ) # # LeastTasks ECS Scheduler # if __name__ == "__main__": parser = argparse.ArgumentParser( description='ECS Custom Scheduler to start a task on the instance with the least number of running tasks.' ) parser.add_argument('-c', '--cluster', nargs='?', default='default', help='The short name or full Amazon Resource Name (ARN) of the cluster that you want to start your task on. If you do not specify a cluster, the default cluster is assumed.' ) parser.add_argument('-d', '--task-definition', required=True, help='The family and revision (family:revision) or full Amazon Resource Name (ARN) of the task definition that you want to start.' ) args = parser.parse_args() logging.info('Starting task %s on cluster %s...', args.task_definition, args.cluster) startTask(args.cluster, args.task_definition)Conclusion
This is a very simple example, but it gives an idea of how to use the powerful Amazon ECS API to create a custom scheduler.
A community member recently created ecs_state, which āis a small Go library that uses the ECS List and Describe API operations to store information about running tasks and available resources in memory in SQLite. There are a set of API operations that control when to refresh state, as well as an operation to search for machines with the resources available to accept the task. Further logic and filtering can then be applied in memory before finally calling StartTask or StopTask. This library can reduce the API actions your scheduler needs to perform and simplify finding resources using SQL.ā
We look forward to your feedback on enhancements you want for the existing ECS schedulers, and learning about any new schedulers built by the community.
-
A Simple Serverless Test Harness using AWS Lambda
Tim Wagner, AWS Lambda General Manager
You can easily test a Lambda function locally by using either a nodejs runner or JUnit for Java. But once your function is live in Lambda, how do you test it?
One option is to create an API for it using Amazon API Gateway and then employ one of the many HTTP-based test harnesses out there. While that certainly works, in this article weāll look at using Lambda itself as a simple (and serverless!) test platform. Weāll look at testing two categories ā other Lambda functions and HTTPS endpoints ā but you can apply these techniques to testing anything you want.
Unit Testing
In the Microservices without the Servers blog post we looked briefly at testing the image processing service we built using Lambda itself. The idea was simple: We create a āunitā test that calls another Lambda function (the functionality being tested) and records its output in DynamoDB. We can record the actual response of calling the function under test or just summarize whether it succeeded or failed. Other information, such as performance data (running time, memory consumed, etc.) can of course be added.
AWS Lambda includes a unit and load test harness blueprint, making it easy to create a custom harness. By default, the blueprint runs the function under test and records the response in a predetermined DynamoDB table. You can easily customize how results are recorded (e.g., sending them to Amazon SQS, Amazon Kinesis, or Amazon S3 instead of DymamoDB) and what is recorded (success, actual results, performance or other environmental attributes of running the function, etc.)
Using this blueprint is easy: It reads a simple JSON format to tell it what to call:
{ "operation": "unit", "function": <Name of the Lambda function under test>, "resultsTable": "unit-test-results", "testId": "MyTestRun", "event": { /* The event you want the function under test to process goes here. */ } }One of the nice things about this approach is that you donāt have to worry about infrastructure at any point in the testing process. The economics of Lambda functions work to your advantage: after the test runs, youāre not left owning a bunch of resources, and you donāt have to spend any energy spinning up or shutting down infrastructure. When youāre done with the results in DynamoDB, you can simply delete the corresponding rows or replace them with a single, summary row. Until then, you have the full power of a managed NoSQL database to access and manipulate your test run results.
The Lambda blueprint itself is intentionally very simple. You can add an existing test library if you want more structure, or just instantiate the blueprint and modify it for each function under test.
HTTPS Endpoint Testing
Lambda also includes a blueprint for invocating HTTPS endpoints, such as those created by Amazon API Gateway. You can combine these two blueprints to create a unit tester for any HTTPS service, including APIs created with API Gateway. Hereās an example of the unit test JSON from our image service test:
{ "operation": "unit", "function": "HTTPSInvoker", "resultsTable": "unit-test-results", "testId": "LinuxConDemo", "event": { "options": { "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com", "path": "/prod/ImageProcessingService", "method": "POST" }, "data": { "operation": "getSample" } } }Load Testing
Lambdaās test harness blueprint can run in either of two modes: Unit and Load. Load testing enables you to use Lambdaās scalability to your advantage, running multiple unit tests asynchronously. As before, each unit test records its result in DynamoDB, so you can easily do some simple scale testing, all without standing up infrastructure. Simple queries in DynamoDB enable you to validate how many tests completed (and completed successfully). To do more elaborate post-hoc analysis, you can also hook up a Lambda function to DynamoDB as an event handler, allowing you to count results, perform additional validation, etc.
Hereās an example recreated from the image processing blog article that uses the unit and load test harness again, this time in āload testā mode:
{ "operation": "load", "iterations": 100, "function": "TestHarness", "event": { "operation": "unit", "function": "HTTPSInvoker", "resultsTable": "unit-test-results", "testId": "LinuxConLoadTestDemo", "event": { "options": { "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com", "path": "/prod/ImageProcessingService", "method": "POST" }, "data": { "operation": "getSample" } } } }From the outside in, this JSON file
- Instructs load tester to run 100 copies ofā¦
- The unit test, which runs the HTTPS invoker, recording the result in a DynamoDB table named āunit-test-resultsā
- The HTTPS invoker POSTs the āgetSampleā operation request to the API Gateway endpoint
- ā¦which ultimately calls the image processing microservice, implemented as a Lambda function.
Summary
In this article we took a deeper look at serverless testing and how Lambda itself can be used as a testing platform. We saw how the test harness blueprint can be used for both unit and load testing and how the approach can be customized to the particular needs of your functions and test regimen.
Until next time, happy Lambda coding (and serverless testing)!
-
Microservices without the Servers
Tim Wagner, AWS Lambda General Manager
At LinuxCon/ContainerCon 2015 I presented a demo-driven talk titled, āMicroservices without the Serversā. In it, I created an image processing microservice, deployed it to multiple regions, built a mobile app that used it as a backend, added an HTTPS-based API using Amazon API Gateway and a website, and then unit and load tested it, all without using any servers.
This blog recreates the talk in detail, stepping you through all the pieces necessary for each of these steps and going deeper into the architecture. For a high-level overview, check out the slides. For another example of this architecture, check out the executable gist repository, SquirrelBin.
Serverless Architecture
By āserverlessā, we mean no explicit infrastructure required, as in: no servers, no deployments onto servers, no installed software of any kind. Weāll use only managed cloud services and a laptop. The diagram below illustrates the high-level components and their connections: a Lambda function as the compute (ābackendā) and a mobile app that connects directly to it, plus Amazon API Gateway to provide an HTTP endpoint for a static Amazon S3-hosted website.
A Serverless Architecture for Mobile and Web Apps Using AWS Lambda
Now, letās start building!
Step 1: Create the Image Processing Service
To make this a little easier to follow along, weāre going to use a library that comes built in with Lambdaās nodejs language: ImageMagick. However, thatās not required ā if you prefer to use your own library instead, you can load JavaScript or native libraries, run Python, or even wrap wrap a command line executable. The examples below are implemented in nodejs, but you can also build this service using Java, Clojure, Scala, or other jvm-based languages in AWS Lambda.
The code below is a sort of āhello worldā program for ImageMagick ā it gives us a basic command structure (aka a switch statement) and enables us to retrieve the built-in rose image and return it. Apart from encoding the result so it can live happily in JSON, thereās not much to this.
var im = require("imagemagick"); var fs = require("fs"); exports.handler = function(event, context) { if (event.operation) console.log("Operation " + event.operation + " requested"); switch (event.operation) { case 'ping': context.succeed('pong'); return; case 'getSample': event.customArgs = ["rose:", "/tmp/rose.png"]; im.convert(event.customArgs, function(err, output) { if (err) context.fail(err); else { var resultImgBase64 = new Buffer(fs.readFileSync("/tmp/rose.png")).toString('base64'); try {fs.unlinkSync("/tmp/rose.png");} catch (e) {} // discard context.succeed(resultImgBase64); } }); break; // allow callback to complete default: var error = new Error('Unrecognized operation "' + event.operation + '"'); context.fail(error); return; } };First, letās make sure the service is running by sending it the following JSON in the AWS Lambda consoleās test window:
{ "operation": "ping" }You should get the requisite āpongā response. Next, weāll actually invoke ImageMagick by sending JSON that looks like this:
{ "operation": "getSample" }This request retrieves a base64-encoded string representing a PNG version of a picture of a rose: āāiVBORw0KGgā¦Jggg==ā. To make sure this isnāt just some random characters, copy-paste it (sans double quotes) into any convenient Base64-to-image decoder, such as codebeautify.org/base64-to-image-converter. You should see a nice picture of a rose:
Sample Image (red rose)
Now, letās complete the image processing service by exposing the rest of the nodejs wrapper around it. Weāre going to offer a few different operations:
- ping: Verify service is available.
- getDimensions: Shorthand for calling identify operation to retrieve width and height of an image.
- identify: Retrieve image metadata.
- resize: A convenience routine for resizing (which calls convert under the covers)
- thumbnail: A synonym for resize.
- convert: The ādo-everythingā routine ā can convert media formats, apply transforms, resize, etc.
- getSample: Retrieve a sample image; the āhello worldā operation.
Most of the code is extremely straightforward wrapping of the nodejs ImageMagick routines, some of which take JSON (in which case the event passed in to Lambda is cleaned up and forwarded along) and others of which take command line (aka ācustomā) arguments, which are passed in as a string array. The one part of this that might be non-obvious if you havenāt used ImageMagick before is that it works as a wrapper over the command line, and the names of files have semantic meaning. We have two competing needs: We want the client to convey the semantics (e.g., the output format of an image, such as PNG versus JPEG) but we want the service author to determine where to place the temporary storage on disk so we donāt leak implementation details. To accomplish both at once, we define two arguments in the JSON schema: āinputExtensionā and āoutputExtensionā, and then we build the actual file location by combining the clientās portion (file extension) with the serverās portion (directory and base name). You can see (and use!) the completed code in the image processing blueprint.
There are lots of tests you can run here (and weāll do more later), but as a quick sanity check, retrieve the sample rose image again and the pass it back in using a negation (color inversion) filter. You can use JSON like this in the Lambda console, just replace the base64Image field with the actual image characters (itās a little long to include here in the blog page).
{ "operation": "convert", "customArgs": [ "-negate" ], "outputExtension": "png", "base64Image": "...fill this in with the rose sample image, base64-encoded..." }The output, decoded as an image, should be that elusive botanical rarity, a blue rose:
Blue Rose (negative of red rose sample image)
So thatās all there is to the functional aspect of the service. Normally, this is where it would start to get ugly, going from āworked onceā to āscalable and reliable service with 24x7x365 monitoring and production loggingā. But thatās the beauty of Lambda: our image processing code is already a fully deployed, production strength microservice. Next, letās add a mobile app that can call itā¦
Step 2: Create a Mobile Client
Our image processing microservice can be accessed in a number of ways, but to demonstrate a sample client, weāll build a quick Android app. Below Iām showing the client-side code that we used in the ContainerCon talk to create a simple Android app that letās you pick an image and a filter and then displays the effect of applying the filter to the image by calling the āconvertā operation in the image processing service thatās now running in AWS Lambda.
To get a sense of what the app does, hereās one of its sample images, the AWS Lambda Icon:
Android Emulator Displaying the AWS Lambda Icon Image
Weāll pick the ānegateā filter to invert the colors in the icon:
Selecting the āNegateā Image Conversion Filter
..and hereās the result: A blue version of our (originally orange) Lambda moniker:
Result of Applying the āNegateā Filter to the AWS Lambda Icon
We could also give an old-world feel to the modern Seattle skyline by choosing the Seattle image and aplying a sepia-tone filter:
A Sepia-toned Seattle Skyline
Now on to the code. Iām not trying to teach basic Android programming here, so Iāll just focus on the Lambda-specific elements of this app. (If youāre creating your own, youāll also need to include the AWS Mobile SDK jar to run the sample code below.) Conceptually there are four parts:
- POJO Data Schema
- Remote Service (Operation) Definition
- Initialization
- Service Invocation
Weāll take a look at each one in turn.
The data schema defines any objects that need to be passed between client and server. There are no āLambda-ismsā here; these objects are just POJOs (Plain Old Java Objects) with no special libraries or frameworks. We define a base event and then extend it to reflect our operation structure ā you can think of this as the āJavaificationā of the JSON we used when defining and testing the image processing service above. If you were also writing the server in Java, youād typically share these files as part of the common event structure definition; in our example, these POJOs turn into JSON on the server side.
LambdaEvent.java
package com.amazon.lambda.androidimageprocessor.lambda; public class LambdaEvent { private String operation; public String getOperation() {return operation;} public void setOperation(String operation) {this.operation = operation;} public LambdaEvent(String operation) {setOperation(operation);} }ImageConvertRequest.java
package com.amazon.lambda.androidimageprocessor.lambda; import java.util.List; public class ImageConvertRequest extends LambdaEvent { private String base64Image; private String inputExtension; private String outputExtension; private List customArgs; public ImageConvertRequest() {super("convert");} public String getBase64Image() {return base64Image;} public void setBase64Image(String base64Image) {this.base64Image = base64Image;} public String getInputExtension() {return inputExtension;} public void setInputExtension(String inputExtension) {this.inputExtension = inputExtension;} public String getOutputExtension() {return outputExtension;} public void setOutputExtension(String outputExtension) {this.outputExtension = outputExtension;} public List getCustomArgs() {return customArgs;} public void setCustomArgs(List customArgs) {this.customArgs = customArgs;} }So far, not very complicated. Now that we have a data model, weāll define the service endpoint using some Java annotations. Weāre exposing two operations here, āpingā and āconvertā; it would be easy to extend this to include the others as well, but we donāt need them for the sample app below.
ILambdaInvoker.java
package com.amazon.lambda.androidimageprocessor.lambda; import com.amazonaws.mobileconnectors.lambdainvoker.LambdaFunction; import java.util.Map; public interface ILambdaInvoker { @LambdaFunction(functionName = "ImageProcessor") String ping(Map event); @LambdaFunction(functionName = "ImageProcessor") String convert(ImageConvertRequest request); }Now weāre ready to do the main part of the app. Much of this is boilerplate Android code or simple client-side resource management, but Iāll point out a couple of sections that are Lambda related:
This is the āinitā section; it creates the authentication provider to call the Lambda APIs and creates a Lambda invoker capable of calling the endpoints defined above and transmitting the POJOs in our data model:
// Create an instance of CognitoCachingCredentialsProvider CognitoCachingCredentialsProvider cognitoProvider = new CognitoCachingCredentialsProvider( this.getApplicationContext(), "us-east-1:<YOUR COGNITO IDENITY POOL GOES HERE>", Regions.US_EAST_1); // Create LambdaInvokerFactory, to be used to instantiate the Lambda proxy. LambdaInvokerFactory factory = new LambdaInvokerFactory(this.getApplicationContext(), Regions.US_EAST_1, cognitoProvider); // Create the Lambda proxy object with a default Json data binder. lambda = factory.build(ILambdaInvoker.class);The other code section thatās interesting (well, sort of) is the actual remote procedure call itself:
try { return lambda.convert(params[0]); } catch (LambdaFunctionException e) { Log.e("Tag", "Failed to convert image"); return null; }Itās actually not that interesting because the magic (argument serialization and result deserialization) is happening behind the scenes, leaving just some error handling to be done here.
Hereās the complete source file:
MainActivity.java
package com.amazon.lambda.androidimageprocessor; import android.app.Activity; import android.app.ProgressDialog; import android.graphics.Bitmap; import android.graphics.BitmapFactory; import android.os.AsyncTask; import android.os.Bundle; import android.util.Base64; import android.util.Log; import android.view.View; import android.widget.ImageView; import android.widget.Spinner; import android.widget.Toast; import com.amazon.lambda.androidimageprocessor.lambda.ILambdaInvoker; import com.amazon.lambda.androidimageprocessor.lambda.ImageConvertRequest; import com.amazonaws.auth.CognitoCachingCredentialsProvider; import com.amazonaws.mobileconnectors.lambdainvoker.LambdaFunctionException; import com.amazonaws.mobileconnectors.lambdainvoker.LambdaInvokerFactory; import com.amazonaws.regions.Regions; import java.io.ByteArrayOutputStream; import java.util.ArrayList; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.Objects; public class MainActivity extends Activity { private ILambdaInvoker lambda; private ImageView selectedImage; private String selectedImageBase64; private ProgressDialog progressDialog; @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); setContentView(R.layout.activity_main); // Create an instance of CognitoCachingCredentialsProvider CognitoCachingCredentialsProvider cognitoProvider = new CognitoCachingCredentialsProvider( this.getApplicationContext(), "us-east-1:2a40105a-b330-43cf-8d4e-b647d492e76e", Regions.US_EAST_1); // Create LambdaInvokerFactory, to be used to instantiate the Lambda proxy. LambdaInvokerFactory factory = new LambdaInvokerFactory(this.getApplicationContext(), Regions.US_EAST_1, cognitoProvider); // Create the Lambda proxy object with a default Json data binder. lambda = factory.build(ILambdaInvoker.class); // ping lambda function to make sure everything is working pingLambda(); } // ping the lambda function @SuppressWarnings("unchecked") private void pingLambda() { Map event = new HashMap(); event.put("operation", "ping"); // The Lambda function invocation results in a network call. // Make sure it is not called from the main thread. new AsyncTask<Map, Void, String>() { @Override protected String doInBackground(Map... params) { // invoke "ping" method. In case it fails, it will throw a // LambdaFunctionException. try { return lambda.ping(params[0]); } catch (LambdaFunctionException lfe) { Log.e("Tag", "Failed to invoke ping", lfe); return null; } } @Override protected void onPostExecute(String result) { if (result == null) { return; } // Display a quick message Toast.makeText(MainActivity.this, "Made contact with AWS lambda", Toast.LENGTH_LONG).show(); } }.execute(event); } // event handler for "process image" button public void processImage(View view) { // no image has been selected yet if (selectedImageBase64 == null) { Toast.makeText(this, "Please tap one of the images above", Toast.LENGTH_LONG).show(); return; } // get selected filter String filter = ((Spinner) findViewById(R.id.filter_picker)).getSelectedItem().toString(); // assemble new request ImageConvertRequest request = new ImageConvertRequest(); request.setBase64Image(selectedImageBase64); request.setInputExtension("png"); request.setOutputExtension("png"); // custom arguments per filter List customArgs = new ArrayList(); request.setCustomArgs(customArgs); switch (filter) { case "Sepia": customArgs.add("-sepia-tone"); customArgs.add("65%"); break; case "Black/White": customArgs.add("-colorspace"); customArgs.add("Gray"); break; case "Negate": customArgs.add("-negate"); break; case "Darken": customArgs.add("-fill"); customArgs.add("black"); customArgs.add("-colorize"); customArgs.add("50%"); break; case "Lighten": customArgs.add("-fill"); customArgs.add("white"); customArgs.add("-colorize"); customArgs.add("50%"); break; default: return; } // async request to lambda function new AsyncTask() { @Override protected String doInBackground(ImageConvertRequest... params) { try { return lambda.convert(params[0]); } catch (LambdaFunctionException e) { Log.e("Tag", "Failed to convert image"); return null; } } @Override protected void onPostExecute(String result) { // if no data was returned, there was a failure if (result == null || Objects.equals(result, "")) { hideLoadingDialog(); Toast.makeText(MainActivity.this, "Processing failed", Toast.LENGTH_LONG).show(); return; } // otherwise decode the base64 data and put it in the selected image view byte[] imageData = Base64.decode(result, Base64.DEFAULT); selectedImage.setImageBitmap(BitmapFactory.decodeByteArray(imageData, 0, imageData.length)); hideLoadingDialog(); } }.execute(request); showLoadingDialog(); } /* Select methods for each image */ public void selectLambdaImage(View view) { selectImage(R.drawable.lambda); selectedImage = (ImageView) findViewById(R.id.static_lambda); Toast.makeText(this, "Selected image 'lambda'", Toast.LENGTH_LONG).show(); } public void selectSeattleImage(View view) { selectImage(R.drawable.seattle); selectedImage = (ImageView) findViewById(R.id.static_seattle); Toast.makeText(this, "Selected image 'seattle'", Toast.LENGTH_LONG).show(); } public void selectSquirrelImage(View view) { selectImage(R.drawable.squirrel); selectedImage = (ImageView) findViewById(R.id.static_squirrel); Toast.makeText(this, "Selected image 'squirrel'", Toast.LENGTH_LONG).show(); } public void selectLinuxImage(View view) { selectImage(R.drawable.linux); selectedImage = (ImageView) findViewById(R.id.static_linux); Toast.makeText(this, "Selected image 'linux'", Toast.LENGTH_LONG).show(); } // extract the base64 encoded data of the drawable resource `id` private void selectImage(int id) { Bitmap bmp = BitmapFactory.decodeResource(getResources(), id); ByteArrayOutputStream stream = new ByteArrayOutputStream(); bmp.compress(Bitmap.CompressFormat.PNG, 100, stream); selectedImageBase64 = Base64.encodeToString(stream.toByteArray(), Base64.DEFAULT); } // reset images to their original state public void reset(View view) { ((ImageView) findViewById(R.id.static_lambda)).setImageDrawable(getResources().getDrawable(R.drawable.lambda, getTheme())); ((ImageView) findViewById(R.id.static_seattle)).setImageDrawable(getResources().getDrawable(R.drawable.seattle, getTheme())); ((ImageView) findViewById(R.id.static_squirrel)).setImageDrawable(getResources().getDrawable(R.drawable.squirrel, getTheme())); ((ImageView) findViewById(R.id.static_linux)).setImageDrawable(getResources().getDrawable(R.drawable.linux, getTheme())); Toast.makeText(this, "Please choose from one of these images", Toast.LENGTH_LONG).show(); } private void showLoadingDialog() { progressDialog = ProgressDialog.show(this, "Please wait...", "Processing image", true, false); } private void hideLoadingDialog() { progressDialog.dismiss(); } }Thatās it for the mobile app: a data model (aka Java class), a control model (aka a couple of methods), three statements to initialize things, and then a remote call with a try/catch block around itā¦easy stuff.
Multi-region Deployments
So far we havenāt said much about where this code runs. Lambda takes care of deploying your code within a region, but you have to decide in which region(s) youād like to run it. In my original demo, I built the function initially in the us-east-1 region, aka the Virginia data center. To make good on the claim in the abstract that weād build a global service, letās extend that to include eu-west-1 (Ireland) and ap-northeast-1 (Tokyo) so that mobile apps can connect from around the globe with low latency:
A Serverless Mechanism to Deploy Lambda Functions in Two Additional Regions
This one weāve already discussed in the blog: In the S3 Deployment post, I show how to use a Lambda function to deploy other Lambda functions stored as ZIP files in Amazon S3. In the ContainerCon talk we made this slightly fancier by also turning on S3 cross-region replication, so that we could upload the image processing service as a ZIP file to Ireland, have S3 automatically copy it to Tokyo, and then have both regions automatically deploy it to the associated Lambda services in those respective regions. Gotta love serverless solutions :).
A Serverless Web App, Part 1: API Endpoints
Now that we have a mobile app and a globally-deployed image processing service serving as its backend, letās turn our attention to creating a serverless web app for those folks who prefer a browser to a device. Weāll do this in two parts: First, weāll create an API endpoint for our image processing service. Then in the next section weāll add the actual website using Amazon S3.
One of the ways in which AWS Lambda makes it easy to turn code into services is by providing a web service front end ābuilt inā. However, this requires clients (like the mobile client we built in the last section) to sign requests with AWS-provided credentials. Thatās handled by the Amazon Cognito auth client in our Android app, but what if we wanted to provide public access to the image processing service via a website?
To accomplish this, weāll turn to another server, the Amazon API Gateway. This service lets you define an API without requiring any infrastructure ā the API is fully managed by AWS. Weāll use the API gateway to create a URL for the image processing service that provides access to a subset of its capabilities to anyone on the web. Amazon API Gateway offers a variety of ways to control access to APIs: API calls can be signed with AWS credentials, you can use OAuth tokens and simply forward the token headers for verification, you can use API keys (not recommended as a way to secure access), or make an API completely public, as weāll show here.
In addition to a variety of access models, the API Gateway has a lot of features that we wonāt get to explore in this post. Some are builtin (like anti-DDOS protection) and others, like caching, would enable us to further reduce latency and cost for repeated retrievals of a popular image. By inserting an layer of indirection between clients and (micro)services, API Gateway also makes it possible to evolve them independently through its versioning and staging features. For now, though, weāll focus on the basic task of exposing our image processing service as an API.
Ok, letās create our API. In the AWS Console, pick the API Gateway and then select āNew APIā, provide a name for the API and an optional description. In my example, I named this āImageAPIā.

Next, create a resource for your new API (I called this āImageProcessingServiceā) and then create a POST method in it. Select āLambda functionā as the integration type, and type in the name of the Lambda function youāre using as your image processing service. In the āMethod Requestā configuration, set the authorization type to ānoneā (aka, this will be a publicly accessible endpoint). Thatās pretty much it.

To test the integration, click the āTestā button:

then supply a test payload such as {āoperationā: āpingā}. You should get the expected āpongā result, indicating that youāve successfully linked your API to the your Lambda function.
Aside: Weāll get to more (and deeper) testing later, but one thing I sometimes find useful is to add a GET method at the top level resource in my API, bound to something simple, like the ping operation, to enable me to also quickly vet from any browser that my API is linked up to my Lambda function as expected. Not required for this demo (or in general), but you might find it useful as well.
For what comes next (S3 static content) we also need CORS enabled. Itās straightforward but there are several steps. The API Gateway team continues to make this easier, so instead of repeating the instructions here (and potentially having them get out of date quickly), Iāll point you to the documentation.
Click on the āDeploy this APIā button. With that, you should be all set for website creation!
A Serverless Web App, Part 2: Static Website Hosting in Amazon S3
This part is easy ā upload the following Javascript website code to your S3 bucket of choice:
var ENDPOINT = 'https://fuexvelc41.execute-api.us-east-1.amazonaws.com/prod/ImageProcessingService'; angular.module('app', ['ui.bootstrap']) .controller('MainController', ['$scope', '$http', function($scope, $http) { $scope.loading = false; $scope.image = { width: 100 }; $scope.ready = function() { $scope.loading = false; }; $scope.submit = function() { var fileCtrl = document.getElementById('image-file'); if (fileCtrl.files && fileCtrl.files[0]) { $scope.loading = true; var fr = new FileReader(); fr.onload = function(e) { $scope.image.base64Image = e.target.result.slice(e.target.result.indexOf(',') + 1); $scope.$apply(); document.getElementById('original-image').src = e.target.result; // Now resize! $http.post(ENDPOINT, angular.extend($scope.image, { operation: 'resize', outputExtension: fileCtrl.value.split('.').pop() })) .then(function(response) { document.getElementById('processed-image').src = "data:image/png;base64," + response.data; }) .catch(console.log) .finally($scope.ready); }; fr.readAsDataURL(fileCtrl.files[0]); } }; }]);And hereās the HTML source we used for the (very basic) website in the demo:
<!DOCTYPE html> <html lang="en"> <head> <title>Image Processing Service</title> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width, initial-scale=1"> <link rel="stylesheet" type="text/css" href="https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/3.3.4/css/bootstrap.min.css"> <link rel="stylesheet" type="text/css" href="http://fonts.googleapis.com/css?family=Open+Sans:400,700"> <link rel="stylesheet" type="text/css" href="main.css"> </head> <body ng-app="app" ng-controller="MainController"> <div class="container"> <h1>Image Processing Service</h1> <div class="row"> <div class="col-md-4"> <form ng-submit="submit()"> <div class="form-group"> <label for="image-file">Image</label> <input id="image-file" type="file"> </div> <div class="form-group"> <label for="image-width">Width</label> <input id="image-width" class="form-control" type="number" ng-model="image.width" min="1" max="4096"> </div> <button type="submit" class="btn btn-primary"> <span class="glyphicon glyphicon-refresh" ng-if="loading"></span> Submit </button> </form> </div> <div class="col-md-8"> <accordion close-others="false"> <accordion-group heading="Original Image" is-open="true"> <img id="original-image" class="img-responsive"> </accordion-group> <accordion-group heading="Processed Image" is-open="true"> <img id="processed-image" class="img-responsive"> </accordion-group> </accordion> </div> </div> </div> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular.js/1.3.15/angular.min.js"></script> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular-ui-bootstrap/0.13.3/ui-bootstrap.min.js"></script> <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/angular-ui-bootstrap/0.13.3/ui-bootstrap-tpls.min.js"></script> <script type="text/javascript" src="main.js"></script> </body> </html>Finally, hereās the CSS:
body { font-family: 'Open Sans', sans-serif; padding-bottom: 15px; } a { cursor: pointer; } /** LOADER **/ .glyphicon-refresh { -animation: spin .7s infinite linear; -webkit-animation: spin .7s infinite linear; } @keyframes spin { from { transform: rotate(0deg); } to { transform: rotate(360deg); } } @-webkit-keyframes spin { from { -webkit-transform: rotate(0deg); } to { -webkit-transform: rotate(360deg); } }ā¦then turn on static website content serving in S3:

The URL will depend on your S3 region and object names, e.g. āhttp://image-processing-service.s3-website-us-east-1.amazonaws.com/ā. Visit that URL in a browser and you should see your image website:

Unit and Load Testing
With API Gateway providing a classic URL-based interface to your Lambda microservice, you have a variety of options for testing. But letās stick to our serverless approach and do it entirely without infrastructure or even a client!
First, we want to make calls through the API. Thatās easy; we use Lambdaās HTTPS invocation blueprint to POST to the endpoint we got when we deployed with API Gateway:
{ "options": { "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com", "path": "/prod/ImageProcessingService", "method": "POST" }, "data": { "operation": "getSample" } }Now that we have that, letās wrap a unit test around it. Our unit test harness doesnāt do much; it just runs another Lambda function and pops the result into an Amazon DynamoDB table that we specify. Weāll use the unit and load test harness Lambda blueprint for this in its āunit testā mode:
{ "operation": "unit", "function": "HTTPSInvoker", "resultsTable": "unit-test-results", "testId": "LinuxConDemo", "event": { "options": { "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com", "path": "/prod/ImageProcessingService", "method": "POST" }, "data": { "operation": "getSample" } } }Finally, we āll do a simple load test by running the unit test multiple times. Weāll use the Lambda unit and load test harness again, this time in āload testā mode:
{ "operation": "load", "iterations": 100, "function": "TestHarness", "event": { "operation": "unit", "function": "HTTPSInvoker", "resultsTable": "unit-test-results", "testId": "LinuxConLoadTestDemo", "event": { "options": { "host": "fuexvelc41.execute-api.us-east-1.amazonaws.com", "path": "/prod/ImageProcessingService", "method": "POST" }, "data": { "operation": "getSample" } } } }Hereās a picture of our serverless testing architecture:
A Serverless Unit and Load Test Harness
You can easily vary this approach to incorporate validation, run a variety of unit tests, etc. If you donāt need the web app infrastructure, you can skip the API Gateway and HTTP invocation and simply run the image processing service directly in your unit test. If you want to summarize or analyze the test output, you can easily attach a Lambda function as an event handler to the DynamoDB table that holds the test results.
Summary
This was a longish post, but itās a complete package for building a real, scalable backend service and fronting it with both mobile clients and a website, all without the need for servers or other infrastructure in any part of the system: frontend, backend, API, deployment, or testing. Go serverless!
Until next time, happy Lambda (and serverless microservice) coding!
-
Everything Depends on Context or, The Fine Art of nodejs Coding in AWS Lambda
Tim Wagner, AWS Lambda General Manager
Quick, whatās wrong with the Lambda code sketch below?
exports.handler = function(event, context) { anyAsyncCall(args, function(err, result) { if (err) console.log('problem'); else /* do something with result */; }); context.succeed(); };If you said the placement of context.succeed, youāre correct ā it belongs inside the callback. In general, when you get this wrong your code exits prematurely, after the incorrectly placed context.succeed line, without allowing the callback to run. The same thing happens if you make calls in a loop, often leading to race conditions where some callbacks get ādroppedā; the lack of a barrier synchronization forces a too-early exit.
If you test outside of Lambda, these patterns work fine in nodejs, because the default node runtime waits for all tasks to complete before exiting. Context.succeed, context.done, and context.fail however, are more than just bookkeeping ā they cause the request to return after the current task completes, even if other tasks remain in the queue. Generally thatās not what you want if those tasks represent incomplete callbacks.
Placement Patterns
Fixing the code in the single callback case is trivial; the code above becomes
exports.handler = function(event, context) { anyAsyncCall(args, function(err, result) { if (err) console.log('problem'); else /* do something with result */; context.succeed(); }); };Dealing with a loop that has an unbounded number of async calls inside it takes more work; hereās one pattern (the asyncAll function) used in Lambdaās test harness blueprint to run a test a given number of iterations:
/** * Provides a simple framework for conducting various tests of your Lambda * functions. Make sure to include permissions for `lambda:InvokeFunction` * and `dynamodb:PutItem` in your execution role! */ var AWS = require('aws-sdk'); var doc = require('dynamodb-doc'); var lambda = new AWS.Lambda({ apiVersion: '2015-03-31' }); var dynamo = new doc.DynamoDB(); // Runs a given function X times var asyncAll = function(opts) { var i = -1; var next = function() { i++; if (i === opts.times) { opts.done(); return; } opts.fn(next, i); }; next(); }; /** * Will invoke the given function and write its result to the DynamoDB table * `event.resultsTable`. This table must have a hash key string of "testId" * and range key number of "iteration". Specify a unique `event.testId` to * differentiate each unit test run. */ var unit = function(event, context) { var lambdaParams = { FunctionName: event.function, Payload: JSON.stringify(event.event) }; lambda.invoke(lambdaParams, function(err, data) { if (err) { context.fail(err); } // Write result to Dynamo var dynamoParams = { TableName: event.resultsTable, Item: { testId: event.testId, iteration: event.iteration || 0, result: data.Payload, passed: !JSON.parse(data.Payload).hasOwnProperty('errorMessage') } }; dynamo.putItem(dynamoParams, context.done); }); }; /** * Will invoke the given function asynchronously `event.iterations` times. */ var load = function(event, context) { var payload = event.event; asyncAll({ times: event.iterations, fn: function(next, i) { payload.iteration = i; var lambdaParams = { FunctionName: event.function, InvocationType: 'Event', Payload: JSON.stringify(payload) }; lambda.invoke(lambdaParams, function(err, data) { next(); }); }, done: function() { context.succeed('Load test complete'); } }); }; var ops = { unit: unit, load: load }; /** * Pass the test type (currently either "unit" or "load") as `event.operation`, * the name of the Lambda function to test as `event.function`, and the event * to invoke this function with as `event.event`. * * See the individual test methods above for more information about each * test type. */ exports.handler = function(event, context) { if (ops.hasOwnProperty(event.operation)) { ops[event.operation](event, context); } else { context.fail('Unrecognized operation "' + event.operation + '"'); } };The approach above serializes the loop; there are many other approaches, and you can use async or other libraries to help.
Does this matter for Java or other jvm-based languages in AWS Lambda?
The specific issue discussed here ā the āside effectā of the placement of a call like context.success on outstanding callbacks ā is unique to nodejs. In other languages, such as Java, returning from the thread of control that represents the request ends the request, which is a little easier to reason about and generally matches developer expectations. Any other threads or processes running at the time that request returns get frozen until the next request (assuming the container gets reused; i.e., possibly never), so if you want them to wrap up, you would need to include explicit barrier synchronization before returning, just as you normally would for a server-side request implemented with multiple threads/processes.
In all languages, context also offers useful āenvironmentalā information (like the request id) and methods (like the amount of time remaining).
Why not just let nodejs exit, if its default behavior is fine?
That would require every request to ābootā the runtimeā¦potentially ok in a one-off functional test, but latency would suffer and it would keep high request rates from being cost effective. Check out the post on container reuse for more on this topic.
Can you make this easier?
You bet! We were trying hard to balance simplicity, control, and a self-imposed ādonāt hack nodeā rule when we launched back in November. Fortunately, newer versions of nodejs offer more control over āexitingā behavior, and weāre looking hard at how to make future releases of nodejs within Lambda offer easier to understand semantics without losing the latency and cost benefits of container reuse. Stay tuned!
Until next time, happy Lambda (and contextually successful nodejs) coding!
-
Better Together: Amazon ECS and AWS Lambda
My colleague Constantin Gonzalez sent a nice guest post that shows how to create container workers using Amazon ECS.
ā
Amazon EC2 Container Service (Amazon ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.
AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. Lambda starts running your code within milliseconds of an event such as an image upload, in-app activity, website click, or output from a connected device.
In this post, we show you how to combine the two services to mutually enhance their capabilities: See how you can get more out of Lambda by using it to start ECS tasks and how you can turn your ECS cluster into a dynamic fleet of container workers that react to any event supported by Lambda.
Example Setup: Ray-tracing high-quality images in the cloud
To illustrate this pattern, you build a simple architecture that generates high-quality, ray-traced images out of input files written in a popular, open-source raytracing language called POV-Ray, conveyed by POV and licensed under either POVās proprietary license (up to version 3.6) or AGPLv3 (version 3.7 onwards). Hereās an overview of the architecture:

To use this architecture, put your POV-Ray scene description file (a POV-Ray .POV file) and its rendering parameters (a POV-Ray .INI file), as well as any supporting other files (e.g., texture images), into a single .ZIP file and upload it to an Amazon S3 bucket. In this architecture, the bucket is configured with an S3 event notification which triggers a Lambda function as soon as the .ZIP file is uploaded.
In similar setups (such as transcoding images), Lambda alone would be sufficient to perform its job on the uploaded object within its allocated time frame (currently 60 seconds). But in this example, we want to support complex rendering jobs that usually take significantly longer.
Therefore, the Lambda function simply takes the event data it received from S3 and sends it as a message into an Amazon Simple Queue Service (SQS) queue. SQS is a fast, reliable, scalable, fully-managed message queuing service. The Lambda function then starts an ECS task that can fetch and process the message from SQS.
Your ECS task contains a simple shell script that reads messages from SQS, extracts the S3 bucket and key information, downloads and unpacks the .ZIP file from S3 and proceeds to starting POV-Ray with the downloaded scene description and data.
After POV-Ray has performed its rendering magic, the script takes the resulting .PNG picture and uploads it back to the same S3 bucket where the original scene description was downloaded from. Then it deletes the message from the queue to avoid duplicate message processing.
The script continues pulling, processing, and deleting messages from the SQS queue until it is fully drained, then it exits, thereby terminating its own container.
Simple and efficient event-driven computing
This architecture can help you:
- Extend the capabilities of Lambda to support any processing time, more programming languages, or other resource requirements, to take advantage of the flexibility of Docker containers.
- Extend the capabilities of ECS to allow event-driven execution of ECS tasks: Use any event type supported by Lambda to start new ECS tasks for processing events, run long batch jobs triggered by new data in S3, or any other event-driven mechanism that you want to implement as a Docker container.
- Get the best of both worlds by coupling the dynamic, event-driven Lambda model with the power of the Docker eco-system.
Step-by-Step
Sounds interesting? Then get started!
This is an advanced architecture example covering a number of AWS services like Lambda, ECS, S3, and SQS in depth as well as using multiple related IAM policies. The underlying resources are in your account and subject to their pricing.
To make it easier for you to follow, we have published all necessary code and scripts on GitHub in the awslabs/lambda-ecs-worker-pattern repository. You might find it even more helpful if you could become familiar with the mentioned services by working through the respective Getting Started documentation first.
Meet the following prerequisites:
- Pick an AWS region that support Lambda, ECS, and SQS.
- Set up the AWS CLI. For more information, see the Setting Up topic in the ECS documentation.
- Get familiar with running Lambda functions and how to trigger them from S3 event notifications. For more information, see Getting Started 2: Handling Amazon S3 Events Using the AWS Lambda Console (Node.js).
- Start a default ECS cluster. For more information, see Getting Started with Amazon ECS.
- Become familiar with Docker files, creating Docker images, and using DockerHub. For more information, see the Docker User Guide.
- Set up a DockerHub account for free or use a Docker repository of your own. This demo uses DockerHub to manage and store Docker images.
This post walks through the steps required for setup, then gives you a simple Python script that can perform all the steps for you.
Step 1: Set up an S3 bucket
Start by setting up an S3 bucket to hold both the POV-Ray input files and the resulting .PNG output pictures. Choose a bucket name and create it:$ aws s3 mb s3://
Step 2: Create an SQS queue
Use SQS to pass the S3 notification event data from Lambda to your ECS task. You can create a new SQS queue using the following command:$ aws sqs create-queue --queue-name ECSPOVRayWorkerQueue
Step 3: Create the Lambda function
The following function reads in a configuration file with the name of an SQS queue, an ECS task definition name, and a whitelist of accepted input file types (.ZIP, in this example).The config file uses JSON and looks like this (make sure to use your region):
$ cat ecs-worker-launcher/config.js { "queue": "https://<YOUR-REGION>.queue.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/ECSPOVRayWorkerQueue", "task": "ECSPOVRayWorkerTask", "s3_key_suffix_whitelist": [".zip"] }The SQS queue ARN (which looks like: https://eu-west-1.queue.amazonaws.com/<YOUR-AWS-ACCOUNT-ID>/ECSPOVRayWorkerQueue) is the output of the preceding command, in which you created your ECS queue. The ātaskā attribute references the name of an ECS task that you create in a future step.
The Lambda function checks the S3 object key given in the Lambda event against the file type whitelist; in case of a match, it sends a message to the configured SQS queue with the event data and starts the ECS task specified in the configuration file. Hereās the code:
// Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved. // // Licensed under the Apache License, Version 2.0 (the "License"). // You may not use this file except in compliance with the License. // A copy of the License is located at // // http://aws.amazon.com/apache2.0/ // // or in the "license" file accompanying this file. // This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and limitations under the License. // This Lambda function forwards the given event data into an SQS queue, then starts an ECS task to // process that event. var fs = require('fs'); var async = require('async'); var aws = require('aws-sdk'); var sqs = new aws.SQS({apiVersion: '2012-11-05'}); var ecs = new aws.ECS({apiVersion: '2014-11-13'}); // Check if the given key suffix matches a suffix in the whitelist. Return true if it matches, false otherwise. exports.checkS3SuffixWhitelist = function(key, whitelist) { if(!whitelist){ return true; } if(typeof whitelist == 'string'){ return key.match(whitelist + '$') } if(Object.prototype.toString.call(whitelist) === '[object Array]') { for(var i = 0; i < whitelist.length; i++) { if(key.match(whitelist[i] + '$')) { return true; } } return false; } console.log( 'Unsupported whitelist type (' + Object.prototype.toString.call(whitelist) + ') for: ' + JSON.stringify(whitelist) ); return false; }; exports.handler = function(event, context) { console.log('Received event:'); console.log(JSON.stringify(event, null, ' ')); var config = JSON.parse(fs.readFileSync('config.json', 'utf8')); if(!config.hasOwnProperty('s3_key_suffix_whitelist')) { config.s3_key_suffix_whitelist = false; } console.log('Config: ' + JSON.stringify(config)); var key = event.Records[0].s3.object.key; if(!exports.checkS3SuffixWhitelist(key, config.s3_key_suffix_whitelist)) { context.fail('Suffix for key: ' + key + ' is not in the whitelist') } // We can now go on. Put the S3 URL into SQS and start an ECS task to process it. async.waterfall([ function (next) { var params = { MessageBody: JSON.stringify(event), QueueUrl: config.queue }; sqs.sendMessage(params, function (err, data) { if (err) { console.warn('Error while sending message: ' + err); } else { console.info('Message sent, ID: ' + data.MessageId); } next(err); }); }, function (next) { // Starts an ECS task to work through the feeds. var params = { taskDefinition: config.task, count: 1 }; ecs.runTask(params, function (err, data) { if (err) { console.warn('error: ', "Error while starting task: " + err); } else { console.info('Task ' + config.task + ' started: ' + JSON.stringify(data.tasks))} next(err); }); } ], function (err) { if (err) { context.fail('An error has occurred: ' + err); } else { context.succeed('Successfully processed Amazon S3 URL.'); } } ); };The Lambda function uses the Async.js library to make it easier to program the sequence of events to perform in an event-driven language like Node.js. You can install the library by typing npm install async from within the directory where the Lambda function and its configuration file are located.
To upload the function into Lambda, zip the Lambda function, its configuration file, and the node_modules directory with the Async.js library. In the Lambda console, upload the .ZIP file as described in the Node.js for S3 events tutorial.
For this function to perform its job, it needs an IAM role with a policy that allows access to SQS as well as the right to start tasks on ECS. It also should be able to publish log data to CloudWatch Logs. However, it does not need explicit access to S3, because only the ECS task needs to download the source file from and upload the resulting image to S3. Here is an example policy:
{ "Statement": [ { "Action": [ "logs:*", "lambda:invokeFunction", "sqs:SendMessage", "ecs:RunTask" ], "Effect": "Allow", "Resource": [ "arn:aws:logs:*:*:*", "arn:aws:lambda:*:*:*:*", "arn:aws:sqs:*:*:*", "arn:aws:ecs:*:*:*" ] } ], "Version": "2012-10-17" }For the sake of simplicity in this post, we used very broadly defined resource identifiers like āarn:aws:sqs:*:*:*ā, which cover all the resources of the given types. In a real-world scenario, we recommend that you make resource definitions as specific as possible by adding account IDs, queue names, and other resource ARN parameters.
Step 4: Configure S3 bucket notifications
Now you need to set up a bucket notification for your S3 bucket that triggers the Lambda function as soon as a new object is copied into the bucket.This is a two-step process:
1. Add permission for S3 to be able to call your Lambda function:
$ aws lambda add-permission \ --function-name ecs-pov-ray-worker\ --region \ --statement-id \ --action "lambda:InvokeFunction" \ --principal s3.amazonaws.com \ --source-arn arn:aws:s3::: \ --source-account \ --profile
2. Set up an S3 bucket notification configuration (note: use the ARN for your Lambda function):
$ aws s3api put-bucket-notification-configuration \ --bucket \ --notification-configuration \ '{"LambdaFunctionConfigurations": [{"Events": ["s3:ObjectCreated:*"], "Id": "ECSPOVRayWorker", "LambdaFunctionArn": "arn:aws:lambda:eu-west-1::function:ecs-worker-launcher"}]}'If you use these CLI commands, remember to substitute your particular name and Lambda function ARN.
Step 5: Create a Docker image
Docker images contain all of the software needed to run your application on Docker, out of Dockerfiles that describe the steps needed to create that image.In this step, craft a Dockerfile that installs the POV-Ray ray-tracing application from POV (see important licensing information at the top of this post) as well as a simple shell script that uses the AWS CLI to consume messages from SQS, download and unpack the input data from S3 into the local file system, run the ray-tracer on the input file, then upload the resulting image back to S3, and delete the message from the SQS queue.
Start with the shell script, called ecs-worker.sh:
#!/bin/bash # Copyright 2015 Amazon.com, Inc. or its affiliates. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"). # You may not use this file except in compliance with the License. # A copy of the License is located at # # http://aws.amazon.com/apache2.0/ # # or in the "license" file accompanying this file. # This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and limitations under the License. # # Simple POV-Ray worker shell script. # # Uses the AWS CLI utility to fetch a message from SQS, fetch a ZIP file from S3 that was specified in the message, # render its contents with POV-Ray, then upload the resulting .png file to the same S3 bucket. # region=${AWS_REGION} queue=${SQS_QUEUE_URL} # Fetch messages and render them until the queue is drained. while [ /bin/true ]; do # Fetch the next message and extract the S3 URL to fetch the POV-Ray source ZIP from. echo "Fetching messages from SQS queue: ${queue}..." result=$( \ aws sqs receive-message \ --queue-url ${queue} \ --region ${region} \ --wait-time-seconds 20 \ --query Messages[0].[Body,ReceiptHandle] \ | sed -e 's/^"\(.*\)"$/\1/'\ ) if [ -z "${result}" ]; then echo "No messages left in queue. Exiting." exit 0 else echo "Message: ${result}." receipt_handle=$(echo ${result} | sed -e 's/^.*"\([^"]*\)"\s*\]$/\1/') echo "Receipt handle: ${receipt_handle}." bucket=$(echo ${result} | sed -e 's/^.*arn:aws:s3:::\([^\\]*\)\\".*$/\1/') echo "Bucket: ${bucket}." key=$(echo ${result} | sed -e 's/^.*\\"key\\":\s*\\"\([^\\]*\)\\".*$/\1/') echo "Key: ${key}." base=${key%.*} ext=${key##*.} if [ \ -n "${result}" -a \ -n "${receipt_handle}" -a \ -n "${key}" -a \ -n "${base}" -a \ -n "${ext}" -a \ "${ext}" = "zip" \ ]; then mkdir -p work pushd work echo "Copying ${key} from S3 bucket ${bucket}..." aws s3 cp s3://${bucket}/${key} . --region ${region} echo "Unzipping ${key}..." unzip ${key} if [ -f ${base}.ini ]; then echo "Rendering POV-Ray scene ${base}..." if povray ${base}; then if [ -f ${base}.png ]; then echo "Copying result image ${base}.png to s3://${bucket}/${base}.png..." aws s3 cp ${base}.png s3://${bucket}/${base}.png else echo "ERROR: POV-Ray source did not generate ${base}.png image." fi else echo "ERROR: POV-Ray source did not render successfully." fi else echo "ERROR: No ${base}.ini file found in POV-Ray source archive." fi echo "Cleaning up..." popd /bin/rm -rf work echo "Deleting message..." aws sqs delete-message \ --queue-url ${queue} \ --region ${region} \ --receipt-handle "${receipt_handle}" else echo "ERROR: Could not extract S3 bucket and key from SQS message." fi fi doneRemember to run chmod +x ecs-worker.sh to make the shell script executable. This permission is copied over to the Docker image you create in the next step, which is to put together a Dockerfile that includes all you need to set up the POV-Ray software and the AWS CLI in addition to your script:
# POV-Ray Amazon ECS Worker FROM ubuntu:14.04 MAINTAINER FIRST_NAME LAST_NAME <EMAIL@DOMAIN.COM> # Libraries and dependencies RUN \ apt-get update && apt-get -y install \ autoconf \ build-essential \ git \ libboost-thread-dev \ libjpeg-dev \ libopenexr-dev \ libpng-dev \ libtiff-dev \ python \ python-dev \ python-distribute \ python-pip \ unzip \ zlib1g-dev # Compile and install POV-Ray RUN \ mkdir /src && \ cd /src && \ git clone https://github.com/POV-Ray/povray.git && \ cd povray && \ git checkout origin/3.7-stable && \ cd unix && \ sed 's/automake --w/automake --add-missing --w/g' -i prebuild.sh && \ sed 's/dist-bzip2/dist-bzip2 subdir-objects/g' -i configure.ac && \ ./prebuild.sh && \ cd .. && \ ./configure COMPILED_BY="FIRST_NAME LAST_NAME <EMAIL@DOMAIN.COM>" LIBS="-lboost_system -lboost_thread" && \ make && \ make install # Install AWS CLI RUN \ pip install awscli WORKDIR / COPY ecs-worker.sh / CMD [ "./ ecs-worker.sh" ]
Substitute your own name and email address into the COMPILED_BY parameter when using this Dockerfile. Also, this file assumes that you have the ecs-worker.sh script in your current directory when you create the Docker image.
After making sure the shell script is in the local directory and setting up the Dockerfile (and your account/credentials with Docker Hub), you can create the Docker image using the following commands:
$ docker build -t /: . $ docker login -u -e $ docker push
Step 6: Create an ECS task definition
Now that you have a Docker image ready to go, you can create an ECS task definition:{ "containerDefinitions": [ { "name": "ECSPOVRayWorker", "image": "/:", "cpu": 512, "environment": [ { "name": "AWS_REGION", "value": "<YOUR-CHOSEN-AWS-REGION>" }, { "name": "SQS_QUEUE_URL", "value": "https://<YOUR_REGION>.queue.amazonaws.com/<YOUR_AWS_ACCOUNT_ID>/ECSPOVRayWorkerQueue" } ], "memory": 512, "essential": true } ], "family": "ECSPOVRayWorker" }When using this example task definition file, remember to substitute your own values for the Dockerhub user, repository, and tag as well as your chosen AWS region and SQS queue ARN.
Now, youāre ready to register your task definition with ECS:
$ aws ecs register-task-definition āfamily ECSPOVRayWorkerTask ācli-input-json file://task-definition-file.json
Remember, the ECS task family name āECSPOVRayWorkerTaskā corresponds to the ātaskā attribute of your Lambda functionās configuration file. This is how Lambda knows which ECS task to start upon invocation; if you decide to name your ECS task definition differently, also remember to update the Lambda functionās configuration file accordingly.
Step 7: Add a policy to your ECS instance role that allows access to SQS and S3
Your SQS queue worker script running inside your Docker container on ECS needs some permissions to fetch messages from SQS, download and upload files to/from S3 and to delete messages from SQS when itās done.The following example policy shows the permissions needed for this application, in addition to the standard ECS-related permissions:
{ "Statement": [ { "Action": [ "s3:ListAllMyBuckets" ], "Effect": "Allow", "Resource": "arn:aws:s3:::*" }, { "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ], "Effect": "Allow", "Resource": "arn:aws:s3:::" }, { "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject" ], "Effect": "Allow", "Resource": "arn:aws:s3:::/*" } ], "Version": "2012-10-17" }You can attach this policy to your ECS instance role or work the missing policy statements into your existing ECS instance role. The former option is preferred as it lets you manage different policies for different tasks in separate policy documents.
Step 8: Test your new ray-tracing service
Youāre now ready to test the new Lambda/Docker worker pattern for rendering ray-traced images!To help you test it, we have provided you with a ready to use .ZIP file, containing a sample POV-Ray scene that you can download from the awslabs/lambda-ecs-worker-pattern GitHub repository.
Upload the .ZIP file to your S3 bucket and, after a few minutes, you should see the final rendered image appear in the same bucket, looking like this:

If it doesnāt work out of the box, donāt panic! Here are some hints on how to debug this scenario:
- Use the Amazon CloudWatch Logs console and look for errors generated by Lambda. Check out the Troubleshooting section of the AWS Lambda documentation.
- Log into your ECS container node(s) and use the docker ps -a command to identify the Docker container that was running your application. Use the docker logs command to look for errors. Check out the Troubleshooting section of the ECS documentation.
- If the Docker container is still running, you can log into it using the docker exec -it /bin/bash command and see whatās going on as it happens.
All in one go
To make setup even easier, we have put together a Python Fabric script that handles all of the above tasks for you. Fabric is a Python module that makes it easy to run commands on remote nodes, transfer files over SSH, run commands locally, and structure your script in a manner similar to a makefile.
You can download the script and Python fabfile that can help set this up, along with instructions, from the awslabs/lambda-ecs-worker-pattern repository on GitHub.
Further considerations
This example is intentionally simple and generic so you can adapt it to a wide variety of situations. When implementing this pattern for your own projects, you may want to consider additional issues.
In this pattern, each Lambda function launches its own ECS container to process the event. When many events occur, multiple containers are launched and this may not be what you want; a single running ECS task can continue processing messages from SQS until the queue is empty. Consider scaling the number of running ECS tasks independently of the number of Lambda function invocations. For more information, see Scaling Amazon ECS Services Automatically Using Amazon CloudWatch and AWS Lambda.
This approach is not limited to launching ECS containers; you can use Lambda to launch any other AWS service or resource, including Amazon Elastic Transcoder jobs, Amazon Simple Workflow Service executions, or AWS Data Pipeline jobs.
Combining ECS tasks with SQS is a very simple, but powerful batch worker pattern. You can use it even without Lambda: whenever you want to get a piece of long-running batch work done, write its parameters into an SQS queue and launch an ECS task in the background, while your application continues normally.
This pattern uses SQS to buffer the full S3 bucket notification event for the ECS task to pick it up. In cases where the parameters to be forwarded to ECS are short and simple (a URL, file name, or simple data structure), you can wrap them into environment variables and specify them as overrides in the run-task operation. This means that for simple parameters, you can stop using SQS altogether.
This pattern can also save on costs. Many traditional computing tasks need a specialized software installation (like the POV-Ray rendering software in this example) but are only used intermittently (such as one time per day or per week) for less than an hour. Keeping machines or fleets for such specialized tasks can create waste because they may not reach a significant use level.
Using ECS, you can share the hardware infrastructure for your batch worker fleets among very different specific worker implementations (a ray-tracing application, ETL process, document converter, etc.). This allows you to drive higher use of your generic ECS fleet while it performs very different tasks, through the ability to run different container images on the same infrastructure. You need fewer hardware resources to accommodate a wide variety of tasks.Conclusion
This pattern is widely applicable. Many applications that can be seen as batch-driven workers can be implemented as ECS tasks that can be started from a Lambda function, using SQS to buffer parameters.
Look at your infrastructure and try to identify underused EC2 worker instances that could be re-implemented as ECS tasks; run them on a smaller, more efficient footprint and keep all of their functionality. Re-visit event-driven cases where you may have dismissed Lambda before, and try to apply the techniques outlined in this post in order to expand the usefulness of Lambda into more use cases with more complex execution requirements.
We hope you found this post useful and look forward to your comments about where you plan to implement this in your current and future projects.
-
Cost-effective Batch Processing with Amazon EC2 Spot
Tipu Qureshi, AWS Senior Cloud Support EngineerWith Spot Instances, you can save up to 90% of costs by bidding on spare Amazon Elastic Compute Cloud (Amazon EC2) instances. This reference architecture is meant to help enable you to realize cost savings for batch processing applications while maintaining high availability. We recommend tailoring and testing it for your application before implementing it in a production environment.
Below is a multi-part job processing architecture that can be used for the deployment of a heterogeneous, scalable āgridā of worker nodes that can quickly crunch through large batch processing tasks in parallel. There are numerous batch oriented applications in place today that can leverage this style of on-demand processing, including claims processing, large scale transformation, media processing and multi-part data processing work.
Raw job data is uploaded to Amazon Simple Storage Service (S3) which is a highly-available and persistent data store. An AWS Lambda function will be invoked by S3 every time a new object is uploaded to the input S3 bucket. AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information. Information about Lambda functions is available here and a walkthrough on triggering Lambda functions on S3 object uploads is available here.
AWS Lambda automatically runs your code with an IAM role that you select, making it easy to access other AWS resources, such as Amazon S3, Amazon SQS, and Amazon DynamoDB. AWS lambda can be used to place a job message into an Amazon SQS queue. Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service, which makes it simple and cost-effective to decouple the components of a cloud application. Depending on the applicationās needs, multiple SQS queues might be required for different functions and priorities.
The AWS Lambda function will also store state information for each job task in Amazon DynamoDB. DynamoDB is a regional service, meaning that the data is automatically replicated across availability zones. AWS Lambda can be used to trigger other types of workflows as well, such as an Amazon Elastic Transcoder job. EC2 Spot can also be used with Amazon Elastic MapReduce (Amazon EMR).
Below is a sample IAM policy that can be attached to an IAM role for AWS Lambda. You will need to change the ARNs to match your resources.{ "Version": "2012-10-17", "Statement": [ { "Sid": "Stmt1438283855455", "Action": [ "dynamodb:PutItem" ], "Effect": "Allow", "Resource": "arn:aws:dynamodb:us-east-1::table/demojobtable" }, { "Sid": "Stmt1438283929844", "Action": [ "sqs:SendMessage" ], "Effect": "Allow", "Resource": "arn:aws:sqs:us-east-1::demojobqueue" } ] }Below is a sample Lambda function that will send an SQS message and put an item into a DynamoDB table in response to a S3 object upload. You will need to change the SQS queue URL and DynamoDB table name to match your resources.
// create an IAM Lambda role with access to Amazon SQS queue and DynamoDB table // configure S3 to publish events as shown here: http://docs.aws.amazon.com/lambda/latest/dg/walkthrough-s3-events-adminuser-configure-s3.html // dependencies var AWS = require('aws-sdk'); // get reference to clients var s3 = new AWS.S3(); var sqs = new AWS.SQS(); var dynamodb = new AWS.DynamoDB(); console.log ('Loading function'); exports.handler = function(event, context) { // Read options from the event. var srcBucket = event.Records[0].s3.bucket.name; // Object key may have spaces or unicode non-ASCII characters. var srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " ")); // prepare SQS message var params = { MessageBody: 'object '+ srcKey + ' ', QueueUrl: 'https://sqs.us-east-1.amazonaws.com//demojobqueue', DelaySeconds: 0 }; //send SQS message sqs.sendMessage(params, function (err, data) { if (err) { console.error('Unable to put object' + srcKey + ' into SQS queue due to an error: ' + err); context.fail(srcKey, 'Unable to send message to SQS'); } // an error occurred else { //define DynamoDB table variables var tableName = "demojobtable"; var datetime = new Date().getTime().toString(); //Put item into DynamoDB table where srcKey is the hash key and datetime is the range key dynamodb.putItem({ "TableName": tableName, "Item": { "srcKey": {"S": srcKey }, "datetime": {"S": datetime }, } }, function(err, data) { if (err) { console.error('Unable to put object' + srcKey + ' into DynamoDB table due to an error: ' + err); context.fail(srcKey, 'Unable to put data to DynamoDB Table'); } else { console.log('Successfully put object' + srcKey + ' into SQS queue and DynamoDB'); context.succeed(srcKey, 'Data put into SQS and DynamoDB'); } }); } }); };Worker nodes are Amazon EC2 Spot and On-demand instances on deployed Auto Scaling groups. These groups are containers that ensure health and scalability of worker nodes. Worker nodes pick up job parts from the input queue automatically and perform single tasks based on the job task state in DynamoDB. Worker nodes will store the input objects in a file system such as Amazon Elastic File System (Amazon EFS) for processing. Amazon EFS is a file storage service for EC2 that provides elastic capacity to your applications, automatically adding storage as you add files. Depending on IO and application needs, the job data can also be stored on local instance store or Amazon Elastic Block Store (EBS). Each job can be further split into multiples sub-parts if there is a mechanism to stitch the outputs together (as in the case of some media processing where pre-segmenting the output may be possible). Once completed, the objects will be uploaded back to S3 using multi-part upload.
Similar to our blog on an EC2 Spot architecture for web applications, Auto Scaling groups running On-demand instances can be used together with groups running Spot instances. Spot Auto Scaling groups can use different Spot bid prices and even different instance types to give you more flexibility and to meet changing traffic demands.
The availability of Spot instances can vary depending on how many unused Amazon EC2 instances are available. Because real-time supply and demand dictates the available supply of Spot instances, you should architect your application to be resilient to instance termination. When the Spot price exceeds the price you named (i.e. the bid price), the instance will receive a warning that it will be terminated after two minutes. You can manage this event by creating IAM roles with the relevant SQS and DynamoDB permissions for the Spot instances to run the required shutdown scripts. Details about Creating an IAM Role Using the AWS CLI can be found here and Spot Instance Termination Notices documentation can be found here.
A script like the following can be placed in a loop and can be run on startup (e.g. via systemd or rc.local) to detect for Spot instance termination. It can then update job task state in DynamoDB and re-insert the job task into the queue if required. We recommend that applications poll on the termination notice at five-second intervals.
#!/bin/bash while true do if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q .*T.*Z; then /env/bin/runterminationscripts.sh; else # Spot instance not yet marked for termination. sleep 5 fi doneAn Auto Scaling group running on-demand instances in tandem with Auto Scaling group(s) running Spot instances will help to ensure your applicationās availability in case of changes in Spot market price and Spot instance capacity. Additionally, using multiple Spot Auto Scaling groups with different bids and capacity pools (groups of instances that share the same attributes) can help with both availability and cost savings. By having the ability to run across multiple pools, you reduce your applicationās sensitivity to price spikes that affect a pool or two (in general, there is very little correlation between prices in different capacity pools). For example, if you run in five different pools your price swings and interruptions can be cut by 80%.
For Auto Scaling to scale according to your applicationās needs, you must define how you want to scale in response to changing conditions. You can assign more aggressive scaling policies to Auto Scaling groups that run Spot instances, and more conservative ones to Auto Scaling groups that run on-demand instances. The Auto Scaling instances will scale up based on the SQS queue depth CloudWatch metric (or more relevant metric depending on your application), but will scale down based on EC2 instance CPU utilization CloudWatch metric (or more relevant metric) to ensure that a job actually gets completed before the instance is terminated. For information about using Amazon CloudWatch metrics (such as SQS queue depth) to scale automatically, see Dynamic Scaling. As a safety buffer, a grace period (e.g. 300 seconds) should also be configured for the Auto Scaling group to prevent termination before a job completes.
Reserved instances can be purchased for the On-demand EC2 instances if they are consistently being used to realize even more cost savings.
To automate further, you can optionally create a Lambda function to dynamically manage Auto Scaling groups based on the Spot market. The Lambda function could periodically invoke the EC2 Spot APIs to assess market prices and availability and respond by creating new Auto Scaling launch configurations and groups automatically. Information on the Describe Spot Price History API is available here. This function could also delete any Spot Auto Scaling groups and launch configurations that have no instances. AWS Data Pipeline can be used to invoke the Lambda function using the AWS CLI at regular intervals by scheduling pipelines. More information about scheduling pipelines is available here and information about invoking AWS Lambda functions using AWS CLI is here.
-
Building NoSQL Database Triggers with Amazon DynamoDB and AWS Lambda
Tim Wagner, AWS Lambda General Manager
SQL databases have offered triggers for years, making it easy to validate and check data, maintain integrity constraints, create compute columns, and more. Why should SQL tables have all the funā¦letās do the equivalent for NoSQL data!
Amazon DynamoDB recently launched their streams feature (table update notifications) in production. When you combine this with AWS Lambda, itās easy to create NoSQL database triggers that let you audit, aggregate, verify, and transform data. In this blog post weāll do just that: first, weāll create a data audit trigger, then weāll extend it to also transform the data by adding a computed column to our table that the trigger maintains automatically. Weāll use social security numbers and customer names as our sample data, because theyāre representative of something you might find in a production environment. Letās get startedā¦
Data Auditing
Our first goal is to identify and report invalid social security numbers. Weāll accept two formats: 9 digits (e.g., 123456789) and 11 digits (e.g., 123-45-6789). Anything else is an error and will generate an SNS message which, for the purposes of this demo, weāll use to send an email report of the problem.
Setup Part 1: Defining a Table Schema
First, start by creating a new table; Iām calling it āTriggerDemoā:

For our example weāll use just two fields: Name and SocialSecurityNumber (the primary hash key and primary range key, respectively, both represented as strings). In a more realistic setting youād typically have additional customer-specific information keyed off these fields. You can accept the default capacity settings and you donāt need any secondary indices.

You do need to turn on streams in order to be able to send updates to your AWS Lambda function (weāll get to that in a minute). You can read more about configuring and using DynamoDB streams in the DynamoDB developer guide.

Hereās the summary view of the table weāve just configured:

Setup Part 2: SNS Topic and Email Subscription
To give us a way to report errors, weāll create an SNS topic; Iām calling mine, āBadSSNNumbersā.

(The other topic here is the DynamoDB alarm.)

ā¦and then Iāll subscribe my email to it to receive error notifications:

(I havenāt shown it here, but you can also turn on SNS logging as a debugging aid.)
Ok, we have a database and a notification systemā¦now we need a compute service!
Setup Part 3: A Lambda-based Trigger
Now weāll create an AWS Lambda function that will respond to DynamoDB updates by verifying the integrity of each social security number, using the SNS topic we just created to notify us of any problematic entries.
First, create a new Lambda function by selecting the ādynamodb-process-streamā blueprint. Blueprints help you get started quickly with common tasks.

For the event source, select your TriggerDemo table:

Youāll also need to provide your function with permissions to read from the stream by choosing the recommended role (DynamoDB event stream role):

The blueprint-provided permission policy only assumes youāre going to read from the update stream and create log entries, but we need an additional permission: publishing to the SNS topic. In the later part of this demo weāll also want to write to the table, so letās take care of both pieces at once: Hop over to the IAM console and add two managed policies to your role: SNS full access and DynamoDB full access. (Note: This is an quick approach for demo purposes, but if you want to use the techniques described here for a production table, I strongly recommend your create custom āminimal trustā policies that permit only the necessarily operations and resources to be accessed from your Lambda function.)

The code for the Lambda function is straightforward: It receives batches of change notifications from DynamoDB and processes each one in turn by checking its social security number, reporting any malformed ones via the SNS topic we configured earlier. Replace the sample code provided by the blueprint with the following, being sure to replace the SNS ARN with the one from your own topic:
var AWS = require('aws-sdk'); var sns = new AWS.SNS(); exports.handler = function(event, context) {processRecord(context, 0, event.Records);} // Process each DynamoDB record function processRecord(context, index, records) { if (index == records.length) { context.succeed("Processed " + records.length + " records."); return; } record = records[index]; console.log("ID: " + record.eventID + "; Event: " + record.eventName); console.log('DynamoDB Record: %j', record.dynamodb); // Assumes SSN# is only set only on row creation if ((record.eventName != "INSERT") || valid(record)) processRecord(context, index+1, records); else { console.log('Invalid SSN # detected'); var name = record.dynamodb.Keys.Name.S; console.log('name: ' + name); var ssn = record.dynamodb.Keys.SocialSecurityNumber.S; console.log('ssn: ' + ssn); var message = 'Invalid SSN# Detected: Customer ' + name + ' had SSN field of ' + ssn + '.'; console.log('Message to send: ' + message); var params = { Message: message, TopicArn: 'YOUR BadSSNNumbers SNS ARN GOES HERE' }; sns.publish(params, function(err, data) { if (err) console.log(err, err.stack); else console.log('malformed SSN message sent successfully'); processRecord(context, index+1, records); }); } } // Social security numbers must be in one of two forms: nnn-nn-nnnn or nnnnnnnnn. function valid(record) { var SSN = record.dynamodb.Keys.SocialSecurityNumber.S; if (SSN.length != 9 && SSN.length != 11) return false; if (SSN.length == 9) { for (var indx in SSN) if (!isDigit(SSN[indx])) return false; return true; } else { return isDigit(SSN[0]) && isDigit(SSN[1]) && isDigit(SSN[2]) && SSN[3] == '-' && isDigit(SSN[4]) && isDigit(SSN[5]) && SSN[6] == '-' && isDigit(SSN[7]) && isDigit(SSN[8]) && isDigit(SSN[9]) && isDigit(SSN[10]); } } function isDigit(c) {return c >= '0' && c <= '9';}Testing the Trigger
Ok, now itās time to see things in action. First, use the āTestā button on the Lambda console to validate your code and make sure the SNS notifications are sending email. Next, if you created your Lambda function event source in a disabled state, enable it now. Then go to the DynamoDB console and enter some sample data. First, letās try a valid entry:

Since this one was valid, you should get a CloudWatch Log entry but no email. Now for the fun part: Try an invalid entry, such as āBob Smithā with a social security number of āasdfā. You should receive an email notification something like this for the invalid SSN entry:

You can also check the Amazon CloudWatch Logs to see the analysis and reporting in action and debug any problems:

So in a few lines of Lambda function code we implemented a scalable, serverless NoSQL trigger capable of auditing every change to a DynamoDB table and reporting any errors it detects. You can use similar techniques to validate other data types, aggregate or mark suspected errors instead of reporting them via SNS, and so forth.
Data Transformation
In the previous section we audited the data. Now weāre going to take it a step further and have the trigger also maintain a computed column that describes the format of the social security number. The computed attribute can have one of three values: 9 (meaning, āThe social security number in this row is valid and is a 9-digit formatā), 11, or āINVALIDā.
We donāt need to alter anything about the DynamoDB table or the SNS topic, but in addition to the extra code, the IAM permissions for the Lambda function must now allow us to write to the DynamoDB table in addition to reading from its update stream. If you added the DynamoDBFullAccess managed policy earlier when you did the SNS policy, youāre already good. If not, hop over to the IAM console and add that second managed policy now. (Also see the best practice note above on policy scoping if youāre putting this into production.)
The code changes only slightly to add the new DynamoDB writes:
var AWS = require('aws-sdk'); var sns = new AWS.SNS(); var dynamodb = new AWS.DynamoDB(); exports.handler = function(event, context) {processRecord(context, 0, event.Records);} function processRecord(context, index, records) { if (index == records.length) { context.succeed("Processed " + records.length + " records."); return; } record = records[index]; console.log("ID: " + record.eventID + "; Event: " + record.eventName); console.log('DynamoDB Record: %j', record.dynamodb); if (record.eventName != "INSERT") processRecord(context, index+1, records); else if (valid(record)) { var name = record.dynamodb.Keys.Name.S; var ssn = record.dynamodb.Keys.SocialSecurityNumber.S; dynamodb.putItem({ "TableName":"TriggerDemo", "Item": { "Name": {"S": name}, "SocialSecurityNumber": {"S": ssn}, "SSN Format": {"S": ssn.length == 9 ? "9" : "11"} } }, function(err, data){ if (err) console.log(err, err.stack); processRecord(context, index+1, records); }); } else { console.log('Invalid SSN # detected'); var name = record.dynamodb.Keys.Name.S; console.log('name: ' + name); var ssn = record.dynamodb.Keys.SocialSecurityNumber.S; console.log('ssn: ' + ssn); var message = 'Invalid SSN# Detected: Customer ' + name + ' had SSN field of ' + ssn + '.'; console.log('Message to send: ' + message); var params = { Message: message, TopicArn: 'YOUR BadSSNNumbers SNS ARN GOES HERE' }; sns.publish(params, function(err, data) { if (err) console.log(err, err.stack); else console.log('malformed SSN message sent successfully'); dynamodb.putItem({ "TableName":"TriggerDemo", "Item": { "Name": {"S": name}, "SocialSecurityNumber": {"S": ssn}, "SSN Format": {"S": "INVALID"} } }, function(err, data){ if (err) console.log(err, err.stack); processRecord(context, index+1, records); }); }); } } // Social security numbers must be in one of two forms: nnn-nn-nnnn or nnnnnnnnn. function valid(record) { var SSN = record.dynamodb.Keys.SocialSecurityNumber.S; if (SSN.length != 9 && SSN.length != 11) return false; if (SSN.length == 9) { for (var indx in SSN) if (!isDigit(SSN[indx])) return false; return true; } else { return isDigit(SSN[0]) && isDigit(SSN[1]) && isDigit(SSN[2]) && SSN[3] == '-' && isDigit(SSN[4]) && isDigit(SSN[5]) && SSN[6] == '-' && isDigit(SSN[7]) && isDigit(SSN[8]) && isDigit(SSN[9]) && isDigit(SSN[10]); } } function isDigit(c) {return c >= '0' && c <= '9';}Now you can go to the DynamoDB console to add more rows to your table to watch your trigger both check and your entries and maintain a computed format column for them. Donāt forget to refresh the DynamoDB table browse view to see the updates!
(Now that the code updates rows in the original table, testing from the Lambda console will generate double notifications ā the first one from the original test, and the second when the item is created for real. You could add an āistestā field to the sample event in the console test experience and a condition in the code to prevent this if you want to keep testing āofflineā from the actual table.)
I chose to leave the original data unchanged in this example, but you could also use the trigger to transform the original values instead ā for example, choosing the 11-digit format as the canonical one and then converting any 9-digit values into their 11-digit equivalents.
Summary
In this post we explored combining DynamoDB stream notifications with AWS Lambda functions to recreate conventional database triggers in a serverless, NoSQL architecture. We used a simple nodejs function to first audit and later transform rows in the table in order to find invalid social security numbers and to compute the format of the number in each entry. While we worked in JavaScript for this example, you could also use Java, Clojure, Scala, or other jvm-based languages to write your trigger. Our notification method of choice for the demo was an SNS-provided email, but text messages, web hooks, and SQS entries just require a different subscription.
Until next time, happy Lambda (and database trigger) coding!
-Tim

