🔍 Deep Dive: AWS Official Blog

S3 Lifecycle Management Update – Support for Multipart Uploads and Delete Markers

by Jeff Barr | on 16 MAR 2016 | in Amazon S3

It is still a bit of a shock to me to realize that Amazon S3 is now ten years old! The intervening decade has simply flown by.

For several years, you have been able to use S3’s Lifecycle Management feature to control the storage class and the lifetime of your objects. As you may know, you can set up rules on a per-bucket or per-prefix basis. Each rule specifies an action to be taken when objects reach a certain age.

Today we are adding two rules that will give you additional control over two special types of objects: incomplete multipart uploads and expired object delete markers. Before we go any further, I should define these objects!

Incomplete Multipart Uploads – S3’s multipart upload feature accelerates the uploading of large objects by allowing you to split them up into logical parts that can be uploaded in parallel. If you initiate a multipart upload but never finish it, the in-progress upload occupies some storage space and will incur storage charges. However, these uploads are not visible when you list the contents of a bucket and (until today’s release) had to be explicitly removed.

Expired Object Delete Markers – S3’s versioning feature allows you to preserve, retrieve, and restore every version of every object stored in a versioned bucket. When you delete a versioned object, a delete marker is created. If all previous versions of the object subsequently expire, an expired object delete marker is left. These markers do not incur storage charges. However, removing unneeded delete markers can improve the performance of S3’s LIST operation.
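To make the multipart definition above more concrete, here is a minimal sketch of how a client might plan the parts of a large upload. The names PartRange and PlanParts are hypothetical and are not part of any AWS SDK, and the sketch does not enforce S3's limits on part size or part count. If an upload stops after only some of the planned parts have been sent, those parts are what remains as an incomplete multipart upload.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One part of a hypothetical multipart upload plan.
struct PartRange
{
    int partNumber;        // 1-based part number
    std::uint64_t offset;  // byte offset of the part within the object
    std::uint64_t length;  // number of bytes in the part
};

// Splits an object of objectSize bytes into consecutive parts of partSize
// bytes; the final part carries whatever remains. Each part could then be
// uploaded independently, and in parallel.
std::vector<PartRange> PlanParts(std::uint64_t objectSize, std::uint64_t partSize)
{
    assert(partSize > 0);

    std::vector<PartRange> parts;
    std::uint64_t offset = 0;
    int partNumber = 1;
    while (offset < objectSize)
    {
        const std::uint64_t remaining = objectSize - offset;
        const std::uint64_t length = remaining < partSize ? remaining : partSize;
        parts.push_back({partNumber, offset, length});
        offset += length;
        ++partNumber;
    }

    // Every byte of the object is covered exactly once.
    assert(offset == objectSize);
    return parts;
}
```

Calling PlanParts(25, 10), for example, yields three parts of 10, 10, and 5 bytes.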

New Rules
You can now exercise additional control over these objects using some new lifecycle rules, lowering your costs and improving performance in the process. As usual, you can set these up using the AWS Management Console, the S3 APIs, the AWS Command Line Interface (CLI), or the AWS Tools for Windows PowerShell.

Here’s how you set up a rule for incomplete multipart uploads using the Console. Start by opening the console and navigating to the desired bucket (mine is called jbarr):

Then click on Properties, open up the Lifecycle section, and click on Add rule:

Decide on the target (the whole bucket or the prefixed subset of your choice) and then click on Configure Rule:

Then enable the new rule and select the desired expiration period:

As a best practice, we recommend that you enable this setting even if you are not sure that you are actually making use of multipart uploads. Some applications will default to the use of multipart uploads when uploading files above a particular, application-dependent, size.

Here’s how you set up a rule to remove delete markers for expired objects that have no previous versions:
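Outside the console, both rules can also be written as a lifecycle configuration document and submitted through the S3 API. The sketch below only builds the XML text; it does not call S3. The function names are hypothetical, the rule-level Prefix element reflects my understanding of the configuration schema at the time of writing, and the ID and prefix values are assumed not to need XML escaping. Check the S3 API reference before relying on the exact layout.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds a rule that aborts incomplete multipart uploads a given number of
// days after they were initiated. An empty prefix targets the whole bucket.
std::string AbortIncompleteUploadsRule(const std::string& id,
                                       const std::string& prefix,
                                       int daysAfterInitiation)
{
    assert(daysAfterInitiation > 0);

    std::ostringstream xml;
    xml << "<Rule>"
        << "<ID>" << id << "</ID>"
        << "<Prefix>" << prefix << "</Prefix>"
        << "<Status>Enabled</Status>"
        << "<AbortIncompleteMultipartUpload>"
        << "<DaysAfterInitiation>" << daysAfterInitiation << "</DaysAfterInitiation>"
        << "</AbortIncompleteMultipartUpload>"
        << "</Rule>";
    return xml.str();
}

// Builds a rule that removes expired object delete markers.
std::string RemoveExpiredDeleteMarkersRule(const std::string& id,
                                           const std::string& prefix)
{
    std::ostringstream xml;
    xml << "<Rule>"
        << "<ID>" << id << "</ID>"
        << "<Prefix>" << prefix << "</Prefix>"
        << "<Status>Enabled</Status>"
        << "<Expiration>"
        << "<ExpiredObjectDeleteMarker>true</ExpiredObjectDeleteMarker>"
        << "</Expiration>"
        << "</Rule>";
    return xml.str();
}

// Wraps one or more already-built rules in the top-level element.
std::string LifecycleConfiguration(const std::string& rules)
{
    return "<LifecycleConfiguration>" + rules + "</LifecycleConfiguration>";
}
```

A configuration with both rules would be LifecycleConfiguration(AbortIncompleteUploadsRule("abort-mpu", "", 7) + RemoveExpiredDeleteMarkersRule("clean-markers", "")), where the rule IDs and the seven-day period are illustrative values only.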

S3 Best Practices
While you are here, here are some best practices that you should consider using for your own S3-based applications:

Versioning – You can enable Versioning for your S3 buckets in order to be able to recover from accidental overwrites and deletes. With versioning turned on, you can preserve, retrieve, and restore earlier versions of your data.

Replication – Take advantage of S3’s Cross-Region Replication in order to meet your organization’s compliance policies by creating a replica of your data in a second AWS Region.

Performance – If you anticipate a consistently high number of PUT, LIST, DELETE, or GET requests against your buckets, you can optimize your application’s performance by implementing the tips outlined in the performance section of the Amazon S3 documentation.

Cost Management – You can reduce your costs by setting up S3 lifecycle policies that will transition your data to other S3 storage tiers or expire data that is no longer needed.

— Jeff;

Additional Failover Control for Amazon Aurora

by Jeff Barr | on 16 MAR 2016 | in Amazon Aurora

Amazon Aurora is a fully-managed, MySQL-compatible, relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases (read my post, Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon RDS, to learn more).

Aurora allows you to create up to 15 read replicas to increase read throughput and for use as failover targets. The replicas share storage with the primary instance and provide lightweight, fine-grained replication that is almost synchronous, with a replication delay on the order of 10 to 20 milliseconds.

Additional Failover Control
Today we are making Aurora even more flexible by giving you control over the failover priority of each read replica. Each read replica is now associated with a priority tier (0-15). In the event of a failover, Amazon RDS will promote the read replica that has the highest priority (the lowest numbered tier). If two or more replicas have the same priority, RDS will promote the one that is the same size as the previous primary instance.
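The selection rule described above can be sketched in a few lines of C++. This is an illustration of the rule as stated in this post, not the RDS implementation: the Replica type, the SelectFailoverTarget function, and the instance class strings are hypothetical. Falling back to the first candidate in the list when the size rule does not break a tie is an assumption of the sketch, since the post does not describe that case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A read replica as seen by this sketch.
struct Replica
{
    std::string id;
    int tier;                   // priority tier, 0-15; a lower number means higher priority
    std::string instanceClass;  // instance size, for example "db.r3.large"
};

// Returns the index of the replica to promote, or -1 if there are none.
// The lowest-numbered tier wins; within a tier, a replica whose size matches
// the previous primary instance is preferred.
int SelectFailoverTarget(const std::vector<Replica>& replicas,
                         const std::string& primaryInstanceClass)
{
    int best = -1;
    for (int i = 0; i < static_cast<int>(replicas.size()); ++i)
    {
        const Replica& candidate = replicas[i];
        assert(candidate.tier >= 0 && candidate.tier <= 15);

        if (best < 0)
        {
            best = i;
            continue;
        }

        const Replica& current = replicas[best];
        const bool higherPriority = candidate.tier < current.tier;
        const bool betterSizeMatch = candidate.tier == current.tier
            && candidate.instanceClass == primaryInstanceClass
            && current.instanceClass != primaryInstanceClass;
        if (higherPriority || betterSizeMatch)
        {
            best = i;
        }
    }
    return best;
}
```

Given two tier-0 replicas of different sizes and one tier-1 replica, the function picks the tier-0 replica whose size matches the failed primary, and ignores the tier-1 replica regardless of its size.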

You can set the priority when you create the Aurora DB instance:

This feature is available now and you can start using it today. To learn more, read about Fault Tolerance for an Aurora DB Cluster.

— Jeff;

AWS Database Migration Service

by Jeff Barr | on 15 MAR 2016 | in Amazon Aurora, Amazon RDS, AWS Database Migration Service

Do you currently store relational data in an on-premises Oracle, SQL Server, MySQL, MariaDB, or PostgreSQL database? Would you like to move it to the AWS cloud with virtually no downtime so that you can take advantage of the scale, operational efficiency, and the multitude of data storage options that are available to you?

If so, the new AWS Database Migration Service (DMS) is for you! We first announced it last fall at AWS re:Invent, and our customers have already used it to migrate over 1,000 on-premises databases to AWS. You can move live, terabyte-scale databases to the cloud, with options to stick with your existing database platform or to upgrade to a new one that better matches your requirements. If you are migrating to a new database platform as part of your move to the cloud, the AWS Schema Conversion Tool will convert your schemas and stored procedures for use on the new platform.

The AWS Database Migration Service works by setting up and then managing a replication instance on AWS. This instance unloads data from the source database and loads it into the destination database, and can be used for a one-time migration followed by on-going replication to support a migration that entails minimal downtime. Along the way DMS handles many of the complex details associated with migration, including data type transformation and conversion from one database platform to another (Oracle to Aurora, for example). The service also monitors the replication and the health of the instance, notifies you if something goes wrong, and automatically provisions a replacement instance if necessary.

The service supports many different migration scenarios and networking options. One of the endpoints must always be in AWS; the other can be on-premises, running on an EC2 instance, or running on an RDS database instance. The source and destination can reside within the same Virtual Private Cloud (VPC) or in two separate VPCs (if you are migrating from one cloud database to another). You can connect to an on-premises database via the public Internet or via AWS Direct Connect.

Migrating a Database
You can set up your first migration with a couple of clicks! You simply create the target database, migrate the database schema, set up the data replication process, and initiate the migration. After the target database has caught up with the source, you simply switch to using it in your production environment.

I start by opening up the AWS Database Migration Service Console (in the Database section of the AWS Management Console as DMS) and clicking on Create migration.

The Console provides me with an overview of the migration process:

I click on Next and provide the parameters that are needed to create my replication instance:

For this blog post, I selected one of my existing VPCs and unchecked Publicly accessible. My colleagues had already set me up with an EC2 instance to represent my “on-premises” database.

After the replication instance has been created, I specify my source and target database endpoints and then click on Run test to make sure that the endpoints are accessible (truth be told, I spent some time adjusting my security groups in order to make the tests pass):

Now I create the actual migration task. I can (per the Migration type) migrate existing data, migrate and then replicate, or replicate going forward:

I could have clicked on Task Settings to set some other options (LOBs are Large Objects):

The migration task is ready, and will begin as soon as I select it and click on Start/Resume:

I can watch for progress, and then inspect the Table statistics to see what happened (these were test tables and the results are not very exciting):

At this point I would do some sanity checks and then point my application to the new endpoint. I could also have chosen to perform an ongoing replication.

The AWS Database Migration Service offers many options and I have barely scratched the surface. You can, for example, choose to migrate only certain tables. You can also create several different types of replication tasks and activate them at different times. I highly recommend you read the DMS documentation as it does a great job of guiding you through your first migration.

If you need to migrate a collection of databases, you can automate your work using the AWS Command Line Interface (CLI) or the Database Migration Service API.

Price and Availability
The AWS Database Migration Service is available in the US East (Northern Virginia), US West (Oregon), US West (Northern California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Asia Pacific (Sydney) Regions and you can start using it today (we plan to add support for other Regions in the coming months).

Pricing is based on the compute resources used during the migration process, with a charge for longer-term storage of logs. See the Database Migration Service Pricing page for more information.

— Jeff;

Thank You Splunk – We’re Happy to be Your Alliance Partner

by Jeff Barr | on 15 MAR 2016 | in AWS Partner Network

The AWS Partner Network (APN) helps our partners to build successful businesses around AWS. Members of APN provide consulting services (APN Consulting Partners) or software solutions (APN Technology Partners) that are integrated with the AWS platform.

I am happy to be able to announce that AWS Advanced Technology Partner Splunk (read their APN entry) has named Amazon Web Services to be their Worldwide Alliance Partner of the Year (read the press release to learn more). We are thrilled to be able to work with them to make their solution available to AWS customers worldwide.

The Splunk App for AWS is one of the most popular apps on Splunkbase. The app provides you with insight into the operational and security issues associated with your AWS account. It works in conjunction with AWS Config, AWS CloudTrail, VPC Flow Logs, AWS Billing, and S3 to provide you with a logical, topologically oriented dashboard designed to help you to optimize resources and detect problems.

— Jeff;

Amazon EMR 4.4.0 – Sqoop, HCatalog, Java 8, and More

by Jeff Barr | on 14 MAR 2016 | in Amazon EMR

Rob Leidle, Development Manager for Amazon EMR, wrote the guest post below to introduce you to the latest and greatest version!

— Jeff;

Today we are announcing Amazon EMR release 4.4.0, which adds support for Apache Sqoop (1.4.6) and Apache HCatalog (1.0.0), an upgraded release of Apache Mahout (0.11.1), and upgraded sandbox releases for Presto (0.136) and Apache Zeppelin (0.5.6). We have also enhanced our default Apache Spark settings and added support for Java 8.

New Applications in Release 4.4.0
Amazon EMR provides an easy way to install and configure distributed big data applications in the Hadoop and Spark ecosystems on managed clusters of Amazon EC2 instances. You can create Amazon EMR clusters from the Amazon EMR Create Cluster Page in the AWS Management Console, from the AWS Command Line Interface (CLI), or by using an SDK with an EMR API. In the latest release, we added support for several new versions of the following applications:

Zeppelin 0.5.6 – Zeppelin is an open-source interactive and collaborative notebook for data exploration using Spark. Zeppelin 0.5.6 adds the ability to import or export a notebook, notebook storage in GitHub, auto-save on navigation, and better Pyspark support. View the Zeppelin release notes or learn more about Zeppelin on Amazon EMR.
Presto 0.136 – Presto is an open-source, distributed SQL query engine designed for low-latency queries on large datasets in Amazon S3 and HDFS. This is a minor version release, with support for larger arrays, SQL binary literals, the ability to call connector-defined procedures, and improvements to the web interface. View the Presto release notes or learn more about Presto on Amazon EMR.
Sqoop 1.4.6 – Sqoop is a tool for transferring bulk data between HDFS, S3 (using EMRFS), and structured datastores such as relational databases. You can use Sqoop to transfer structured data from RDS and Aurora to EMR for processing, and write out results back to S3, HDFS, or another database. Learn more about Sqoop on Amazon EMR.
Mahout 0.11.1 – Mahout is a collection of tools and libraries for building distributed machine learning applications. This release includes support for Spark as well as a new math environment based on Spark named Samsara. Learn more about Mahout on Amazon EMR.
HCatalog 1.0.0 – HCatalog is a sub-project within the Apache Hive project. It is a table and storage management layer for Hadoop which utilizes the Hive Metastore. It enables tools to execute SQL on Hadoop through an easy to use REST interface.

Enhancements to the default settings for Spark
We have improved our default configuration for Spark executors from the Apache defaults to better utilize resources on your cluster. Starting with release 4.4.0, EMR has enabled dynamic allocation of executors by default, which lets YARN determine how many executors to utilize when running a Spark application. Additionally, the amount of memory used for each executor is now automatically determined by the instance family used for your cluster’s core instance group.

Enabling dynamic allocation and customizing the executor memory allows Spark to utilize all resources on your cluster, place additional executors on nodes added to your cluster, and better allow for multitenancy for Spark applications. The previous maximizeResourceAllocation parameter is still available. However, this doesn’t use dynamic allocation, and specifies a static number of executors for your Spark application. You can also still override the new defaults by using the configuration API or passing additional parameters when submitting your Spark application using spark-submit. Learn more about Spark configuration on Amazon EMR.

Using Java 8 with your applications on Amazon EMR
By default, applications on your Amazon EMR cluster use the Java Development Kit 7 (JDK 7) for their runtime environment. However, on release 4.4.0, you can use JDK 8 by setting JAVA_HOME to point to JDK 8 for the relevant environment variables using a configuration object (though please note that JDK 8 is not compatible with Apache Hive). Learn more about using Java 8 on Amazon EMR.

Launch an Amazon EMR Cluster with Release 4.4.0 Today
To create an Amazon EMR cluster with 4.4.0, select release 4.4.0 on the Create Cluster page in the AWS Management Console, or use the release label emr-4.4.0 when creating your cluster from the AWS CLI or using an SDK with the EMR API.

— Rob Leidle – Development Manager, Amazon EMR

AWS Week in Review – March 7, 2016

by Jeff Barr | on 14 MAR 2016 | in Week in Review

Let’s take a quick look at what happened in AWS-land last week:

Monday March 7	We launched Notifications for AWS CodeCommit. We announced that New AWS Accounts Now Default to Long EC2 Resource IDs. The AWS Security Blog showed you How to Automate Restricting Access to a VPC by Using AWS IAM and AWS CloudFormation. Botmetric talked about Tackling AWS Security Threat Landscapes: Access Controls. CloudCheckr shared 5 Tips to Best Leverage Diverse AWS Services. CloudEndure listed the Top 5 Cloud Computing Books to Read in 2016. Cloud Academy published Part 3 of a series on Centralized Log Management with AWS CloudWatch. Trek10 talked about Lambda Fanout, What is it Good For?
Tuesday March 8	We announced Availability of t2.nano Instances in the EU (Frankfurt) and Asia Pacific (Sydney) Regions. We announced you can now Run XCTest UI Tests with AWS Device Farm. The AWS Partner Network Blog talked about Modeling SaaS Tenant Profiles on AWS. The AWS Security Blog showed you How to Reduce Security Threats and Operating Costs Using AWS WAF and Amazon CloudFront. N2W Software explained How to Automated Your Backup Operations in AWS. Sungard showed you How to Implement Microservices using AWS Lambda and Deploy with CloudFormation. Cloudyn explained How to Measure Your Core-Hours Costs to Gain Another Level of Cloud Cost Optimization. Cloud Academy published an Introduction and Walkthrough of AWS Config. ParkMyCloud showed you How to Manage Parking Recommendations in ParkMyCloud. Localytics wrote about Serverless Slackbots Powered by AWS.
Wednesday March 9	We announced that Amazon ElastiCache now supports Memcached Auto-Discovery for PHP 7. Guest posts showed you how to Use Enhanced RDS Monitoring with Datadog and told the story of Flatiron Health – Using AWS to Help Improve Cancer Treatment. We updated the AWS CLI, AWS SDK for Java, AWS SDK for Go, AWS SDK for JavaScript, and the AWS SDK for Ruby. 8KMiles listed 5 Reasons Why Pharmaceutical Companies Need to Migrate to the Cloud. Stelligent showed you how to Create a Pipeline Using the AWS CodePipeline Console. Spotinst talked about Implementing Blue/Green Deployments with Elastigroup on AWS. Netflix described How We Build Code at Netflix. Cloud Technology Partners shared A Bulletproof DevOps Strategy to Ensure Success in the Cloud. DZone Cloud Zone talked about Automatic Deployment Through a Bastion (Gateway) Server. Gathering Clouds talked about The #1 AWS Cloud Security Tool for Retailers and eCommerce. Gorillastack asked Is Virtual Reality The Next Frontier For Amazon Web Services To Conquer? Trek10 introduced LambdaClock. Serverworks wrote about Parallel Image Processing for Fluid Mechanics with AWS Lambda.
Thursday March 10	We announced that Amazon CloudWatch Logs now has AWS CloudTrail support and new Amazon CloudWatch Metrics. We announced that AWS CodeDeploy is Now Available in the South America (Sao Paulo) Region. We announced that Amazon CloudWatch Logs Available in the South America (Sao Paulo) Region. We announced that Amazon Redshift Now Supports Table Level Restore. We updated the AWS SDK for Ruby and the AWS SDK for Go. We published the Second Amazon Linux AMI 2016.03 Release Candidate. The Amazon GameDev Blog announced New Regions and Autoscaling Features for Amazon GameLift. The AWS Big Data Blog shared a partner post from Attunity. The AWS Government, Education, & Nonprofits Blog explained How Cities Can Stop Wasting Money, Move Faster, and Innovate. The AWS Partner Network Blog talked about Architecting Microservices Using Weave Net and Amazon EC2 Container Service. James Hamilton wrote about A Decade of Innovation. ParkMyCloud showed you How to Save Money with AWS Scripting. Skeddly showed you how to Change EBS Volume Action.
Friday March 11	I reviewed some Hot Startups on AWS. Werner Vogels shared 10 Lessons from 10 Years of Amazon Web Services. The AWS Government, Education, & Nonprofits Blog talked about the Cities of the Future, Today. 8KMiles hosted a Tweet Chat on Amazon KMS. Mark Litwintschik examined A Billion Taxi Rides on Amazon EMR Running Presto.
Saturday March 12	The AWS Government, Education, & Nonprofits Blog announced that the AWS 2016 City on a Cloud Innovation Challenge is Live. Toby Hede is writing The Complete and Most Excellent Micro Manual for Hosting a Static Website on AWS.
Sunday March 13	Serverless Code announced Zappa, Django, and Lambda VPC Support, discussed Using Python in the Serverless Framework, and talked about Using Scikit-Learn in AWS Lambda. Ted Timmons wrote about AWS CloudFormation, VPC NAT, and Donuts.

New & Notable Open Source

sqs-to-lambda-via-lambda implements Amazon SQS to Lambda using Lambda.
akiro magically compiles NPM packages with native extensions for Lambda.
cloudwatch-to-sumo sends metrics from CloudWatch to Sumo Logic.
awsam is an AWS Account Manager modeled after rvm.
aws-jwt-auth is an API Gateway custom authorizer to validate JWTs created by WSO2.
aws_mbedtls_mqtt is the source code to use the mbedTLS library to connect to AWS IoT.
jaxrs-lib contains Jersey and Hibernate Components for building REST APIs hosted on Elastic Beanstalk.
autosignr is a Puppet Certificate Auto-signer for AWS.
llama-cli is Chaos Llama, a tool for testing resiliency and recoverability of AWS-based architectures.
cfn-amibaker bakes EC2 AMIs using CloudFormation and Lambda.

New SlideShare Presentations

Intro to AWS IoT.

Upcoming Events

March 14th – Live Event (Seattle, Washington) – Seattle AWS Architects & Engineers – Lambda + Alexa AWS Teams.
March 15th – Live Event (San Francisco, California) – Amazon Lumberyard team at GDC 2016.
March 17th – Webinar – Security Best Practices for Retailers on AWS.
March 17th – Live Event (Netherlands) – Security in the Cloud.
March 22nd – Live Broadcast – VoiceOps: Commanding and Controlling Your AWS environments using Amazon Echo and Lambda.
March 23rd – Live Event (Atlanta, Georgia) – AWS Key Management Service & AWS Storage Services for a Hybrid Cloud (Atlanta AWS Community).
April 6th – Live Event (Boston, Massachusetts) – AWS at Bio-IT World.
April 18th & 19th – Live Event (Chicago, Illinois) – AWS Summit – Chicago.
April 20th – Live Event (Melbourne, Australia) – Inaugural Melbourne Serverless Meetup.
April 26th – Live Event (Sydney, Australia) – Inaugural Sydney Serverless Meetup.
AWS Loft – San Francisco.
AWS Loft – New York.
AWS Loft – Tel Aviv.
AWS Public Sector Events.
AWS Global Summit Series.

Help Wanted

AWS Careers.

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

— Jeff;

Developer Preview of AWS SDK for C++ is Now Available

by Jeff Barr | on 14 MAR 2016 | in AWS SDK for C++, Developer Tools

My colleague Jonathan Henson has great news for C++ developers who would like to use AWS.

— Jeff;

I am happy to announce that the AWS SDK for C++ is now available as a developer preview. Last fall, we released the SDK in an experimental state to gather feedback and improve the APIs. Since then, we have received more than 100 issues and pull requests on GitHub. Many excited developers in the open source community gave valuable feedback that helped to improve the stability and expand the features of this SDK.

Changes and Additions
Here are some additions we’ve made since our experimental release:

Full service coverage parity with the rest of the SDKs.
Visual Studio 2015 support.
OS X El Capitan support.
Presigned URL support.
Expansion of and improvements to the Amazon S3 TransferClient.
Inline documentation improvements.
More integration for custom memory management.
Forward-compatible enumeration support.
Improvements to our CMake exports to simplify consumer builds.
Unicode support.
Several service client fixes and improvements.
Ability to build only the clients you need.
Custom signed regions and endpoints.
Common Crypto support for Apple platforms (OpenSSL is no longer required on iOS and OS X).
Several stability updates related to multi-threading in our Curl interface on Unix and Linux.
The Service Client Generator is now open sourced and integrated into the build process.

Also, NSURL support for Apple platforms will be committed within a week or so. After that, Curl will no longer be required on iOS or OS X.

The team would like to thank those who have been involved in improving this SDK over the past six months. Please continue contributing and leaving feedback on our GitHub Issues page.

Before we move to General Availability, we would like to receive another round of feedback to help us pin down the API with a stable 1.0 release. If you are a C++ developer, please feel free to give this new SDK a try and let us know what you think.

In Other News
Here are a few other things that you may find interesting:

We have moved our GitHub repository from the awslabs organization to aws/aws-sdk-cpp.
We are now providing new releases for new services and features with the rest of the AWS SDKs.
We now have a C++ developer blog. We’ll post tutorials and samples there throughout the year. We’ll also announce improvements and features there, so stay tuned!
We will distribute pre-built binaries for our most popular platforms in the near future. We’ll let you know when they go live.

Sample Code
Here is some sample code that writes some data to a Kinesis stream:

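// SDK headers for the Kinesis client, its request types, and the core utility types used below.
#include <aws/core/client/AsyncCallerContext.h>
#include <aws/core/utils/Array.h>
#include <aws/core/utils/Outcome.h>
#include <aws/core/utils/memory/stl/AWSString.h>
#include <aws/core/utils/memory/stl/AWSVector.h>
#include <aws/kinesis/KinesisClient.h>
#include <aws/kinesis/model/PutRecordsRequest.h>
#include <aws/kinesis/model/PutRecordsRequestEntry.h>

// Standard library headers for std::bind, std::cout, std::shared_ptr, and the sleep in main().
#include <chrono>
#include <functional>
#include <iostream>
#include <memory>
#include <thread>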

using namespace Aws::Utils;
using namespace Aws::Kinesis;
using namespace Aws::Kinesis::Model;

class KinesisProducer
{
public:
    KinesisProducer(const Aws::String& streamName, const Aws::String& partition) : m_partition(partition), m_streamName(streamName)
    {}

    void StreamData(const Aws::Vector<ByteBuffer>& data)
    {
        PutRecordsRequest putRecordsRequest;
        putRecordsRequest.SetStreamName(m_streamName);

        for(auto& datum : data)
        {
            PutRecordsRequestEntry putRecordsRequestEntry;
            putRecordsRequestEntry.WithData(datum)
                    .WithPartitionKey(m_partition);

            putRecordsRequest.AddRecords(putRecordsRequestEntry);
        }

        m_client.PutRecordsAsync(putRecordsRequest,
               std::bind(&KinesisProducer::OnPutRecordsAsyncOutcomeReceived, this, std::placeholders::_1, std::placeholders::_2, std::placeholders::_3, std::placeholders::_4));
    }

private:
    void OnPutRecordsAsyncOutcomeReceived(const KinesisClient*, const Model::PutRecordsRequest&,
                                          const Model::PutRecordsOutcome& outcome,
                                          const std::shared_ptr<const Aws::Client::AsyncCallerContext>&)
    {
        if(outcome.IsSuccess())
        {
            std::cout << "Records Put Successfully " << std::endl;
        }
        else
        {
            std::cout << "Put Records Failed with error " << outcome.GetError().GetMessage() << std::endl;
        }
    }

    KinesisClient m_client;
    Aws::String m_partition;
    Aws::String m_streamName;
};

int main()
{
    KinesisProducer producer("kinesis-sample", "announcements");

    while(true)
    {
        Aws::String announcement1("AWS SDK for C++");
        Aws::String announcement2("Is Now in Developer Preview");

        producer.StreamData( {
                                     ByteBuffer((unsigned char*)announcement1.c_str(), announcement1.length()),
                                     ByteBuffer((unsigned char*)announcement2.c_str(), announcement2.length())
                             });

        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }

    return 0;
}

— Jonathan Henson, Software Development Engineer (SDE)

Ten Years in the AWS Cloud – How Time Flies!

by Jeff Barr | on 14 MAR 2016 | in Announcements

Ten years ago today I announced the launch of Amazon S3 with a simple blog post! It is hard to believe that a decade has passed since then, or that I have written well over 2000 posts during that time.

Future Shock
When I was in high school, I read and reported on a relatively new (for 1977) book titled Future Shock. In the book, futurist Alvin Toffler argued that the rapid pace of change had the potential to overwhelm, stress, and disorient people. While the paper I wrote has long since turned to dust, I do remember arguing that change was good, and that people and organizations would be better served by preparing to accept and to deal with it.

Early in my career I saw that many supposed technologists were far better at clinging to the past than they were at moving into the future. By the time I was 21 I had decided that it would be better for me to live in the future than in the past, and to not just accept change and progress, but to actively seek it out. Now, 35 years after that decision, I can see that I chose the most interesting fork in the road. It has been a privilege to be able to bring you AWS news for well over a decade (I wrote my first post in 2004).

A Decade of IT Change
Looking back at the past decade, it is pretty impressive to see just how much the IT world has changed. Even more impressive, the change is not limited to technology. Business models have changed, as has the language around them. At the same time that changes on the business side have brought about new ways to acquire, consume, and pay for resources (empowering both enterprises and startups in the process), the words that we use to describe what we do have also changed! A decade ago we would not have spoken of the cloud, microservices, serverless applications, the Internet of Things, containers, or lean startups. We would not have practiced continuous integration, continuous delivery, DevOps, or ChatOps. While you are still trying to understand and implement ChatOps, don’t forget that something even newer called VoiceOps (powered by Alexa) is already on the horizon.

Of course, dealing with change is not easy. When looking into the future, you need to be able to distinguish between flashy distractions and genuine trends, while remaining flexible enough to pivot if yesterday’s niche becomes today’s mainstream technology. I often use JavaScript to illustrate this phenomenon. If you (like me), as a server-side developer, initially brushed off JavaScript as a simple, browser-only language and chose to ignore it, you were undoubtedly taken by surprise when it was first used to build rich, dynamic Ajax applications and then run on the server in the form of Node.js.

Today, keeping current means staying abreast of developments in programming languages, system architectures, and industry best practices. It means that you spend time every day improving your current skills and looking for new ones. It means becoming comfortable in a new world where multiple deployments per day are commonplace, powered by global teams, and managed by consensus, all while remaining focused on delivering value to the business!

A Decade of AWS
While I hate to play favorites, I would like to quickly review some of my favorite AWS launches and blog posts of the past decade.

First and Still Relevant (2006) – Amazon S3. Incredibly simple in concept yet surprisingly complex behind the scenes, S3 was, as TechCrunch said at the time, game changing!

Servers by the Hour (2006) – Amazon EC2. I wrote the blog post while sitting poolside in Cabo San Lucas. The launch had been imminent for several months, and then became a fact just as I was about to hop on the plane. From that simple start (one instance type, one region, and CLI-only access), EC2 has added feature after feature (most of them driven by customer requests) and is just as relevant today as it was in 2006.

Making Databases Easy (2009) – Amazon Relational Database Service – Having spent a lot of time installing, tuning, and managing MySQL as part of a long-term personal project, I was in a perfect position to appreciate how RDS simplified every aspect of my work.

Advanced Networking (2009) – Amazon Virtual Private Cloud – With the debut of VPC, even conservative enterprises began to take a closer look at AWS. They saw that we understood the networking and isolation challenges that they faced, and were pleased that we were able to address them.

Internet-Scale Data Storage (2012) – Amazon DynamoDB – The NoSQL market was in a state of flux when we launched DynamoDB. Now that the smoke has cleared, I routinely hear about customers that use DynamoDB to store huge amounts of data and to support some pretty incredible request rates.

Data Warehouses in Minutes not Quarters (2012) – Amazon Redshift – Many companies measure implementation time for a data warehouse in terms of quarters or even years. Amazon Redshift showed them that there was a better way to get started.

Desktop Computing in the Cloud (2013) – Amazon WorkSpaces – All too often dismissed as either pedestrian or “great for someone else,” virtual desktops have become an important productivity tool for me and for our customers.

Real Time? How Much Data? (2013) – Amazon Kinesis – Capturing, processing, and deriving value from voluminous streams of data became easier and simpler when we launched Kinesis.

A New Programming Model (2014) – AWS Lambda – This is one of those disruptive game-changers that you need to be ready for! I have been impressed by the number of traditional organizations that have already built and deployed sophisticated Lambda-powered applications. My expectation that Lambda would be most at home in startups building applications from scratch turned out to be wrong.

Devices are the Future (2015) – AWS IoT – Mass-produced compute power and widespread IP connectivity combine to allow all sorts of interesting devices to be connected to the Internet.

Moving Forward
A decade ago, discussion about the risks of cloud computing centered around adoption. It was new and unproven, and raised more questions than it answered. That era passed some time ago. These days, I hear more talk about the risk of not going to the cloud. Organizations of all shapes and sizes want to be nimble, to use modern infrastructure, and to be able to attract professionals with a strong desire to do the same. Today’s employees want to use the latest and most relevant technology in order to be as productive as possible.

I can promise you that the next decade of the cloud will be just as exciting as the one that just concluded. Keep on learning, keep on building, and share your successes with us!

— Jeff;

PS – As you can tell from this post, I strongly believe in the value of continuing education. I discussed this with my colleagues and they have agreed to make the entire set of qwikLABS online labs and learning quests available to all current and potential AWS customers at no charge through the end of March. To learn more, visit qwikLABS.com.

Hot Startups on AWS – March 2016

by Jeff Barr | on 11 MAR 2016 | in Startups | Permalink | Comments

We love startups!

When energy, enthusiasm, creativity, and passion for changing the world come together to build new and exciting businesses and applications, everyone benefits. Today I am kicking off a new series of posts. Every month I am going to feature a handful of hot, AWS-powered startups and tell you a little bit about what they built. I hope to explore a bit of the motivation behind the products and the startups and to show you how AWS has empowered them to put that energy, enthusiasm, creativity, and passion to use.

Today’s post features the following startups:

Intercom – One place for every team in an Internet business to see and talk to customers, personally, at scale.
Tile – A popular key locator product that works with an app to help people find their stuff.
Bugsnag – A tool to capture and analyze runtime errors in production web & mobile applications.
DroneDeploy – Making the sky productive and accessible for everyone.

Intercom
The founders of Intercom previously ran a SaaS business in Dublin, Ireland. They had a problem: they didn’t know who their customers were, and couldn’t easily communicate with them. They were working on a solution when they observed a coffee shop owner casually interacting with his customers, greeting them by name, making offers tailored to their interests, addressing questions, and heading off potential problems. The founders decided to build a tool that would allow others building online businesses to have a personal touch with their customers, as opposed to simply treating them like rows in a database.

The resulting platform, Intercom, is a fundamentally new way to communicate with customers. It allows web and mobile businesses to track live customer data, and use that data to communicate with customers in a personal way on their website, inside web and mobile apps, and by email. A little bit of JavaScript (for web apps) or simple SDKs (for iOS and Android) power live chat, marketing automation, customer feedback, and customer support.

Intercom chose AWS to allow them to move fast without having to have a large operations team. With thousands of businesses already using the product, they needed to keep the real-time conversations running at a consistent speed and with low latency. When they anticipated running up against the limits of their existing relational database and began to consider a sharded solution, they put Amazon Aurora to the test and found that it was able to handle their current load, with plenty of room to grow. They avoided the complexity of sharding, lowered their costs, and reduced the latency of their queries.

Tile
One of the founders of Tile was frustrated because his spouse had a habit of losing things. After looking into some ways to help her, he realized two things. First, this was a very common problem (and, to be fair, one that is not gender-specific). Second, no one was addressing it. Seeing an opportunity, he co-founded Tile in 2013 and created a crowdfunding campaign to secure capital. This campaign surpassed the initial goal of 10,000 units by 20x, which delivered a key indicator that the team had found a good solution to an unmet need. Currently, the company has sold over 4.5 million Tiles, making this one of the most successful crowdfunded companies to date.

The Tiles themselves are small and simple. They can be attached to all different sorts of objects, and use Bluetooth Low Energy to communicate. When the mobile app is activated, it displays a proximity radar with a range of about 100 feet, and the app can also be used to trigger a loud (90 decibel) chime on the Tile. Conversely, the Tile itself can be used to find a missing smartphone. The app can even display the last known location of each Tile on a built-in map; this is useful if the Tile is out of Bluetooth range. Finally, if the misplaced item is well and truly lost, a community-based feature can be used to provide an anonymous ping if another user’s running app comes within Bluetooth range of the missing item. With these features, Tile is ideal for finding anything that can be lost or misplaced, from keys and remote controls to cell phones and other high-value objects, large or small.

Tile chose AWS to allow them to scale rapidly and to have a global presence (they have devices in 214 countries & territories). They run multiple applications (the Tile Web App, Customer Service, and the Tile Network) on AWS using EC2, Route 53, RDS, CloudWatch, SNS, Kinesis, and Redshift. They currently process over 100 million location updates every day and regularly add new servers, modify load balancers, and update DNS entries.

Bugsnag
This hot startup was founded in a tiny San Francisco apartment that was home to Simon and James (the two founders), their respective partners, and a four-pack of cats. They wanted to provide developers of web and mobile applications with a tool that would intercept, track, and report on application crashes with an eye toward aggregated, prioritized reporting and analysis. Given the fragmented state of the mobile device world, being able to use Bugsnag to identify issues that are peculiar to one platform, device, or version ensures that developers are focused on fixing bugs that affect the most users.

Bugsnag helps thousands of companies to improve the quality of their web and mobile applications. It integrates with many languages and environments including Rails, JavaScript, Python, Go, PHP, iOS, and Android. The product captures detailed crash data, packages it up for analysis (including an encryption step), and then uploads the information to AWS where it can be used to create tickets, issue notifications to tools like HipChat and Slack, and so forth. Bugsnag also includes a dashboard that supports analysis of trends over time, data-driven root cause analysis, and multiple key/value filters.

The load on Bugsnag depends on the applications shipped by their customers and can vary greatly from day to day. They currently process up to a billion crashes per day. In order to handle this large, unpredictable load as economically as possible, they make use of a multitude of AWS services including a mix of On-Demand and Spot instances. Their worker fleet consists of both kinds of instances, managed by a pair of Auto Scaling groups. The first group contains the Spot instances. It scales up aggressively and scales down slowly. The second group contains the On-Demand instances. It scales up conservatively and scales down aggressively. To learn more about how they did this, read their blog post, Responsive infrastructure with Auto Scaling.
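
To make the asymmetry between the two groups concrete, here is a minimal Python sketch of the idea. It is not Bugsnag's configuration: the ScalingPolicy class, the utilization signal, the thresholds, and every percentage and size limit below are assumptions chosen purely for illustration. Their blog post describes what they actually run.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Asymmetric scaling rule for one Auto Scaling group (illustrative values only)."""
    grow_pct: int    # percent of capacity to add when the fleet is busy
    shrink_pct: int  # percent of capacity to remove when the fleet is idle
    min_size: int
    max_size: int

    def next_size(self, current, utilization, high=0.75, low=0.30):
        """Return the group size to request, given utilization between 0.0 and 1.0."""
        if utilization > high:
            # Round up with integer math, and always add at least one instance.
            target = max(current + 1, -(-current * (100 + self.grow_pct) // 100))
        elif utilization < low:
            # Round down when shrinking.
            target = current * (100 - self.shrink_pct) // 100
        else:
            target = current
        return max(self.min_size, min(self.max_size, target))


# Hypothetical numbers: the Spot group grows fast and shrinks slowly,
# while the On-Demand group grows slowly and shrinks fast.
SPOT = ScalingPolicy(grow_pct=50, shrink_pct=10, min_size=0, max_size=100)
ON_DEMAND = ScalingPolicy(grow_pct=10, shrink_pct=50, min_size=2, max_size=50)

print(SPOT.next_size(10, 0.90), ON_DEMAND.next_size(10, 0.90))  # busy: 15 11
print(SPOT.next_size(10, 0.10), ON_DEMAND.next_size(10, 0.10))  # idle: 9 5
```

With rules shaped like these, a spike is absorbed mostly by the cheaper Spot capacity, and the more expensive On-Demand capacity is released first when the load drops.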

DroneDeploy
In 2013, three entrepreneurs in South Africa got together to plan a new venture. After observing that off-the-shelf drone hardware was maturing far more rapidly than the software needed to get the most value out of that hardware, they started DroneDeploy. Their vision was to make the sky productive and accessible to everyone. They wanted to remove complexity in order to allow companies to operate fleets of drones safely, reliably, and simply. They also wanted to give their customers the ability to process the data collected by the drones.

They launched the first version of their code in 2014. Since then they have attracted customers in industries as diverse as construction, agriculture, surveying, and mining (many interesting stories can be found on the DroneDeploy Blog). Here are a few examples:

A customer in Mexico processed 1000 km of road imagery in just 3 weeks (114,043 images / 8 terabytes of data).
A potato farmer in North Dakota mapped a 150 acre field, processed the data (30 minutes), and evaluated crop damage.
A construction manager in Oklahoma used DroneDeploy to monitor the construction of oil tanks and pipelines, producing 3D models in the process.

DroneDeploy is processing images from 100 countries into interactive maps and 3D models. They host their core infrastructure on AWS. They make heavy use of EC2 for image processing and S3 for storage (multiple petabytes). The image processing fleet is auto scaled up and down based on the number and priority of jobs, spread out across multiple Availability Zones.

— Jeff;

Using Enhanced RDS Monitoring with Datadog

by Jeff Barr | on 09 MAR 2016 | in AWS Lambda, CloudWatch, Guest Post | Permalink | Comments

Today’s guest post comes from K Young, Director of Strategic Initiatives at Datadog!

— Jeff;

AWS recently announced enhanced monitoring for Amazon RDS instances running MySQL, MariaDB, and Aurora. Enhanced monitoring includes over 50 new CPU, memory, file system, and disk I/O metrics which can be collected on a per-instance basis as frequently as once per second.

AWS and Datadog
AWS worked closely with Datadog to help customers send this new high-resolution data to Datadog for monitoring. Datadog is an infrastructure monitoring platform that is very popular with AWS customers—you can see historical trends with full granularity and also visualize and alert on live data from any part of your stack.

With a few minutes of work, your enhanced RDS metrics will begin populating a pre-built, customizable dashboard in Datadog.

Connect RDS and Datadog
The first step is to send enhanced RDS metrics to CloudWatch Logs. You can enable the metrics during instance creation, or on an existing RDS instance by selecting it in the RDS Console and then choosing Instance Options → Modify.

Set Granularity to 1–60 seconds; every 15 seconds is often a good choice. Once enabled, enhanced metrics will be sent to CloudWatch Logs.
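
If you would rather script this setting than click through the console, the same change can be made through the RDS ModifyDBInstance API. The sketch below uses boto3 and is only an illustration: the helper names and the instance identifier are made up, and it assumes that you already have an IAM role that allows RDS to publish enhanced monitoring data to CloudWatch Logs (the role ARN shown is a placeholder).

```python
# Granularities (in seconds) that RDS accepts when enhanced monitoring is on.
VALID_MONITORING_INTERVALS = (1, 5, 10, 15, 30, 60)


def enhanced_monitoring_params(instance_id, role_arn, interval=15):
    """Build the keyword arguments for the RDS ModifyDBInstance call."""
    if interval not in VALID_MONITORING_INTERVALS:
        raise ValueError(
            "interval must be one of %s, got %r" % (VALID_MONITORING_INTERVALS, interval)
        )
    return {
        "DBInstanceIdentifier": instance_id,
        "MonitoringInterval": interval,
        "MonitoringRoleArn": role_arn,
        "ApplyImmediately": True,
    }


def enable_enhanced_monitoring(instance_id, role_arn, interval=15):
    """Apply the setting with boto3 (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above can be used without boto3

    rds = boto3.client("rds")
    return rds.modify_db_instance(
        **enhanced_monitoring_params(instance_id, role_arn, interval)
    )


# Placeholder identifiers; substitute your own instance name and role ARN.
params = enhanced_monitoring_params(
    "my-db-instance",
    "arn:aws:iam::123456789012:role/my-rds-monitoring-role",
    interval=15,
)
print(params["MonitoringInterval"])
```

Validating the interval before making the call gives a clearer error than waiting for the API to reject an unsupported value such as 7.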

The second step is to send the CloudWatch Log data to Datadog. Begin by setting up a Lambda function to process the logs and send the metrics:

Create a role for your Lambda function. Name it something like lambda-datadog-enhanced-rds-collector and select AWS Lambda as the role type.
From the Encryption Keys tab on the IAM Management Console, create a new encryption key. Enter an Alias for the key like lambda-datadog-key. On the next page, add the appropriate administrators for the key. Next you’ll be prompted to add users to the key. Add at least two: yourself (so that you can encrypt the Datadog API key from the AWS CLI in the next step), and the role created above, e.g. lambda-datadog-enhanced-rds-collector (so that it can decrypt the API key and submit metrics to Datadog). Finish creating the key.
Encrypt the token using the AWS Command Line Interface (CLI), providing the Alias of your just-created key (e.g. lambda-datadog-key) as well as your Datadog keys, available here. Use KMS to encrypt your key, like this:
```
$ aws kms encrypt --key-id alias/ALIAS_KEY_NAME --plaintext '{"api_key":"DATADOG_API_KEY", "app_key":"DATADOG_APP_KEY"}'
```
Save the output of this command; you will need it for the next step.
From the Lambda Management Console, create a new Lambda Function. Filter blueprints by datadog, and select the datadog-process-rds-metrics blueprint.
Choose RDSOSMetrics from the Log Group dropdown, enter the Filter Name of your choice, and go to the next page. If you have not yet enabled enhanced monitoring, you must do so before RDSOSMetrics will be presented as an option (see the instructions under Connect RDS and Datadog above).
Give your function a name like send-enhanced-rds-to-datadog. In the Lambda function code area, replace the string after KMS_ENCRYPTED_KEYS with the ciphertext blob part of the CLI command output above.
Under Lambda function handler and role, choose the role you created in the first step, e.g. lambda-datadog-enhanced-rds-collector. Go to the next page, select the Enable Now radio button, and create your function.
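
To give a feel for what a function like this does with the data it receives, here is a rough Python sketch. It is not the code from the datadog-process-rds-metrics blueprint. It relies on the fact that CloudWatch Logs delivers events to Lambda as base64-encoded, gzip-compressed JSON, and it assumes that each log message is a JSON document with an instanceID field and nested groups of numeric values such as cpuUtilization. The function names and the flattened metric naming are made up for illustration, and decrypting the Datadog keys and submitting the metrics are left out.

```python
import base64
import gzip
import json


def decode_cloudwatch_logs_event(event):
    """Return the log payload that CloudWatch Logs passes to a subscribed function."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


def flatten_metrics(payload):
    """Turn each log message into (instance, metric, timestamp_ms, value) tuples."""
    points = []
    for log_event in payload["logEvents"]:
        message = json.loads(log_event["message"])
        instance = message.get("instanceID", "unknown")
        for section, values in message.items():
            if not isinstance(values, dict):
                continue  # this sketch skips scalar fields and per-device lists
            for name, value in values.items():
                if isinstance(value, (int, float)) and not isinstance(value, bool):
                    points.append(
                        (instance, section + "." + name, log_event["timestamp"], value)
                    )
    return points


def lambda_handler(event, context):
    """Entry point: decode the event and report how many data points it held."""
    points = flatten_metrics(decode_cloudwatch_logs_event(event))
    # A real function would decrypt the Datadog keys with KMS and submit the points here.
    return len(points)
```

You can exercise the handler locally by gzip-compressing and base64-encoding a small JSON document of your own and passing it in as event["awslogs"]["data"].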

That’s It
Once you have enabled RDS in Datadog’s AWS integration tile, Datadog will immediately begin displaying your enhanced RDS metrics. Your RDS instances will be individually identifiable in Datadog via automatically-created tags of the form dbinstanceidentifier:YOUR_DB_INSTANCE_NAME, as well as any tags you added through the RDS console.

You can clone the pre-built dashboard and customize it however you want: add RDS metrics that are not displayed by default, or start correlating RDS metrics with the performance of the rest of your stack.

— K Young, Director of Strategic Initiatives
