AWS Blog

New – Cross-Region Read Replicas for Amazon Aurora

by Jeff Barr | on | in Amazon Aurora, Amazon RDS | | Comments

You already have the power to scale the read capacity of your Amazon Aurora instances by adding additional read replicas to an existing cluster. Today we are giving you the power to create a read replica in another region. This new feature will allow you to support cross-region disaster recovery and to scale out reads. You can also use it to migrate from one region to another or to create a new database environment in a different region.

Creating a read replica in another region also creates an Aurora cluster in the region. This cluster can contain up to 15 more read replicas, with very low replication lag (typically less than 20 ms) within the region (between regions, latency will vary based on the distance between the source and target). You can use this model to duplicate your cluster and read replica setup across regions for disaster recovery. In the event of a regional disruption, you can promote the cross-region replica to be the master. This will allow you to minimize downtime for your cross-region application. This feature applies to unencrypted Aurora clusters.

Before you get actually create the read replica, you need to take care of a pair of prerequisites. You need to make sure that a VPC and the Database Subnet Groups exist in the target region, and you need to enable binary logging on the existing cluster.

Setting up the VPC
Because Aurora always runs within a VPC, ensure that the VPC and the desired Database Subnet Groups exist in the target region. Here are mine:

Enabling Binary Logging
Before you can create a cross region read replica, you need to enable binary logging for your existing cluster. Create a new DB Cluster Parameter Group (if you are not already using a non-default one):

Enable binary logging (choose MIXED) and then click on Save Changes:

Next, Modify the DB Instance, select the new DB Cluster Parameter Group, check Apply Immediately, and click on Continue. Confirm your modifications, and then click on Modify DB Instance to proceed:

Select the instance and reboot it, then wait until it is ready.

Create Read Replica
With the prerequisites out of the way it is time to create the read replica! From within the AWS Management Console, select the source cluster and choose Create Cross Region Read Replica from the Instance Actions menu:

Name the new cluster and the new instance, and then pick the target region. Choose the DB Subnet Group and set the other options as desired, then click Create:

Aurora will create the cluster and the instance. The state of both items will remain at creating until the items have been created and the data has been replicated (this could take some time, depending on amount of data stored in the existing cluster.

This feature is available now and you can start using it today!

Jeff;

 

New in AWS Marketplace: Alces Flight – Effortless HPC on Demand

by Jeff Barr | on | in AWS Marketplace, HPC | | Comments

In the past couple of years, academic and corporate researchers have begun to see the value of the cloud. Faced with a need to run demanding jobs and to deliver meaningful results as quickly as possible while keeping costs under control, they are now using AWS to run a wide variety of compute-intensive, highly parallel workloads.

Instead of fighting for time on a cluster that must be shared with other researchers, they accelerate their work by launching clusters on demand, running their jobs, and then shutting the cluster down shortly thereafter, paying only for the resources that they consume. They replace tedious RFPs, procurement, hardware builds and acceptance testing with cloud resources that they can launch in minutes. As their needs grow, they can scale the existing cluster or launch a new one.

This self-serve, cloud-based approach favors science over servers and accelerates the pace of research and innovation. Access to shared, cloud-based resources can be granted to colleagues located on the same campus or halfway around the world, without having to worry about potential issues at organizational or network boundaries.

Alces Flight in AWS Marketplace
Today we are making Alces Flight available in AWS Marketplace. This is a fully-featured HPC environment that you can launch in a matter of minutes. It can make use of On-Demand or Spot Instances and comes complete with a job scheduler and hundreds of HPC applications that are all set up and ready to run. Some of the applications include built-in collaborative features such as shared graphical views. For example, here’s the Integrative Genomics Viewer (IGV):

Each cluster is launched into a Virtual Private Cloud (VPC) with SSH and graphical desktop connectivity. Clusters can be of fixed size, or can be Auto Scaled in order to meet changes in demand.  Once launched, the cluster looks and behaves just like a traditional Linux-powered HPC cluster, with shared NFS storage and passwordless SSH access to the compute nodes. It includes access to HPC applications, libraries, tools, and MPI suites.

We are launching Alces Flight in AWS Marketplace today. You can launch a small cluster (up to 8 nodes) for evaluation and testing or a larger cluster for research.

If you subscribe to the product, you can download the AWS CloudFormation template from the Alces site. This template powers all of the products, and is used to quickly launch all of the AWS resources needed to create the cluster.

EC2 Spot Instances give you access to spare AWS capacity at up to a 90% discount from On-Demand pricing and can significantly reduce your cost per core. You simply enter the maximum bid price that you are willing to pay for a single compute node; AWS will manage your bid, running the nodes when capacity is available at the desired price point.

Running Alces Flight
In order to get some first-hand experience with Alces Flight, I launched a cluster of my own. Here are the settings that I used:

I set a tag for all of the resources in the stack as follows:

I confirmed my choices and gave CloudFormation the go-ahead to create my cluster. As expected, the cluster was all set up and ready to go within 5 minutes. Here are some of the events that were logged along the way:

Then I SSH’ed in to the login node and saw the greeting, all as expected:

After I launched my cluster I realized that this post would be more interesting if I had more compute nodes in my cluster. Instead of starting over, I simply modified my CloudFormation stack to have 4 nodes instead of 1, applied the change, and watched as the new nodes came online. Since I specified the use of Spot Instances when I launched the cluster, Auto Scaling placed bids automatically. Once the nodes were online I was able to locate them from within my PuTTY session:

Then I used the pdsh (Parallel Distributed Shell command) to check on the up-time of each compute node:

Learn More
This barely counts as scratching the surface; read Getting Started as Quickly as Possible to learn a lot more about what you can do! You should also watch one or more of the Alces videos to see this cool new product in action.

If you are building and running data-intensive HPC applications on AWS, you may also be interested in another Marketplace offering. The BeeGFS (self-supported or support included) parallel file system runs across multiple EC2 instances, aggregating the processing  power into a single namespace, with all data stored on EBS volumes.  The self-supported product is also available on a 14 day free trial. You can create a cluster file system using BeeGFS and then use it as part of your Alces cluster.

Jeff;

 

Amazon ElastiCache Update – Export Redis Snapshots to Amazon S3

by Jeff Barr | on | in Amazon ElastiCache, Amazon S3 | | Comments

Amazon ElastiCache supports the popular Memcached and Redis in-memory caching engines. While Memcached is generally used to cache results from a slower, disk-based database, Redis is used as a fast, persistent key-value store. It uses replicas and failover to support high availability, and natively supports the use of structured values.

Today I am going to focus on a helpful new feature that will be of interest to Redis users. You already have the ability to create snapshots of a running Cache Cluster. These snapshots serve as a persistent backup, and can be used to create a new Cache Cluster that is already loaded with data and ready to go. As a reminder, here’s how you create a snapshot of a Cache Cluster:

You can now export your Redis snapshots to an S3 bucket. The bucket must be in the same Region as the snapshot and you need to grant ElastiCache the proper permissions (List, Upload/Delete, and View Permissions) on it. We envision several uses for this feature:

Disaster Recovery – You can copy the snapshot to another environment for safekeeping.

Analysis – You can dissect and analyze the snapshot in order to understand usage patterns.

Seeding – You can use the snapshot to seed a fresh Redis Cache Cluster in another Region.

Exporting a Snapshot
To export a snapshot, simply locate it, select it, and click on Copy Snapshot:

Verify the permissions on the bucket (read Exporting Your Snapshot to learn more):

Then enter a name and select the desired bucket:

ElastiCache will export the snapshot and it will appear in the bucket:

The file is a standard Redis RDB file, and can be used as such.

You can also exercise this same functionality from your own code or via the command line. Your code can call CopySnapshot while specifying the target S3 bucket. Your scripts can use the  copy-snapshot command.

This feature is available now and you can start using it today! There’s no charge for the export; you’ll pay the usual S3 storage charges.

Jeff;

 

Amazon Elastic Transcoder Update – Support for MPEG-DASH

by Jeff Barr | on | in Amazon Elastic Transcoder | | Comments

Amazon Elastic Transcoder converts media files (audio and video) from one format to another. The service is robust, scalable, cost-effective, and easy to use. You simply create a processing pipeline (pointing to a pair of S3 buckets for input and output in the process), and then create transcoding jobs. Each job reads a specific file from the input bucket, transcodes it to the desired format(s) as specified in the job, and then writes the output to the output bucket. You pay for only what you transcode, with price points for Standard Definition (SD) video, High Definition (HD) video, and audio. We launched the service with support for an initial set of transcoding presets (combinations of output formats and relevant settings). Over time, in response to customer demand and changes in encoding technologies, we have added additional presets and formats. For example, we added support for the VP9 Codec earlier this year.

Support for MPEG-DASH
Today we are adding support for transcoding to the MPEG-DASH format. This International Standard format supports high-quality audio and video streaming from HTTP servers, and has the ability to adapt to changes in available network throughput using a technique known as adaptive streaming. It was designed to work well across multiple platforms and at multiple bitrates, simplifying the transcoding process and sidestepping the need to create output in multiple formats.

During the MPEG-DASH transcoding process, the content is transcoded into segmented outputs at the different bitrates and a playlist is created that references these outputs. The client (most often a video player) downloads the playlist to initiate playback. Then it monitors the effective network bandwidth and latency, requests video segments as needed. If network conditions change during the playback process, the player will take action, upshifting or downshifting as needed.

You can serve up the transcoded content directly from S3 or you can use Amazon CloudFront to get the content even closer to your users. Either way, you need to create a CORS policy that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>

If you are using CloudFront, you need to enable the OPTIONS method, and allow it to be cached:

You also need to add three headers to the whitelist for the distribution:

Transcoding With MPEG-DASH
To make use of the adaptive bitrate feature of MPEG-DASH, you create a single transcoding job and specify multiple outputs, each with a different preset. Here are your choices (4 for video and 1 for audio):

When you use this format, you also need to choose a suitable segment duration (in seconds). A shorter duration produces a larger number of smaller segments and allows the client to adapt to changes more quickly.

You can create a single playlist that contains all of the bitrates, or you can choose the bitrates that are most appropriate for your customers and your content. You can also create your own presets, using an existing one as a starting point:

Available Now
MPEG-DASH support is available now in all Regions where Amazon Elastic Transcoder is available. There is no extra charge for this use of this format (see Elastic Transcoder Pricing to learn more).

Jeff;

 

Amazon Redshift – Up to 2X Throughput and 10X Vacuuming Performance Improvements

by Jeff Barr | on | in Amazon Redshift | | Comments

My colleague Maor Kleider wrote today’s guest post!

Jeff;

Amazon Redshift, AWS’s fully managed data warehouse service, makes petabyte-scale data analysis fast, cheap, and simple. Since launch, it has been one of AWS’s fastest growing services, with many thousands of customers across many industries. Enterprises such as NTT DOCOMO, NASDAQ, FINRA, Johnson & Johnson, Hearst, Amgen, and web-scale companies such as Yelp, Foursquare and Yahoo! have made Amazon Redshift a key component of their analytics infrastructure.

In this blog post, we look at performance improvements we’ve made over the last several months to Amazon Redshift, improving throughput by more than 2X and vacuuming performance by 10X.

Column Store
Large scale data warehousing is largely an I/O problem, and Amazon Redshift uses a distributed columnar architecture to minimize and parallelize I/O. In a column-store, each column of a table is stored in its own data block. This reduces data size, since we can choose compression algorithms optimized for each type of column. It also reduces I/O time during queries, because only the columns in the table that are being selected need to be retrieved.

However, while a column-store is very efficient at reading data, it is less efficient than a row-store at loading and committing data, particularly for small data sets. In patch 1.0.1012 (December 17, 2015), we released a significant improvement to our I/O and commit logic. This helped with small data loads and queries using temporary tables. While the improvements are workload-dependent, we estimate the typical customer saw a 35% improvement in overall throughput.

Regarding this feature, Naeem Ali, Director of Software Development, Data Science at Cablevision, told us:

Following the release of the I/O and commit logic enhancement, we saw a 2X performance improvement on a wide variety of workloads. The more complex the queries, the higher the performance improvement.

Improved Query Processing
In addition to enhancing the I/O and commit logic for Amazon Redshift, we released an improvement to the memory allocation for query processing in patch 1.0.1056 (May 17, 2016), increasing overall throughput by up to 60% (as measured on standard benchmarks TPC-DS, 3TB), depending on the workload and the number of queries that spill from memory to disk. The query throughput improvement increases with the number of concurrent queries, as less data is spilled from memory to disk, reducing required I/O.

Taken together, these two improvements, should double performance for customer workloads where a portion of the workload contains complex queries that spill to disk or cause temporary tables to be created.

Better Vacuuming
Amazon Redshift uses multi-version concurrency control to reduce contention between readers and writers to a table. Like PostgreSQL, it does this by marking old versions of data as deleted and new versions as inserted, using the transaction ID as a marker. This allows readers to build a snapshot of the data they are allowed to see and traverse the table without locking. One issue with this approach is the system becomes slower over time, requiring a vacuum command to reclaim the space. This command reclaims the space from deleted rows and ensures new data that has been added to the table is placed in the right sorted order.

We are releasing a significant performance improvement to vacuum in patch 1.0.1056, available starting May 17, 2016. Customers previewing the feature have seen dramatic improvements both in vacuum performance and overall system throughput as vacuum requires less resources.

Ari Miller, a Principal Software Engineer at TripAdvisor, told me:

We estimate that the vacuum operation on a 15TB table went about 10X faster with the recent patch, ultimately improving overall query performance.

 You can query the VERSION function to verify that you are running at the desired patch level.

Available Now
Unlike on-premise data warehousing solutions, there are no license or maintenance fees for these improvements or work required on your part to obtain them. They simply show up as part of the automated patching process during your maintenance window.

Maor Kleider, Senior Product Manager, Amazon Redshift

 

EC2 Instance Console Screenshot

by Jeff Barr | on | in Amazon EC2 | | Comments

When our users move existing machine images to the cloud for use on Amazon EC2, they occasionally encounter issues with drivers, boot parameters, system configuration settings, and in-progress software updates. These issues can cause the instance to become unreachable via RDP (for Windows) or SSH (for Linux) and can be difficult to diagnose. On a traditional system, the physical console often contains log messages or other clues that can be used to identify and understand what’s going on.

In order to provide you with additional visibility into the state of your instances, we now offer the ability to generate and capture screenshots of the instance console. You can generate screenshots while the instance is running or after it has crashed.

Here’s how you generate a screenshot from the console (the instance must be using HVM virtualization):

And here’s the result:

It can also be used for Windows instances:

You can also create screenshots using the CLI (aws ec2 get-console-screenshot) or the EC2 API (GetConsoleScreenshot).

Available Now
This feature is available today in the US East (Northern Virginia), US West (Oregon), US West (Northern California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), and South America (Brazil) Regions. There are no costs associated with it.

Jeff;

 

New AWS Quick Start Reference Deployment – Standardized Architecture for PCI DSS

by Jeff Barr | on | in Quick Start | | Comments

If you build an application that processes credit card data, you need to conform to PCI DSS (Payment Card Industry Data Security Standard). Adherence to the standard means that you need to meet control objectives for your network, protect cardholder data, implement strong access controls, and more.

In order to help AWS customers to build systems that conform to PCI DSS, we are releasing a new Quick Start Reference Deployment. The new Standardized Architecture for PCI DSS on the AWS Cloud (PDF or HTML) includes a AWS CloudFormation template that deploys a standardized environment that falls in scope for PCI DSS compliance (version 3.1).

The template describes a stack that deploys a multi-tiered Linux-based web application in about 30 minutes. It makes use of child templates, and can be customized as desired. It launches a pair of Virtual Private Clouds (Management and Production) and can accommodate a third VPC for development:

The template sets up the IAM items (policies, groups, roles, and instance profiles), S3 buckets (encrypted web content, logging, and backup), a Bastion host for troubleshooting and administration, an encrypted RDS database instance running in multiple Availability Zones, and a logging / monitoring / alerting package that makes use of AWS CloudTrail, Amazon CloudWatch, and AWS Config Rules. The architecture supports a wide variety of AWS best practices (all of which are detailed in the document) including use of multiple Availability Zones, isolation using public and private subnets, load balancing, auto scaling, and more.

You can use the template to set up an environment that you can use for learning, as a prototype, or as the basis for your own template.

The Quick Start also includes a Security Controls Reference. This document maps the security controls called out by PCI DSS to the relevant architecture decisions, features, and configurations.

Jeff;

PS – Check out our other AWS Enterprise Accelerator Quick Starts!

 

 

Arduino Web Editor and Cloud Platform – Powered by AWS

by Jeff Barr | on | in Announcements, AWS Lambda, Internet of Things | | Comments

Last night I spoke with Luca Cipriani from Arduino to learn more about the new AWS-powered Arduino Web Editor and Arduino Cloud Platform offerings. Luca was en-route to the Bay Area Maker Faire and we had just a few minutes to speak, but that was enough time for me to learn a bit about what they have built.

If you have ever used an Arduino, you know that there are several steps involved. First you need to connect the board to your PC’s serial port using a special cable (you can also use Wi-Fi if you have the appropriate add-on “shield”), ensure that the port is properly configured, and establish basic communication. Then you need to install, configure, and launch your development environment, make sure that it can talk to your Arduino, tell it which make and model of Arduino that you are using, and select the libraries that you want to call from your code. With all of that taken care of, you are ready to write code, compile it, and then download it to the board for debugging and testing.

Arduino Code Editor
Luca told me that the Arduino Code Editor was designed to simplify and streamline the setup and development process. The editor runs within your browser and is hosted on AWS (although we did not have time to get in to the details, I understand that they made good use of AWS Lambda and several other AWS services).

You can write and modify your code, save it to the cloud and optionally share it with your colleagues and/or friends. The editor can also detect your board (using a small native plugin) and configure itself accordingly; it even makes sure that you can only write code using libraries that are compatible with your board. All of your code is compiled in the cloud and then downloaded to your board for execution.

Here’s what the editor looks like (see Sneak Peek on the New, Web-Based Arduino Create for more):

Arduino Cloud Platform
Because Arduinos are small, easy to program, and consume very little power, they work well in IoT (Internet of Things) applications. Even better, it is easy to connect them to all sorts of sensors, displays, and actuators so that they can collect data and effect changes.

The new Arduino Cloud Platform is designed to simplify the task of building IoT applications that make use of Arduino technology. Connected devices will be able to be able to connect to the Internet, upload information derived from sensors, and effect changes upon command from the cloud. Building upon the functionality provided by AWS IoT, this new platform will allow devices to communicate with the Internet and with each other. While the final details are still under wraps, I believe that this will pave the wave for sensors to activate Lambda functions and for Lambda functions to take control of displays and actuators.

I look forward to learning more about this platform as the details become available!

Jeff;

 

AWS Accelerator for Citrix – Migrate or Deploy XenApp & XenDesktop to the Cloud

by Jeff Barr | on | | Comments

If you are running Citrix XenApp, XenDesktop and/or NetScaler on-premises  and are interested in moving to the AWS Cloud, I have a really interesting offer for you!

In cooperation with our friends at Citrix (an Advanced APN Technology Partner), we have assembled an AWS Accelerator to help you to plan and execute a successful trial migration while using your existing licenses. The migration process makes use of the new Citrix Lifecycle Management (CLM) tool. CLM includes a set of proven migration blueprints that will help you to move your existing deployment to AWS. You can also deploy the XenApp and XenDesktop Service using Citrix Cloud, and tap CLM to manage your AWS-based resources.

Here’s the Deal
The AWS Accelerator lets you conduct a 25-user trial migration / proof of concept over a 60 day period. During that time you can use CLM to deploy XenApp, XenDesktop, and NetScaler on AWS per the reference architecture and a set of best practices. We will provide you with AWS Credit ($5000) and Citrix will provide you with access to CLM. A select group of joint AWS and Citrix launch partners will deliver the trials with the backing and support of technical and services teams from both companies.

Getting Started
Here’s what you need to do to get started:

  1. Contact your AWS (email us) or Citrix account team and ask to join the AWS Accelerator.
  2. Submit your request in order to be considered for Amazon EC2 credits and a trial of Citrix CLM.
  3. Create an AWS account if you don’t already have one.

After you do this, follow the steps in the Citrix blueprint (Deploy the XenApp and XenDesktop Proof of Concept blueprint with NetScaler to AWS) to build your proof-of-concept environment.

Multiple AWS Partners are ready, willing, and able to help you to work through the blueprint and to help you to tailor it to the needs of your organization. The AWS Accelerator Launch Services Partners include Accenture, Booz Allen Hamilton, CloudNation, Cloudreach, Connectria, Equinix (EPS Cloud), REAN Cloud, and SSI-Net. Our Launch Direct Connect partner is Level 3.

Learn More at Synergy
AWS will be sponsoring Citrix Synergy next week in Las Vegas and will be at booth #770. Citrix will also be teaching a hands on lab (SYN618) based on the AWS Accelerator program on Monday May 23rd at 8 AM. If you are interested in learning more please sign up for the hands on lab or stop by the booth and say hello to my colleagues!

Jeff;

 

New – Cross-Account Snapshot Sharing for Amazon Aurora

by Jeff Barr | on | in Amazon Aurora | | Comments

Amazon Aurora is a high-performance, MySQL-compatible database engine. Aurora combines the speed and availability of high-end commercial databases with the simplicity and cost-effective of open source databases (see my post, Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon RDS, to learn more). Aurora shares some important attributes with the other database engines that are available for Amazon RDS including easy administration, push-button scalability, speed, security, and cost-effectiveness.

You can create a snapshot backup of an Aurora cluster with just a couple of clicks. After you have created a snapshot, you can use it to restore your database, once again with a couple of clicks.

Share Snapshots
Today we are giving you the ability to share your Aurora snapshots. You can share them with other AWS accounts and you can also make them public. These snapshots can be used to restore the database to an Aurora instance running in a separate AWS account in the same Region as the snapshot.

There are several primary use cases for snapshot sharing:

Separation of Environments – Many AWS customers use separate AWS accounts for their development, test, staging, and production environments. You can share snapshots between these accounts as needed. For example, you can generate the initial database in your staging environment, snapshot it, share the snapshot with your production account, and then use it to create your production database. Or, should you encounter an issue with your production code or queries, you can create a snapshot of your production database and then share it with your test account for debugging and remediation.

Partnering – You can share database snapshots with selected partners on an as-needed basis.

Data Dissemination -If you are running a research project, you can generate snapshots and then share them publicly. Interested parties can then create their own Aurora databases using the snapshots, using your work and your data as a starting point.

To share a snapshot, simply select it in the RDS Console and click on Share Snapshot. Then enter the target AWS account (or click on Public to share the snapshot publicly) and click on Add:

You can share manually generated, unencrypted snapshots with other AWS accounts or publicly. You cannot share automatic snapshots or encrypted snapshots.

The shared snapshot becomes visible in the other account right away:

Public snapshots are also visible (select All Public Snapshots as the Filter):

Available Now
This feature is available now and you can start using it today.

Jeff;