AWS Blog
Box Zones – Giving Enterprises Control Over Data Location Using AWS
Our friends over at Box provide secure content management, collaboration, and file sharing for over half of the companies on the Fortune 500 list.
Box has succeeded by paying attention to the needs of enterprise customers. For example, last year I wrote about Box Enterprise Key Management (EKM), a flexible, no-compromises encryption system that gives Box customers control over their encryption keys. This feature has evolved into Box KeySafe, which allows even the smallest IT shops to use encryption to protect their proprietary documents. Other Box features such as Box Capture (mobile phone integration with business processing) and Box Governance (control and compliance) are also purpose-built to meet the business needs of enterprises.
We are happy to play a strong supporting role in today’s launch of Box Zones. This new feature uses Amazon S3 to provide Box customers with a choice of four storage locations (Germany, Ireland, Singapore, and Tokyo). Box customers can decide where to store their data while still taking advantage of other Box features such as watermarking, fine-grained control over permissions, commenting, and file preview.
To learn more about this new feature, sign up for the Box Zones webinar.
Congratulations to Box, and thank you for using AWS!
— Jeff;
AWS Week in Review – April 4, 2016
Let’s take a quick look at what happened in AWS-land last week:
Upcoming Events
- AWS Partner Webinars – April.
- April 29 – Live Event (Singapore) – AWS Partner Summit.
- AWS Zombie Microservices Roadshow.
Help Wanted
Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.
— Jeff;
AWS Week in Review – March 28, 2016
— Jeff;
AWS Week in Review – March 21, 2016
— Jeff;
Amazon Kinesis Agent Update – New Data Preprocessing Features
My colleague Ray Zhu wrote the guest post below to introduce you to some new data preprocessing features for the Amazon Kinesis Agent.
— Jeff;
Amazon Kinesis Agent is a stand-alone Java software application that provides an easy and reliable way to send data to Amazon Kinesis Streams and Amazon Kinesis Firehose. The agent monitors a set of files for new data and then sends it to Kinesis Streams or Kinesis Firehose continuously. It handles file rotation, checkpointing, and retries upon failure. It also supports Amazon CloudWatch so that you can closely monitor and troubleshoot the data flow from the agent.
Data Preprocessing with Kinesis Agent
Today we are adding data preprocessing capabilities to the agent so that your data can be well formatted before it is sent to Kinesis Streams or Kinesis Firehose. The agent currently supports the three processing options listed below. Because the agent is open source, you can further develop and extend these processing options.
SINGLELINE – This option converts a multi-line record to a single-line record by removing newline characters, and leading and trailing spaces.
CSVTOJSON – This option converts a record from delimiter-separated format to JSON format.
LOGTOJSON – This option converts a record from several commonly used log formats to JSON format. Currently supported log formats are Apache Common Log, Apache Combined Log, Apache Error Log, and RFC3164 (syslog).
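To make the first two options more concrete, here is a minimal Python sketch of the transformations they describe. It is not the agent's own code (the agent is a Java application). The function names, the choice to strip each physical line before joining, and the sample field names are all assumptions made for illustration.

```python
import csv
import io
import json


def single_line(record):
    """Collapse a multi-line record into a single line.

    Newline characters are removed, and leading and trailing spaces are
    stripped from each physical line (stripping per line is an assumption).
    The pieces are concatenated without a separator, following the literal
    description of the option.
    """
    return "".join(line.strip() for line in record.splitlines())


def csv_to_json(record, field_names, delimiter=","):
    """Convert one delimiter-separated record into a JSON object string.

    field_names is a hypothetical parameter that supplies the JSON keys,
    in the same order as the values in the record.
    """
    row = next(csv.reader(io.StringIO(record), delimiter=delimiter))
    return json.dumps(dict(zip(field_names, row)))


if __name__ == "__main__":
    print(single_line("  first part \n   second part  \n"))
    print(csv_to_json("10.0.0.1|GET|200", ["host", "method", "status"], "|"))
```

Keeping each option as a small, pure function mirrors the idea that the options can be chained in the order they are listed in the agent configuration.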
Analyze Apache Tomcat Access Log in Near Real-Time
Let’s look at an example of analyzing Tomcat access logs in near real-time using Kinesis Agent’s preprocessing feature, Amazon Kinesis Firehose, and Amazon Redshift. Here’s the overall flow:

First I need to create a table in my Redshift cluster to store the Tomcat access log. The following SQL statement is used to create the table:
CREATE TABLE logs(
    host VARCHAR(40),
    ident VARCHAR(25),
    authuser VARCHAR(25),
    datetime VARCHAR(60),
    request VARCHAR(2048),
    response SMALLINT NOT NULL,
    bytes INTEGER,
    referer VARCHAR(2048),
    agent VARCHAR(256));
Then I need to create a Kinesis Firehose delivery stream that continuously delivers data to the Redshift table created above:

Now I’ve set up my Redshift table and Firehose delivery stream. Next I need to install the Kinesis Agent on my Tomcat server to monitor my Tomcat access log files and continuously send the log data to my delivery stream. Here is a screenshot of the raw Tomcat access log:

In the agent configuration, I use the LOGTOJSON processing option to convert raw Tomcat access log data to JSON format before sending the data to my delivery stream. Here’s how I set that up:
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/data/access.log*",
      "deliveryStream": "access_log_stream",
      "initialPosition": "START_OF_FILE",
      "dataProcessingOptions": [
        {
          "optionName": "LOGTOJSON",
          "logFormat": "COMBINEDAPACHELOG"
        }
      ]
    }
  ]
}
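For readers who want a feel for what the LOGTOJSON conversion does to a combined-format access log line, here is a rough Python sketch. It is an illustration only, not the agent's implementation. The regular expression, the mapping of "-" to null, and the sample log line are assumptions, and the JSON keys simply reuse the column names of the Redshift table in this post.

```python
import json
import re

# Hypothetical pattern for the Apache combined log format. Group names reuse
# the Redshift column names from this post (an assumption for illustration).
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<response>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def combined_log_to_json(line):
    """Turn one combined-format log line into a JSON object string."""
    match = COMBINED_LOG.match(line.strip())
    if match is None:
        raise ValueError("not a combined log line: %r" % line)
    # Treat the "-" placeholder as a missing value (assumption).
    record = {
        key: (None if value == "-" else value)
        for key, value in match.groupdict().items()
    }
    return json.dumps(record)


if __name__ == "__main__":
    # Made-up sample line, not taken from the original post.
    sample = (
        '203.0.113.7 - - [04/Apr/2016:10:15:32 -0700] '
        '"GET /index.html HTTP/1.1" 200 5120 '
        '"http://example.com/start" "Mozilla/5.0"'
    )
    print(combined_log_to_json(sample))
```

Because each record becomes one flat JSON object, its keys can line up with table columns when the delivery stream loads the data.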
Everything is set up now, so let’s start the agent! After a minute or two, my Tomcat access log data shows up in my S3 bucket and Redshift table. Here is what the data looks like in my S3 bucket. Notice that the raw log data has been nicely formatted as JSON:

Here is what the data looks like in my Redshift table:

I can run SQL queries to analyze my Tomcat access log, or use the Business Intelligence tool of my choice to visualize the data:
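As an example of the kind of query that becomes possible, the sketch below counts requests and bytes per HTTP response code. To keep it runnable anywhere, it uses an in-memory SQLite database as a stand-in for Redshift, recreates the logs table from this post, and loads three made-up rows. The rows, the function names, and the query itself are illustrative assumptions, not part of the original pipeline.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory stand-in for the logs table with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE logs(
               host VARCHAR(40), ident VARCHAR(25), authuser VARCHAR(25),
               datetime VARCHAR(60), request VARCHAR(2048),
               response SMALLINT NOT NULL, bytes INTEGER,
               referer VARCHAR(2048), agent VARCHAR(256))"""
    )
    rows = [  # made-up sample data for illustration only
        ("203.0.113.7", None, None, "04/Apr/2016:10:15:32 -0700",
         "GET /index.html HTTP/1.1", 200, 5120, None, "Mozilla/5.0"),
        ("203.0.113.8", None, None, "04/Apr/2016:10:15:40 -0700",
         "GET /missing HTTP/1.1", 404, 312, None, "Mozilla/5.0"),
        ("203.0.113.7", None, None, "04/Apr/2016:10:16:01 -0700",
         "GET /about.html HTTP/1.1", 200, 2048, None, "Mozilla/5.0"),
    ]
    conn.executemany("INSERT INTO logs VALUES (?,?,?,?,?,?,?,?,?)", rows)
    return conn


def status_summary(conn):
    """Return (response, hits, total_bytes) rows, busiest status first."""
    return conn.execute(
        """SELECT response, COUNT(*) AS hits, SUM(bytes) AS total_bytes
           FROM logs
           GROUP BY response
           ORDER BY hits DESC, response"""
    ).fetchall()


if __name__ == "__main__":
    for row in status_summary(build_sample_db()):
        print(row)
```

The SELECT statement uses only standard SQL, so a similar query should carry over to the real table with little or no change.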

It took me less than an hour to set up the whole data pipeline. Now I can analyze and visualize access log data using my favorite Business Intelligence tool, only minutes after the data is generated on my Tomcat server!
Available Now
Kinesis Agent’s data preprocessing feature is available now and you can start using it today – visit the Amazon Kinesis Agent Repository! To learn more, read Use Agent to Preprocess Data in the Kinesis Firehose Developer Guide.
— Ray Zhu, Senior Product Manager
AWS Training Update – Revised AWS Technical Essentials and Architecting on AWS Courses
We continuously enhance our technical courses to stay current with the pace of AWS platform updates and incorporate student feedback. We have made substantial updates to our two most popular foundational training courses, AWS Technical Essentials and Architecting on AWS, to better provide students with actionable knowledge to get started creating solutions with AWS and a path to advanced learning.
AWS Technical Essentials – What’s New
This is a one-day course for solutions architects, developers, sysops administrators, and anyone who wants to get started using AWS. It covers the foundations of cloud computing, storage, and networking. It’s also used as the content for AWSome Days. The updated course now addresses 18 AWS services, with in-depth coverage of 10 core services, including EC2, S3, EBS, IAM, Auto Scaling, ELB, RDS, DynamoDB, and CloudWatch. New, comprehensive hands-on lab exercises and instructor-led demonstrations help students learn how to get started creating real-world solutions on the AWS platform. The updated course also provides students with a clearer path to continue their education with more advanced courses such as Architecting on AWS and Systems Operations on AWS. You can read the course description to learn more.
Architecting on AWS – What’s New
This is a three-day course for solutions architects and solution design engineers. It aligns with the changes to AWS Technical Essentials, making the concepts learned in that course a prerequisite. The updated course now focuses on cloud best practices, architecture patterns, case studies, and other practical ways of thinking about how to architect infrastructure on AWS. Hands-on lab exercises walk you through how to build complete application environments on AWS using a variety of AWS services, including Amazon VPC, Amazon EC2, Amazon S3, AWS Lambda, and more. New content also addresses automating and de-coupling infrastructures using architectures less dependent on servers, troubleshooting commonly misconfigured architectures, and concepts from the Well-Architected Framework. Read the course description to learn more.
Accessing the Courses
These classes (and many more) are available through AWS and our Training Partners. You can find upcoming classes in our global training schedule or learn more at AWS Training.
LiveOps Cloud – Tapping the Billion Dollar Call-Center Market on AWS
LiveOps Cloud is ready to break open a huge untapped market. The company is a long-time solutions provider for the contact center industry, and just recently launched CxEngage, a new contact center-as-a-service platform built and run entirely on AWS. I asked Jeff Thompson, LiveOps Cloud’s CTO and SVP for engineering, to tell us a bit about their decision to launch this great new service.
— Jeff;
We like to say that LiveOps Cloud is a 16-year-old startup. We’re a new company carved out of LiveOps Inc., and our mission is to take the original company’s long history of providing contact center solutions into a new era of cloud-first convenience, performance, and lower costs.
The contact center business is huge, with estimates of at least 15 million seats worldwide that comprise a multi-billion-dollar market. But the industry is a late-comer to cloud computing, with only about 10 percent of contact center operations working in some capacity with cloud infrastructure and tools. So there still are a lot of legacy, on-premises call center systems in place—especially in traditional industries like banking and retail—that are quickly reaching their expiration date. These systems are inadequate for meeting the demands of today’s market, with companies having to hold down costs, provide ever-better performance and sophistication, and serve emerging markets.
Cloud Bake-Off
Our plan was to create a pure cloud contact center-as-a-service (CCaaS) that could deliver an always-on, secure, multi-tenant, and instantly scalable platform so businesses can deliver exceptional customer experiences anywhere, anytime. We anticipated that if done right, our CCaaS would take off in a true ‘hockey stick’ growth pattern. To get there, we needed to carefully consider what platform would make the most sense.
We held a bake-off that looked at a number of alternatives. Azure, Rackspace, Google, and AWS were in the cloud provider mix, and we also looked at using a colocated facility. That last one was the first to go. We knew from experience that running the platform out of a co-lo would simply not provide the redundancy and scalability we wanted to bake into the new platform. We ruled out Azure because we’re not a Microsoft shop, and were using a lot of non-Windows tools and Linux to create the platform. Rackspace has good IaaS, but their global reach was insufficient for our business goals. We also ruled out Google because they didn’t have the breadth of apps we felt were required to build our platform.
A Clear Winner
In our view, AWS was the clear winner. It delivers all the features and benefits we were seeking. It has an incredibly rich catalog of services, with new ones being released at a pace that competing cloud providers simply are not matching. We might not need them all now, but knowing those services are there, and that AWS is innovating and adding to them all the time, instills real confidence. We know that if we have some need or feature request in the future, chances are AWS already has a service that can address it. Good examples—and just a small portion of the AWS services that we use—are Amazon Redshift and Amazon Kinesis, two powerful data services that are essential to our platform, and Amazon Simple Queue Service (SQS), which drives messaging out to agent toolbars.
AWS also has broad global reach, which is critical to the CxEngage business model. The North American and Western European markets are certainly an important source of revenue. We also see great opportunity in emerging call center markets in places like China, India, and the Asia-Pacific region. AWS operates in 12 regions around the world. That means we can provide services in close physical proximity to new customers, which boosts performance by reducing latency. When a call comes in, the businesses using our platform don’t want lag times in the system. And in some cases, it helps when there are sovereignty issues related to keeping data within particular boundaries.
AWS also provides major benefits in terms of flexibility and financial performance. For example, we can carefully plan for specific Amazon EC2 instance types to match the performance needs of particular services in the CxEngage platform. Some may require more I/O, some more memory. We can pick exactly what we need and not overprovision, which helps us not only optimize for performance, but also meet our financial goals. That, and the pay-as-you-go model of AWS, has made AWS very popular with our finance department.
Simple, with No Drama
AWS also makes it easier to build the business. For example, the built-in support for PCI and HIPAA, and the compliance and regulatory standards included with AWS GovCloud (US), help us quickly overcome potential barriers to signing new and important customers. We can check off those boxes and keep moving.
We started the journey of building the next-generation solution for call centers in 2014. We placed our bet on AWS, and 18 months later when we launched CxEngage, all of our financial and performance predictions for the platform were borne out. Everything we thought would happen by using AWS happened. It was simple, with no drama. We’re looking at AWS as a partner that is fundamental to our business, and to our growth plan.
— Jeff Thompson, CTO and SVP, LiveOps Cloud Platform
Building Bridges for Better Cancer Treatment with the Fred Hutchinson Cancer Research Center
My colleagues Jessica Beegle and Christopher Crosbie shared the inspiring story below!
— Jeff;
The science of cancer research is continually evolving to include new fields of study. Examples include development of chemotherapies, radiology amplified treatments, and epidemiology for identifying carcinogens. Pathology continues to help deepen the understanding of the disease’s manifestations.
The discipline of computer science is a relatively new entrant in the quest to understand, treat and cure cancer. Computer science is needed to decipher how certain variations in our DNA relate to cancer and what treatment paths have the greatest potential for success for each individual. This type of task is best suited for computer science because the study of DNA, typically referred to as genomics, requires significant big data processing capabilities. In fact, scientists predict genomics will generate more digital information than astronomy, YouTube, and Twitter by 2025.
Today, much of the software developed to collect, analyze and visualize this data is created in silos among different IT systems, research departments, health care institutions, and even nations. This separation greatly hinders the speed of scientific discovery.
Researchers at Fred Hutchinson Cancer Research Center in Seattle wanted to change this. Led by Eric Holland, M.D., Ph.D., director of the Human Biology Division and Solid Tumor Translational Research at Fred Hutch, the team developed Oncoscape, an open-source web application to apply and develop analysis tools for molecular and clinical data. Oncoscape enables researchers to discover new patterns and relationships, which further cancer research.
To utilize current technology in computer science, the Oncoscape team collaborated with GitHub and AWS. The goal was to combine the code-sharing platform that GitHub provides with the cloud computing capabilities that AWS offers. According to Dr. Holland:
Hosting Oncoscape in the cloud makes it easy for our development team to make changes and redeploy the software in order to keep up with the needs of the research community. Knowing I can securely access the site from anywhere in the world allows me to show collaborators what is possible with data visualization and how we can use a common platform to work together in cancer research.
Robert McDermott, the IT Solutions Architect behind the AWS deployment of Oncoscape, explains: “AWS is a very capable, reliable and flexible platform that allows us to quickly adapt to the needs of the project.” He cites maturity, reliability, breadth and depth of services, and security as the key drivers for using AWS.
Oncoscape uses several AWS services including Amazon EC2, Elastic Load Balancing, Amazon CloudWatch, and Amazon S3. This approach makes it easy to distribute traffic across physical locations (Availability Zones), as well as quickly obtain actionable notifications in the event of a site issue. Amazon Route 53 has also proven useful for quickly making modifications to the development environment.
The diagram below depicts the full Oncoscape integration and deployment pipeline, including the merger points between GitHub, CircleCI, Docker Hub, Slack, and AWS.

To learn more about the collaboration behind the Oncoscape project, please watch this video or visit the Oncoscape home page.
— Jessica Beegle (Global Healthcare & Life Sciences Ecosystem Leader) and Christopher Crosbie (Healthcare and Life Science Solution Architect)
Experiment that Discovered the Higgs Boson Uses AWS to Probe Nature
My colleague Sanjay Padhi is part of the AWS Scientific Computing team. He wrote the guest post below to share the story of how AWS provided computational resources that aided in an important scientific discovery.
— Jeff;
The Higgs boson (sometimes referred to as the God Particle), responsible for providing insight into the origin of mass, was discovered in 2012 by the world’s largest experiments, ATLAS and CMS, at the Large Hadron Collider (LHC) at CERN in Geneva, Switzerland. The theorists behind this discovery were awarded the 2013 Nobel Prize in Physics.
Deep underground on the border between France and Switzerland, the LHC is the world’s largest (17 miles in circumference) and highest-energy particle accelerator. It explores nature on smaller scales than any human invention has ever explored before.
From Experiment to Raw Data
The high energy particle collisions turn mass into energy, which then turns back into mass, creating new particles that are observed in the CMS detector. This detector is 69 feet long, 49 feet wide and 49 feet high, and sits in a cavern 328 feet underground near the village of Cessy in France. The raw data from the CMS is recorded every 25 nanoseconds at a rate of approximately 1 petabyte per second.
After online and offline processing of the raw data at the CERN Tier 0 data center, the datasets are distributed to 7 large Tier 1 data centers across the world within 48 hours, ready for further processing and analysis by scientists (the CMS collaboration, one of the largest in the world, consists of more than 3,000 participating members from over 180 institutes and universities in 43 countries).
Processing at Fermilab
Fermilab is one of 16 National Laboratories operated by the United States Department of Energy. Located just outside Batavia, Illinois, Fermilab serves as one of the Tier 1 data centers for CERN’s CMS experiment.
With the increase in LHC collision energy last year, the demand for data assimilation, event simulations, and large-scale computing increased as well. With this increase came a desire to maximize cost efficiency by dynamically provisioning resources on an as-needed basis.
In order to address this issue, the Fermilab Scientific Computing Division launched the HEP (High Energy Physics) Cloud project in June of 2015. They planned to develop a virtual facility that would provide a common interface to access a variety of computing resources including commercial clouds. Using AWS, the HEP Cloud project successfully demonstrated the ability to add 58,000 cores elastically to their on-premises facility for the CMS experiment.
The image below depicts one of the simulations that was run on AWS. It shows how the collision of two protons creates energy that then becomes new particles.

The additional 58,000 cores represent a 4x increase in Fermilab’s computational capacity, all of which is dedicated to the CMS experiment in order to generate and reconstruct Monte Carlo simulation events. More than 500 million events were fully simulated in 10 days using 2.9 million jobs. Without help from AWS, this job would have taken 6 weeks to complete using the on-premises compute resources at Fermilab.
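As a quick sanity check on these figures, the short Python sketch below derives a few rates from the numbers quoted in this post. The function and variable names are hypothetical, and because the post says "more than 500 million" events, the event-based results are lower bounds.

```python
def simulation_rates(events=500_000_000, jobs=2_900_000,
                     cloud_days=10, on_prem_weeks=6):
    """Derive throughput figures from the numbers quoted in the post."""
    on_prem_days = on_prem_weeks * 7
    return {
        "events_per_job": events / jobs,            # average events per job
        "events_per_day": events / cloud_days,      # simulated events per day
        "jobs_per_day": jobs / cloud_days,          # completed jobs per day
        "speedup": on_prem_days / cloud_days,       # 6 weeks versus 10 days
    }


if __name__ == "__main__":
    for name, value in simulation_rates().items():
        print(f"{name}: {value:,.1f}")
```

The 6-week estimate against the 10-day run works out to a speedup of about 4.2, which is roughly in line with the 4x capacity increase described here.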
This simulation was done in preparation for one of the major high energy physics international conferences, Rencontres de Moriond. Physicists across the world will use these simulations to probe nature in detail and will share their findings with their international colleagues during the conference.
Saving Money with HEP Cloud
The HEP Cloud project aims to minimize the costs of computation. The R&D and demonstration effort was supported by an award from the AWS Cloud Credit for Research.
HEP Cloud’s decision engine, the brain of the facility, has several duties. It oversees EC2 Spot Market price fluctuations using tools and techniques provided by Amazon’s Spot team, initializes Amazon EC2 instances using HTCondor, tracks the DNS names of the instances using Amazon Route 53, and makes use of AWS CloudFormation templates for infrastructure as code.
While on the road to success, the project team had to overcome several challenges, ranging from fine-tuning configurations to optimizing their use of Amazon S3 and other resources. For example, they devised a strategy to distribute the auxiliary data across multiple AWS Regions in order to minimize storage costs and data-access latency.
Automatic Scaling into AWS
The figure below shows elastic, automatic expansion of Fermilab’s Computing Facility into the AWS Cloud using Spot instances for CMS workflows. Monitoring of the resources was done using open source software provided by Grafana with custom modifications provided by the HEP Cloud.

Panagiotis Spentzouris (head of the Scientific Computing Division at Fermilab) told me:
Modern HEP experiments require massive computing resources in irregular cycles, so it is imperative for the success of our program that our computing facilities can rapidly expand and contract resources to match demand. Using commercial clouds is an important ingredient for achieving this goal, and our work with AWS on the CMS experiment’s workloads through HEPCloud was a great success in demonstrating the value of this approach.
I hope that you enjoyed this brief insight into the ways in which AWS is helping to explore the frontiers of physics!
— Sanjay Padhi, Ph.D, AWS Scientific Computing
New – Change Sets for AWS CloudFormation
AWS CloudFormation lets you create, manage, and update a collection of AWS resources (a “stack”) in a controlled, predictable manner. Every day, customers use CloudFormation to perform hundreds of thousands of updates to the stacks that support their production workloads. They define an initial template and then revise it as their requirements change.
This model, commonly known as infrastructure as code, gives developers, architects, and operations teams detailed control of the provisioning and configuration of their AWS resources. This detailed level of control and accountability is one of the most visible benefits that you get when you use CloudFormation. However, there are several others that are less visible but equally important:
Consistency – The CloudFormation team works with the AWS teams to make sure that newly added resource models have consistent semantics for creating, updating, and deleting resources. They take care to account for retries, idempotency, and management of related resources such as KMS keys for encrypting EBS or RDS volumes.
Stability – In any distributed system, issues related to eventual consistency often arise and must be dealt with. CloudFormation is intimately aware of these issues and automatically waits for any necessary propagation to complete before proceeding. In many cases the CloudFormation team works with the service teams to ensure that their APIs and success signals are properly tuned for use with CloudFormation.
Uniformity – CloudFormation will choose between in-place updates and resource replacement when you make updates to your stacks.
All of this work takes time, and some of it cannot be completely tested until the relevant services have been launched or updated.
Improved Support for Updates
As I mentioned earlier, many AWS customers use CloudFormation to manage updates to their production stacks. They edit their existing template (or create a new one) and then use CloudFormation’s Update Stack operation to activate the changes.
Many of our customers have asked us for additional insight into the changes that CloudFormation is planning to perform when it updates a stack in accord with the more recent template and/or parameter values. They want to be able to preview the changes, verify that they are in line with their expectations, and proceed with the update.
In order to support this important CloudFormation use case, we are introducing the concept of a change set. You create a change set by submitting changes against the stack you want to update. CloudFormation compares the stack to the new template and/or parameter values and produces a change set that you can review and then choose to apply (execute).
In addition to providing more insight into potential changes, this new model also opens the door to additional control over updates. You can use IAM to control access to specific CloudFormation functions such as UpdateStack, CreateChangeSet, DescribeChangeSet, and ExecuteChangeSet. You could allow a large group of developers to create and preview change sets, and restrict execution to a smaller and more experienced group. With some additional automation, you could raise alerts or seek additional approvals for changes to key resources such as database servers or networks.
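Here is one way the two-group arrangement could be expressed, sketched in Python so that the policy documents are easy to generate and compare. The helper name, the group split, and the use of "*" as the resource are assumptions for illustration. A real policy would likely be scoped to specific stacks and might need further actions for console use.

```python
import json

# Actions named in this post, written with the cloudformation: IAM prefix.
PREVIEW_ACTIONS = [
    "cloudformation:CreateChangeSet",
    "cloudformation:DescribeChangeSet",
]
EXECUTE_ACTIONS = [
    "cloudformation:ExecuteChangeSet",
    "cloudformation:UpdateStack",
]


def build_policy(allow, deny=(), resource="*"):
    """Build an IAM policy document (hypothetical helper for illustration)."""
    statements = [
        {"Effect": "Allow", "Action": list(allow), "Resource": resource}
    ]
    if deny:
        statements.append(
            {"Effect": "Deny", "Action": list(deny), "Resource": resource}
        )
    return {"Version": "2012-10-17", "Statement": statements}


# Larger group: may create and preview change sets, but not apply them.
developer_policy = build_policy(allow=PREVIEW_ACTIONS, deny=EXECUTE_ACTIONS)

# Smaller, more experienced group: may also execute change sets.
release_policy = build_policy(allow=PREVIEW_ACTIONS + EXECUTE_ACTIONS)

if __name__ == "__main__":
    print(json.dumps(developer_policy, indent=2))
    print(json.dumps(release_policy, indent=2))
```

The explicit Deny statement in the developer policy is a design choice: it keeps execution blocked even if another attached policy grants broader CloudFormation access.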
Using Change Sets
Let’s walk through the steps involved in working with change sets. As usual, you can get to the same functions using the AWS Command Line Interface (CLI), AWS Tools for Windows PowerShell, and the CloudFormation API.
I started by creating a stack that runs a LAMP stack on a single EC2 instance. Here are the resources that it created:

Then I decided to step up to a more complex architecture. One of my colleagues shared a suitable template with me. Using the “trust but verify” model, I created a change set in order to see what would happen were I to use the template. I clicked on Create Change Set:

Then I uploaded the new template and assigned a name to the change set. If the template made use of parameters, I could have entered values for them at this point.

At this point I had the option to modify the existing tags and to add new ones. I also had the option to set up advanced options for the stack (none of these will apply until I actually execute the change set, of course):

After another click or two to confirm my intent, the console analyzed the template, checked the results against the stack, and displayed the list of changes:

At this point I can click on Execute to effect the changes. I can also leave the change set as-is, or create several others in order to explore some alternate paths forward. When I am ready to go, I can locate the change set and execute it:

CloudFormation springs into action and implements the changes per the change set:

A few minutes later my new stack configuration was in place and fully operational:

And there you have it! As I mentioned earlier, I can create and inspect multiple change sets before choosing the one that I would like to execute. When I do this, the other change sets are no longer meaningful and are discarded.
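The same create, review, and execute flow can be scripted against the CloudFormation API. The Python sketch below assumes a client object with the create_change_set, describe_change_set, and execute_change_set operations, such as the one returned by boto3.client("cloudformation"). The function names, polling interval, and error handling are assumptions for illustration, and the client is passed in so that the sketch can be exercised without AWS credentials.

```python
import time


def preview_change_set(cfn, stack_name, change_set_name, template_body,
                       poll_seconds=5, max_polls=60):
    """Create a change set and return its changes once it is ready.

    cfn is assumed to be a CloudFormation client, for example
    boto3.client("cloudformation"). Returns a list of
    (action, logical_resource_id, resource_type) tuples.
    """
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        TemplateBody=template_body,
    )
    for _ in range(max_polls):
        description = cfn.describe_change_set(
            StackName=stack_name, ChangeSetName=change_set_name
        )
        status = description["Status"]
        if status == "CREATE_COMPLETE":
            return [
                (
                    change["ResourceChange"]["Action"],
                    change["ResourceChange"]["LogicalResourceId"],
                    change["ResourceChange"]["ResourceType"],
                )
                for change in description.get("Changes", [])
            ]
        if status == "FAILED":
            raise RuntimeError(
                description.get("StatusReason", "change set creation failed")
            )
        time.sleep(poll_seconds)
    raise TimeoutError("change set was not ready in time")


def execute_change_set(cfn, stack_name, change_set_name):
    """Apply a change set that has already been reviewed."""
    cfn.execute_change_set(
        StackName=stack_name, ChangeSetName=change_set_name
    )
```

Keeping preview and execution in separate functions leaves room for a review or approval step between the two calls.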
Managing Rollbacks
If a stack update fails, CloudFormation does its best to put things back the way they were before the update. The rollback operation can fail on occasion; in many cases this is due to a change that was made outside of CloudFormation’s purview. We recently launched a new option that gives you additional control over what happens next. To learn more about this option, read Continue Rolling Back an Update for AWS CloudFormation stacks in the UPDATE_ROLLBACK_FAILED state.
Available Now
This functionality is available now and you can start using it today!

