AWS Partner Network (APN) Blog
-
An APN Partner’s Guide to re:Invent 2015
re:Invent 2015 is fast approaching, and we are so excited that we’ll see so many of you there! In addition to an exciting agenda that includes over 250 breakout sessions being offered throughout the week, we have a number of events and options tailored to APN Partners that I’d like to make sure you’re aware of prior to your flight out to Las Vegas.
Global Partner Summit
The Global Partner Summit on Tuesday, Oct. 6th is a chance for you to hear from AWS leadership about the future of the business and the AWS Partner Network (APN). You’ll have the opportunity to learn from the successes of your peers and participate in sessions focused on developing new skills and insights for your growing cloud practice with AWS. The Global Partner Summit will arm you and your team with content that is tailored to your goals as a Consulting or Technology Partner, focusing on how to drive increasing revenue with cloud-based products and services, gain a greater competitive advantage, and expand your business.
This event is free and exclusive to APN Partners, and will not be available to general attendees.
Summit Agenda:
11:00am — 12:30pm: Global Partner Summit Keynote
12:30pm — 1:30pm: Lunch
1:30pm — 2:30pm: Breakout Sessions and Technical Whiteboard Sessions
2:30pm — 2:45pm: Break
2:45pm — 3:45pm: Breakout Sessions and Technical Whiteboard Sessions
3:45pm — 4:00pm: Break
4:00pm — 5:00pm: Breakout Sessions and Technical Whiteboard Sessions
5:00pm — 7:00pm: Welcome Reception (AWS re:Invent Central)
Already Registered for the Summit?
If you’ve already registered for the Global Partner Summit, you can access the Partner Summit Sessions via the re:Invent Session Builder and Catalog. Log in and use your AWS re:Invent credentials to access session information. We’ll be offering 18 breakout sessions tailored to business and technical topics of importance to our ecosystem. Sessions at the event cover a wide range of topics relevant to executives, sales managers, and technical professionals. We encourage all levels and disciplines to attend.
Haven’t Yet Registered for the Summit?
There’s still time, and a limited number of spots left. To register for the Global Partner Summit, be sure to identify yourself as an APN member and use your company email address when updating your AWS re:Invent registration. Admittance will be validated by your company’s email address domain.
AWS Partner Pavilion
Do you have questions about how to work with the APN? Would you like to discuss technical questions with Partner Solutions Architects (SAs)? We encourage you to visit the AWS Partner Pavilion, located at the front of the re:Invent Central Expo Hall. There you can learn how the APN can help you build your business on AWS, interact with members of the APN Program team, SAs, Partner Managers, and Partner Marketing, and connect with other APN Partners. We’ll have folks from the AWS Test Drive Program and the AWS SaaS Partner Program at the Pavilion all day to answer your questions about their specific programs, so don’t be shy – come stop by!
The AWS Partner Pavilion will open in the re:Invent Central Expo Hall at 5:00 pm on Tuesday, Oct. 6th, and will be open until 6 pm on Thursday, Oct. 8th.
AWS Training and Certification at re:Invent
Becoming AWS Trained and Certified is a crucial step in your APN journey, since it deepens your AWS knowledge and skills and demonstrates to customers your unique expertise. In Las Vegas, you’ll have lots of opportunities to learn something new and add to your AWS credentials. Here are a few recommendations:
Certification Exams, Perks, and Parties
Getting AWS Certified helps you gain visibility and credibility for your proven experience working with AWS, and is a requirement for Consulting Partners to advance in the APN. Our convenient onsite Exam Testing Center is open throughout the week in Casanova 501, making it even easier to come home AWS Certified. Slots are filling up, so register for an exam today.
AWS Certified Partners can also enjoy our AWS Certification Lounge (Level 1 Foyer), a great place to relax and network with other AWS Certified professionals. And don’t miss the special appreciation event we’re throwing for our AWS Certified community on Thursday, Oct. 8th from 5-7 p.m. RSVP now so you can join the party and meet with AWS executives and technical evangelists.
Bootcamps, Workshops, and Hands-on Labs
We’re teaching more than a dozen technical bootcamps at re:Invent—including some available only to APN Partners. Not sure what to take? A good starting place is the AWS Business and Technical Professional Accreditation bootcamps, since they’re required for APN Consulting Partners to advance in the APN.
If you’re looking for something more cutting edge, try Mobile Gaming Using AWS Lambda or Creating Applications for Mobile and the Internet of Things (IoT). Browse the complete list of bootcamps here.
We’ll also have workshops to help you prep for certification, and our hands-on lab room is always a hugely popular draw since it provides free practice with specific AWS services and solutions in a live AWS environment. (Hint: beat the crowds by getting there early or hitting our late night lab session on Tuesday, Oct. 6 from 7-10 p.m.) See the full re:Invent training schedule.
The AWS Partner Solutions Explorer
I want to also highlight a great tool and resource that is going to be available at re:Invent – the AWS Partner Solutions Explorer.
The AWS Partner Solutions Explorer is an interactive, self-paced tour of AWS Partner solutions. Based on the user’s needs, the Explorer will search through our database of thousands of AWS Partners and point them to AWS Partners at re:Invent that may be the best fit for them (based upon the AWS Partner’s APN Competencies, AWS Test Drives, AWS Marketplace AMIs, and AWS Quick Start Reference Deployments). If any additional questions arise, users can simply ask one of our AWS experts onsite for assistance. The AWS Partner Solutions Explorer will be located in the Artist Foyer on Level 2, the AWS Partner Pavilion, in the Executive Summit, and in the AWS Booth. Check it out!
Can’t Join Us at re:Invent?
Never fear. We’re going to be live tweeting from the @AWS_Partners Twitter handle throughout the week. In addition, @AWSreInvent will be live tweeting from the re:Invent keynotes, along with many of the sessions hosted throughout the week, after-hours events, and the expo hall.
We also have a live stream that you can sign up for now by clicking here. The live stream will include coverage of the following events:
Wednesday, Oct. 7th:
8:30am – 10:30am: Keynote Address, Andy Jassy, Sr. VP, Amazon Web Services
11:00am – 12:00pm: Raising the Bar on Video Streaming Quality by Utilizing AWS: Amazon Instant Video Case Study
12:15pm – 1:15pm: DevOps at Amazon: A Look at Our Tools and Processes
1:30pm – 2:30pm: AWS Innovation at Scale
2:45pm – 3:45pm: Amazon Aurora Deep Dive
4:15pm – 5:15pm: A Day in the Life of a Netflix Engineer Using 37% of the Internet
Thursday, Oct. 8th:
9:00am – 10:30am: Keynote Address, Dr. Werner Vogels, CTO, Amazon
11:00am – 12:00pm: Security Operations at Massive Scale
1:30pm – 2:30pm: Inspiring Innovation in the Cloud @ NASA/JPL and Beyond
2:45pm – 3:45pm: Amazon EC2 Container Service: Distributed Applications at Scale
4:15pm – 5:15pm: AWS Lambda and the Serverless Cloud
5:30pm – 6:30pm: Scaling Up to Your First 10 Million Users
-
Now Available: Updated AWS Business and Technical Accreditation Courses
One of the keys to building a successful business on AWS is ensuring you’re properly trained on all the platform has to offer. AWS continues to innovate at a rapid pace, and it’s crucial we provide you with up-to-date training resources as you build your business on AWS. To that end, today we released substantially updated English-language versions of our web-based accreditation training courses for APN Partners.
The AWS Business and Technical Professional accreditation courses are designed to help you stay current on AWS, articulate the benefits of AWS services to customers, and help make informed decisions about IT solutions. Both courses are available to you at no cost via the APN Portal and count toward APN tier requirements that help you advance. Here’s a quick rundown of each course:
AWS Business Professional (Released September 2015)
What you’ll learn: Foundational knowledge on key AWS services and core business value propositions including AWS Marketplace solutions.
Who it’s for: Business roles responsible for articulating the benefits of AWS services and how AWS and partner solutions help solve common business problems.
AWS Technical Professional (Released September 2015)
What you’ll learn: Key foundational technical concepts around AWS, including global infrastructure, services, common solutions, migration, security, and compliance.
Who it’s for: Technical roles responsible for helping customers make informed decisions about IT solutions.
What’s changed? The 2015 English versions add coverage of key new AWS services and features; are more concise and better align AWS solutions with APN competencies; and feature improved interactivity with new video demos and exercises.
If you’re still working on the 2014 English version of either course, you have two options. You can enroll in the improved 2015 version to achieve your accreditation. Or, you can finish the version you’ve already started, as long as you complete all your assessments by December 31. After this date, the 2014 English versions will no longer be available.
Looking for more training? Besides the updated accreditation courses, make sure to check out our new APN Partner Learning Plan, a week-by-week guide and checklist to the best accreditations, courses, and certifications to take in your first 90 days with AWS.
To learn more, visit Partner Training & Certification.
-
Machine Learning on the AWS Cloud
The following is a guest post from one of our APN SAs. This post is an introductory, high-level post. It is intended to help APN Partners familiarize themselves with the concept of machine learning, and to learn more about the use cases that can be supported using Amazon Machine Learning.
Introduction
There can be tremendous amounts of information buried within gigabytes of your data, including web site visitor metrics, sales information, and email campaign responses, to name a few. How do you tap into that information to make informed business decisions? Is there a way an organization can take advantage of its existing repositories of data to predict the choices customers may make in the future?
Machine learning (ML) can help you use historical data to make better business decisions. ML algorithms discover patterns in data, and construct mathematical models using these discoveries. With machine learning, you can use these models to make predictions on future data. For example, one possible application of a machine learning model would be to predict how likely a customer is to purchase a particular product based on their past behavior.
Smart Applications
Machine learning is the technology that can find patterns in data and use them to make predictions for new data points as they become available. A simplistic definition of a smart application:
Your data + machine learning = smart applications
Smart applications can predict future user action based on past actions. For example, based upon what it knows about the user, a smart application can predict whether the user will make a purchase. A lot of banks use the smart application concept to warn a user if their login pattern changes. It’s not uncommon for retail banking websites to show a warning whenever a user tries to log in from a different location or computer. Another example is the specific recommendations a website makes to its users; a number of e-commerce and news aggregation websites recommend products or news stories that might interest the user.
The science of machine learning provides the mathematical underpinnings needed to run the analysis and to make sense of the results. It can help you turn your data into high-quality predictions by finding and codifying patterns and relationships within the data.
What is Amazon Machine Learning?
Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology, based on the same proven, highly scalable ML technology used for years by Amazon’s internal data scientist community. Amazon Machine Learning allows you to easily build predictive applications, including fraud detection, demand forecasting, and click prediction. Amazon Machine Learning uses powerful algorithms that can help you create machine learning models by finding patterns in existing data, and using these patterns to make predictions from new data as it becomes available.
You can use Amazon Machine Learning through the AWS Management Console and access the data and model visualization tools, as well as wizards, to guide you through the process of creating machine learning models, measuring their quality, and fine-tuning the predictions to match your application requirements. Once the models are created, you can get predictions for your application by using the simple Amazon Machine Learning API, without having to implement custom prediction generation code or manage any infrastructure.
Amazon Machine Learning is highly scalable, and can generate billions of predictions, and serve those predictions in real-time and at high throughput. With Amazon Machine Learning there is no setup cost and you pay as you go, so you can start small and scale as your application grows.
Popular Amazon ML Use Cases
There are a number of use cases for which Machine Learning is a good fit. For APN Partners, I recommend that you consider how smart applications may enhance the value you’re able to provide for your customers on AWS in the following areas, which are outlined on our main Amazon Machine Learning page in more detail: Fraud Detection, Content Personalization, Propensity Modeling for Marketing Campaigns, Document Classification, Customer Churn Prediction, and Automated Support Recommendation for Customer Support.
To find out more about Amazon Machine Learning, visit the service web pages and get started building your first predictive model today.
-
New AWS Support for Commercially-Supported Docker Applications: Docker Trusted Registry and Docker Engine
The AWS cloud has been shown to be a natural complement to the flexibility that Docker containers offer organizations, and today Amazon EC2 and Amazon ECS are very popular places to launch and run Docker containers. Customers continue to expand their container footprint and move their applications from dev to test to production, and look for enhanced support and additional product offerings as they embrace the AWS cloud as a place to run Docker containers. At DockerCon 2015 in San Francisco, we discussed work done by both teams to better support Docker on AWS for our customers, and today we take another step toward supporting those who wish to run Docker exclusively on AWS by announcing support for Docker Trusted Registry in AWS Marketplace. Customers can go from building a Docker application locally on a developer’s laptop to shipping it to their production Amazon Virtual Private Cloud (Amazon VPC) with just a few commands.
Like Docker Hub, Docker Trusted Registry (DTR) is a solution that allows organizations to store and manage Docker images. However, DTR can be run as an EC2 instance, allowing complete control over how and where the registry is available and accessed from within your environment.
Configuring Your AWS Environment for Docker Trusted Registry
By running Docker Trusted Registry, organizations are able to create custom levels of access control to their Docker images. Certain components of this access control model include support for customer SSL certificates, LDAP integration to limit access to specific users, and leveraging the network access control capabilities of Amazon VPC.
Amazon VPC allows you to configure network settings and isolate cloud resources as much as necessary to meet security or compliance standards. In the case of DTR, we recommend first deciding if your registry instance should be accessible from the Internet or only from within your VPC. If the instance should be available from the Internet, you can launch the DTR instance into a public subnet. However, take care to configure the security group to only allow access from specific trusted IP ranges over ports 22 (SSH), 80 (HTTP), and 443 (HTTPS).
Please note that when DTR is launched from AWS Marketplace, the default security group is open to the world, so it’s up to you to restrict access to the IP ranges appropriate for your environment.
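As a rough illustration of the restriction described above, the sketch below prints one AWS CLI command per DTR port for a single trusted range. The security group ID and CIDR range are placeholder values, not taken from this post, and the script only prints the commands so you can review them before running anything.

```shell
#!/bin/bash
# Placeholder values (assumptions for illustration): substitute your own
# security group ID and the CIDR range you trust.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
TRUSTED_CIDR="${TRUSTED_CIDR:-203.0.113.0/24}"

# Print one "authorize-security-group-ingress" command per DTR port:
# 22 (SSH), 80 (HTTP), and 443 (HTTPS). Nothing is executed here.
dtr_ingress_commands() {
  local sg_id="$1" cidr="$2" port
  for port in 22 80 443; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg_id --protocol tcp --port $port --cidr $cidr"
  done
}

dtr_ingress_commands "$SG_ID" "$TRUSTED_CIDR"
```

Any existing rules that allow 0.0.0.0/0 would still need to be removed separately, for example with aws ec2 revoke-security-group-ingress.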
The other option is to place your DTR instance into a private subnet so that only resources within your network are able to access the registry. In this case, you’ll need to ensure that you have either a bastion host set up or a VPN into your VPC so you can manage the DTR instance via the web GUI.
We recommend using an Amazon Route 53 private hosted zone with Docker Trusted Registry. A private hosted zone is always queried first by instances in your VPC and is only accessible from within your VPC, so it lets you choose the endpoint you will use to interact with your registry. This DNS name is what you’ll reference when pushing and pulling images from your registry, so choose something that makes sense. Here we’ll use dtr.mydomain.com as an A record that points to our DTR instance’s IP address.
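One possible way to create that A record is with a Route 53 change batch. The sketch below only generates the JSON document; the private IP address is a made-up placeholder, and the hosted zone ID in the commented command is something you would supply yourself.

```shell
#!/bin/bash
# Placeholder values (assumptions): the record name matches the example
# above; the private IP address is made up for illustration.
RECORD_NAME="${RECORD_NAME:-dtr.mydomain.com}"
DTR_PRIVATE_IP="${DTR_PRIVATE_IP:-10.0.1.25}"

# Print a Route 53 change batch that creates or updates an A record.
dtr_change_batch() {
  local name="$1" ip="$2"
  cat <<EOF
{
  "Comment": "A record for the Docker Trusted Registry instance",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$name",
        "Type": "A",
        "TTL": 300,
        "ResourceRecords": [
          { "Value": "$ip" }
        ]
      }
    }
  ]
}
EOF
}

dtr_change_batch "$RECORD_NAME" "$DTR_PRIVATE_IP"

# To apply it, save the output to a file and pass it to the AWS CLI, e.g.:
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id <your-private-zone-id> \
#     --change-batch file://dtr-record.json
```

UPSERT is used here so the same file can be re-applied if the instance's address changes; CREATE would also work for a first-time record.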
Because DTR leverages Amazon S3 for back-end storage of your Docker images, we recommend creating an IAM role that will allow your instance to communicate securely with S3. IAM roles are assigned to EC2 instances at instance launch. Here we assume that there is a dedicated bucket for our Docker images, and we can scope our IAM policy accordingly:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my_DTR_bucket",
        "arn:aws:s3:::my_DTR_bucket/*"
      ]
    }
  ]
}
Once you’ve decided on a VPC, how to scope access to your registry, created an IAM role, and decided on a DNS record to use, you’re ready to move forward with setting up the DTR instance itself.
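For an EC2 instance to use the S3 policy, the IAM role also needs a trust policy that lets the EC2 service assume it, and the role has to be wrapped in an instance profile. The sketch below shows one way this might look with the AWS CLI. The role name, policy name, and file names (trust.json, policy.json) are hypothetical, and the commands are printed rather than executed.

```shell
#!/bin/bash
# Hypothetical role name used only for this sketch.
ROLE_NAME="${ROLE_NAME:-dtr-s3-role}"

# Print a trust policy that allows the EC2 service to assume the role.
ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Print (not run) the AWS CLI calls that create the role, attach the S3
# policy saved as policy.json, and wrap the role in an instance profile.
dtr_role_commands() {
  local role="$1"
  echo "aws iam create-role --role-name $role --assume-role-policy-document file://trust.json"
  echo "aws iam put-role-policy --role-name $role --policy-name dtr-s3-access --policy-document file://policy.json"
  echo "aws iam create-instance-profile --instance-profile-name $role"
  echo "aws iam add-role-to-instance-profile --instance-profile-name $role --role-name $role"
}

ec2_trust_policy
dtr_role_commands "$ROLE_NAME"
```

The instance profile is what you select when launching the DTR instance.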
Setting Up Supported Docker Environments on AWS
To begin, we’ll be using the Docker Trusted Registry “pay as you go” AMI from AWS Marketplace. This licensing model is intended to simplify the deployment experience. To further enhance the experience, Docker have included a 30-day free trial of their software. The details are provided on the product page in AWS Marketplace, which is listed here: https://aws.amazon.com/marketplace/pp/B014VG1SIG
Once you’ve launched the AMI, you can follow the AWS and Docker Trusted Registry guide to configure the DTR instance: https://docs.docker.com/docker-trusted-registry/installAWS/
When launching your instance, you’ll need to choose an appropriate instance size. Docker recommends an m3.large for initial test deployments. As your environment grows, you can use the monitoring features built into the Docker Trusted Registry web GUI to keep an eye on resource utilization and scale your instance size as needed.
Once the DTR instance is up and running, you’ll also need to launch Docker Engine instances (instances running the commercially-supported version of Docker). You can find AMIs to launch Docker Engine instances here: https://aws.amazon.com/marketplace/pp/B014VG1R4Q
One thing to note about Docker Engine instance configuration: if you’re using a self-signed certificate, you’ll also have to configure your clients to pull the certificate from the DTR instance. This can be done using the following commands passed as a user data script:
#!/bin/bash
export DOMAIN_NAME=dtr.mydomain.com
# Fetch the certificate presented by the DTR instance and add it to the system trust store
openssl s_client -connect "$DOMAIN_NAME:443" -showcerts </dev/null 2>/dev/null | openssl x509 -outform PEM | sudo tee "/etc/pki/ca-trust/source/anchors/$DOMAIN_NAME.crt"
sudo update-ca-trust
# Restart Docker so the daemon picks up the new certificate
sudo service docker restart
This process depends on your OS, so check here for more comprehensive detail: https://docs.docker.com/docker-trusted-registry/configuration/#installing-registry-certificates-on-client-docker-daemons
Once your Docker Engine client(s) are launched, you can begin interacting with the DTR instance, pushing and pulling images to and from your own private registry from another EC2 instance within your AWS VPC, a peer VPC, or remote location connected via VPN.
From Developer Desktops Direct to the Cloud
Continuous Integration and Delivery is a critical workflow for many teams. Docker supports a number of CI/CD tools, like AWS CodePipeline and AWS CodeDeploy, and a number of deployment endpoints, like Amazon EC2 or Amazon ECS. Docker Trusted Registry can serve as the foundation of these automated workflows that can take code from a developer’s desktop, through integration and unit testing, to a staging or QA environment, and finally to production deployment.
In order to understand how to interact with DTR at the most foundational level, we’ll examine a basic Docker image workflow that can provide the baseline understanding necessary to build more complex CI/CD workflows later.
We’ll first need a client machine that is configured to interact with the DTR instance.
First, we’re going to pull the public Jenkins image with docker pull jenkins
Next, we’ll tag the image for our registry with docker tag jenkins dtr.mydomain.com/my-jenkins
Finally, push the image to your local DTR instance with docker push dtr.mydomain.com/my-jenkins
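The three steps above can be wrapped in a small script so the same workflow is repeatable for other images. This is a minimal sketch, not part of the Docker or AWS tooling. The run and mirror_image helpers and the DRY_RUN switch are names invented here, and by default the script only echoes the docker commands it would run.

```shell
#!/bin/bash
set -eu

# Assumed values for illustration; the registry name matches the example above.
REGISTRY="${REGISTRY:-dtr.mydomain.com}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually call docker

# Echo the command, and run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Pull a public image, retag it for the private registry, and push it.
mirror_image() {
  local source_image="$1" target_repo="$2"
  run docker pull "$source_image"
  run docker tag "$source_image" "$REGISTRY/$target_repo"
  run docker push "$REGISTRY/$target_repo"
}

mirror_image jenkins my-jenkins
```

Running it with DRY_RUN=0 on a client that trusts the DTR certificate performs the same pull, tag, and push sequence described above.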
A robust and scalable CI pipeline can be built with Docker and Jenkins on AWS to take the code from your developer laptops directly into an integration testing cluster on AWS. Code pushed to a repository like GitHub can trigger automatic builds of containers using Jenkins, and the resulting container can be pushed to your Docker Trusted Registry instance. This container can then be tested in QA, and ultimately rolled out to production using AWS services like AWS CodeDeploy or AWS Elastic Beanstalk.
We encourage you to take a look at the new Docker commercially-supported software in AWS Marketplace today. We hope the above information gets you started, and we’d love to hear your feedback.
For additional video tutorials, resources, and more, visit the Docker and AWS Resource Center: https://docker.com/aws
-
Announcing: Windows Server on AWS Bootcamps in Seattle, WA, and Bangalore, India
We’re pleased to announce an exclusive training event for System Integrator (SI) APN Partners. This three-day bootcamp training covers Microsoft Windows Server and the .NET Framework on AWS. This is intended for IT professional, architect, and developer consultants with field-service roles helping Enterprise customers migrate Windows Server workloads to the cloud. The bootcamp will be held October 20 – 22 in Seattle, and October 27 – 29 in Bangalore.
The training will be a mix of lectures and hands-on labs covering the following topics: key AWS services such as Amazon Elastic Compute Cloud (Amazon EC2), Amazon VPC, AWS Identity and Access Management (IAM), Amazon Elastic Block Store (Amazon EBS), AWS CloudFormation, and AWS Directory Service, as well as building Microsoft workloads with Active Directory, SQL Server, SharePoint Server, and Exchange Server. Additionally, the bootcamp covers developing with PowerShell and .NET.
AWS is offering this program with free registration for APN Partners, but APN Partners will be responsible for travel, lodging, and all other expenses. Breakfast and a working lunch will be provided in the training room.
We anticipate this training to be in high demand and seating is limited, so APN Partners must submit a Request for Application. The Request for Application lists pre-requisites that invitees would be expected to complete before attending the bootcamp. Once applications are received, AWS will notify invited guests and provide more information. Each APN Partner may nominate multiple members of their technical staff to attend.
To get the Request for Application form, please email scottzim@amazon.com with only the word “BOOTCAMP” as the Subject line of the email.
To learn more about AWS for Windows, click here.
-
Providing a Leading PHP Platform on AWS Marketplace – Zend Technologies, an Advanced APN Technology Partner
PHP is a scripting language designed for web and mobile development, allowing static webpages to become dynamic, interactive, and personalized. AWS offers a number of resources to help developers develop secure, reliable, and scalable PHP applications on the AWS cloud. In addition to the resources, tools, and documentation we provide, Zend Technologies, an Advanced APN Technology Partner, provides an enterprise-grade application platform for PHP through AWS Marketplace.
About Zend
Zend’s roots in the PHP community are very deep—the company’s founders were two of the original authors of PHP—and Zend leverages its PHP expertise to deliver a robust, scalable, and high performing environment that ensures businesses of all sizes can innovate fast and reliably. Zend products have already been used to develop and run thousands of PHP business-critical applications worldwide. An APN Partner since 2012, Zend offers its Zend Server solution on AWS through AWS Marketplace.
Drivers to the AWS Cloud
General trends toward the cloud in application development, along with customer demand, drove the Zend team to build on AWS. “Nearly all new application development is cloud-first; and Zend was an early adopter of cloud-first strategies,” explains Amy Anderson, Zend Director of Business Development. Zend’s customer base, which is focused on agile development and continuous delivery, has been naturally inclined toward the cloud, and the team recognized working with AWS as the most effective way to monetize its cloud offerings.
“AWS makes it so easy for developers to access and use Zend Server,” Anderson says. “From the citizen developer to the corporate development department, AWS lets developers consume exactly what they need. We see developers start with very small development images, which is cost effective. And we also see large organizations running huge clusters that would be impractical in an on premise environment.”
Zend Server on AWS

Zend Server is a complete PHP distribution that also includes value-add features designed to optimize productivity, performance, scalability and reliability. Z-Ray, for example, gives developers real-time insights into how their code is performing and how it’s using systems resources—including AWS resources. Zend Server includes over 80 PHP extensions and supports Apache.
Zend Server on AWS lets organizations scale their applications to meet aggressive growth goals. Applications can get a performance boost with an integrated set of technologies that enables caching of data and code, job queues for asynchronous or offline processing of tasks, job scheduling to improve response and reduce server load, and dynamic cluster configuration.
Zend Server leverages AWS CloudFormation to enable customers to easily scale Zend Server on AWS. Zend’s clustering technology is integrated with AWS clustering, ensuring that every workload is optimized for the available resources.
How Can APN Partners Work with Zend Server?
Anderson highlighted some of the ways in which fellow APN Partners can use Zend Server to support their end customers, explaining, “APN Partners who are delivering business-critical web and mobile applications, such as e-commerce, can benefit from the performance, scaling, and compliance features of Zend Server. For example, any vendor who wants the ability to deliver applications faster should consider Z-Ray, a technology in Zend Server that gives developers real-time insights into the quality and performance of their code.”
Zend Server and AWS can also help APN Partners react faster to security vulnerabilities. In one case, an MSP that supports hundreds of SMBs had several commerce sites hacked. It took the MSP two weeks to patch the sites running open source PHP in a hosted environment, while it only took two days to patch the sites running Zend Server on AWS. “Because Zend Server and AWS both increase the levels of automation, businesses can react faster to a changing environment,” Anderson says. “This is especially critical with commerce security.”
The Benefits of Working with AWS and AWS Marketplace
Zend’s AWS business grew by 3X in 2014, and the company is targeting 5X growth in 2015. A key driver of the company’s growth is AWS Marketplace. “We’re heavily dependent on the AWS Marketplace to market and deliver our cloud offerings. AWS Marketplace helps drive revenue for us. We’ve benefited from being featured on the Marketplace homepage banner and in the featured products section,” says Anderson.
From AWS Marketplace, customers can choose from three different editions of Zend Server: Developer, Professional, or Enterprise. Each edition is available on both Ubuntu and Red Hat Linux. Once the customer selects their edition and Linux distribution, they can choose from a variety of Amazon Elastic Compute Cloud (Amazon EC2) instance types on AWS, from the t2.micro to the c2.2xl. “AWS Marketplace gives customers the ability to select exactly what they want, and it presents the options in a logical way,” Anderson says.
A big benefit of offering its solutions on AWS Marketplace is the ability for Zend to support a broad range of customer use cases and customer sizes. “Because of the very small instance types available on AWS, we can offer our product to a broad number of very small customers,” explained Anderson. “At the same time, the scalability of AWS clustering allows our largest customers to run configurations that were practically not feasible in an on-premise environment.”
Another common use case for Zend Server on AWS is customers who need to run a supported version of PHP 5.3. Currently the open source PHP community supports versions 5.4, 5.5, and 5.6. But for a variety of reasons, many customers are still running applications that use PHP 5.3. Zend is the only company to provide long-term support for PHP 5.3 at this time. “If you have a production application that uses PHP 5.3, you’re taking a big risk if you’re not under support,” says Anderson. “One of the easiest ways to run a supported, scalable environment for PHP 5.3 is to set up Zend Server through AWS Marketplace.”
Looking Forward
Zend has been investing in technologies that fuel the API economy and mobile applications for the enterprise, such as its Apigility offering. “We’re looking forward to using AWS to help more enterprise customers achieve their business objectives in the areas of mobile app development,” Anderson says.
“We’ve been extremely pleased with our relationship with AWS,” she continues. “And this is just the tip of the iceberg.”
-
The Road to Modern Ops, a New Curriculum from APN Partner HashiCorp
AWS is excited to announce the availability of HashiCorp’s Road to Modern Ops, an interactive curriculum dedicated to guiding organizations from manual processes to modern, automated operations. Through these labs, you’ll have the opportunity to provision AWS infrastructure using HashiCorp products Terraform, Packer, and Atlas.
AWS and HashiCorp
AWS provides a flexible and elastic cloud computing platform that facilitates API-driven infrastructure as code, allowing development and operational teams to work closer together. HashiCorp, a member of the AWS Partner Network (APN), has built a variety of tooling around the AWS APIs (amongst others) that allows customers to provision cloud infrastructure in a repeatable fashion.
What is the Road to Modern Ops Curriculum All About?
This educational series covers the full spectrum of HashiCorp automation tools, but two labs in particular are focused on highlighting AWS functionality. The first of these labs, “Automate provisioning with Terraform”, teaches students how to use Terraform by HashiCorp to build AWS resources like Amazon Virtual Private Clouds (VPCs), Amazon EC2 instances, and Elastic Load Balancers. The declarative syntax used by Terraform configuration files represents infrastructure as code, which enables repeatable deployments of your production environment, better visibility into the relationships between different components and systems, and rapid recovery from failures. This lab is available now here.
The second lab uses Packer by HashiCorp to automate the provisioning and configuration of Amazon EC2 AMIs. The AMIs built by Packer can be used to stand up fully configured EC2 instances into the supporting infrastructure provisioned during the Terraform lesson. By using Packer to move the configuration of systems to before the deploy stage, rather than after, organizations can take a major step toward enabling immutable infrastructure and ultimately ensuring a consistent configuration of resources in their production environments. This lab is available now here.
The labs use HashiCorp’s Atlas to run Terraform and Packer. By running Terraform and Packer within Atlas, development and operations teams automate, audit, and collaborate on infrastructure changes.
Why You Should Consider Signing up for the HashiCorp Curriculum
As you work through this curriculum, you’ll be introduced to the concepts of immutable infrastructure and automated provisioning, two practices that leverage the flexibility and elasticity of the AWS cloud, and you’ll benefit from hands-on experience using what our customers tell us are very effective tools for interacting with AWS Services.
Want to learn more about HashiCorp? Visit the company’s website here.
-
Introducing the Amazon RDS Migration Tool
Migrating databases can be a challenging task, often requiring application downtime while data is moved from the source database to the target database. To help you accomplish migrations effectively and with minimal downtime, we’re excited to tell you about the Amazon RDS Migration Tool. This powerful utility can be used to help you and/or your customers move data from on-premises and Amazon EC2-based databases to Amazon RDS, Amazon Redshift, and Amazon Aurora databases.
The RDS Migration Tool supports not only like-to-like migrations, e.g. Oracle-to-Oracle, but also migrations between different database platforms, e.g. SQL Server-to-Amazon Aurora. It runs as an EC2 instance and leverages the scaling power of AWS to match the needs of your migration task.
What Value Does the RDS Migration Tool Provide My Firm and End Customers?
If you’re an APN Partner helping customers migrate their workloads to AWS, the RDS Migration Tool can help you minimize the application downtime by capturing database changes on the source database while the source still receives transactions from the application. As the RDS Migration Tool can capture and replicate data heterogeneously, it can minimize application downtime even in complicated use cases, such as when migrating an Oracle database to Amazon Aurora.
Specifically, the RDS Migration Tool provides the following features and benefits:
- Support for transactional change data capture (CDC) and application, with low performance impact on the source
- Support for heterogeneous migration (e.g. SQL Server-to-Aurora)
- Support for homogeneous migration (e.g. Oracle-to-Oracle)
- Data transfer optimizations for migrating entire database tables
- Light-weight column mapping & transformations
- Ability to select individual tables and columns and filter data rows for migration
- Reliable delivery and recovery
- Intuitive user experience that simplifies the steps required to migrate to AWS
- Monitoring and control functions with dashboard, metrics and alerts
- No need to deploy agents on the source to capture changes
- Scheduling of migration tasks
The use of the RDS Migration Tool software is free; however, the tool requires the use of Amazon EC2, Amazon EBS, and other AWS services, and customers will be charged normal AWS fees for the migration instances they create.
How Do I Access the Tool?
Reach out to your PDM if you are interested in signing up to use the tool.
-
Performance Testing in Continuous Delivery Using AWS CodePipeline and BlazeMeter
This is a guest post from our friends at BlazeMeter, an APN Technology Partner.
By now, most software delivery teams have heard about and are either practicing or planning to practice some flavor of continuous delivery. Its popularity has exploded in recent years largely because it has proven to have immense benefits for the rapid release of high-quality software. After each commit, the software is built and tested, and a deployable artifact is the result. How or when that artifact is deployed to either a staging area or to production depends on the team, their process, and their infrastructure.
While unit and functional tests have become standard practices of good software delivery, load and performance tests have been a bit neglected in many workflows, reserved for specially scheduled events and generally conducted manually by a group. In part, this is because load and performance tests have tended to involve complex and brittle scripts that require dedicated, vendor-specific environments and are difficult to automate or run quickly enough for fast feedback.
Since AWS CodePipeline is such a powerful automation framework for managing the continuous delivery process from start to finish, let’s take a look at how we can more easily inject automated load tests at the right places in the delivery workflow with BlazeMeter’s native AWS CodePipeline integration.
Who is BlazeMeter?
BlazeMeter, based in Mountain View, CA, provides an easy-to-use, cloud-based performance testing platform that can be accessed directly from any stage of AWS CodePipeline (as a Test action) at any point where load, stress, or performance tests need to run. BlazeMeter extends Apache JMeter technology by providing some important pieces, like automatic scaling and professional results reporting. If your team hasn’t already adopted JMeter, it’s a very powerful and flexible open source tool capable of orchestrating any type of performance test, from the simplest to the most sophisticated. If you are already using JMeter, you can begin working with BlazeMeter right away.
What Kinds of Performance Tests Should We Run?
When it comes to performance tests in the delivery pipeline, different architectures and objectives call for different strategies.
For example, if you’re deploying an API server that handles a lot of incoming requests from mobile devices or other applications, tests might focus mostly on throughput: the hits or requests per second that various endpoints can handle within given response-time expectations. Those tests can use straight URL requests without regard for the complexities of think time or extraneous logic that synthetically shape traffic.
To perform this type of test in AWS CodePipeline, Edit the Pipeline, add an Action to the target stage where the API test should run, choose Test as the Action category, and choose BlazeMeter as the Test provider.

After choosing Connect, you’ll be taken to BlazeMeter’s sign-in page. If you’re not already a BlazeMeter user, you can create a free account right there and have instant access.
Next you can choose New API Test from the different types of tests BlazeMeter offers.

BlazeMeter provides an easy-to-use utility where you can simply enter your endpoint URLs and required payload data. You can add the URL, specify the HTTP verb (GET, POST, PUT, DELETE), and even add custom headers. In this example, I’m providing the necessary Content-Type header as well as a JSON payload for my POST request that will test selecting cities in a flight reservation app.

Two things to take special note of in the test configuration:
– Amazon CloudWatch integration. Here you can have BlazeMeter include Amazon CloudWatch metrics for your Amazon EC2 instances involved in the test.

– Thresholds. Use this feature to define what will constitute test failure, such as average response time or percentage of errors being above selected values.
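The pass/fail idea behind a threshold can be pictured with a small shell sketch. This is not BlazeMeter's implementation: the function name, the inputs, and the limits are hypothetical, and the sketch only illustrates failing a stage when a measured value exceeds a selected limit.

```shell
# Hypothetical threshold check, not part of BlazeMeter or AWS CodePipeline.
# Arguments: average response time (ms), error rate (%), then the two limits.
check_thresholds() {
  avg_ms=$1; err_pct=$2; max_avg_ms=$3; max_err_pct=$4
  # awk handles the decimal comparison; exit status 0 means "within limits".
  awk -v a="$avg_ms" -v e="$err_pct" -v ma="$max_avg_ms" -v me="$max_err_pct" \
    'BEGIN { exit !(a <= ma && e <= me) }'
}

# Sample values: 420 ms average and 0.8% errors against limits of 500 ms and 1%.
if check_thresholds 420 0.8 500 1.0; then
  echo "PASS: within thresholds"
else
  echo "FAIL: threshold exceeded"
fi
```

A non-zero exit status is the usual way for a command-line step to signal failure to whatever is orchestrating it.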

API-oriented test scenarios like these could run immediately after an AWS CodePipeline action that uses AWS CodeDeploy or AWS Elastic Beanstalk to configure a staging environment, and they can run quite speedily.
Simulating Realistic Traffic in Automated Load Tests
A more thorough and real-world performance test will take a little more time to set up and will require JMeter scripts. Rather than just hitting the app with a barrage of HTTP requests, we want to be more strategic in how we shape the overall load profile. (Getting started with creating JMeter scripts is a bit beyond the scope of this blog post, but we’ll provide some useful tips below. BlazeMeter provides lots of great JMeter tutorials at https://docs.blazemeter.com/.)
For these more realistic tests, we once again add an Action to the desired stage in AWS CodePipeline, and choose BlazeMeter as our Test provider, but this time we’ll select New JMeter Test.

Business considerations enter the picture at this point. How many users do we expect? What will they be doing with the app, and how frequently? It’s often useful to include business analysts and product marketing teams in these discussions as they can bring useful metrics about user activity.
For these scenarios, we should create JMeter scripts that represent different types of expected interactions. For example, if we’re testing a flight reservation website, we should have some users browsing and looking at flight prices, while others are making reservations, and still others are canceling flights or choosing hotels. And since humans stop to read pages or fill in web forms, we should make use of scripted timers, such as JMeter’s Uniform Random Timer, to introduce those natural delays into the test.
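As a rough picture of what such a timer does, the sketch below computes a think time as a fixed offset plus a uniformly distributed random component, mirroring the two settings of a Uniform Random Timer (a constant delay offset and a random delay maximum). The function name and the millisecond values are assumptions chosen for illustration; this is shell arithmetic, not JMeter code.

```shell
# Illustrative only: think time = constant offset + uniform random part.
# Arguments: constant delay offset (ms), random delay maximum (ms).
think_time_ms() {
  awk -v offset="$1" -v max="$2" \
    'BEGIN { srand(); printf "%d\n", offset + int(rand() * max) }'
}

# A simulated user pausing from 2000 ms up to (but not including) 5000 ms.
delay=$(think_time_ms 2000 3000)
echo "next request in ${delay} ms"
```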

Ultimately we want to understand what we sometimes call “business throughput”: How many successful actions customers can perform, how many search results are returned, or how many total flights are reserved. Choke points and constraints around these items have a direct impact on the business so they tend to be the important elements to focus on during the test. Also, since we know the underlying components of the stack involved in these transactions, this data gives us ideas about where to start our investigations.
Using JMeter’s Transaction Controllers and naming them clearly will help you identify these business transactions after the test run.

In the example below, I’ve labeled different actions in a flight reservation system and the BlazeMeter report tells me about response times and number of transaction calls.

Let AWS CodePipeline Do The Work
Now that we can automate any kind of performance and load test using AWS CodePipeline and BlazeMeter, we hope to help teams focus on the more critical tasks of fixing defects and optimizing and tuning the bottlenecks that these automated tests discover. Since tests run so frequently, baselines start to develop and we can observe trends that provide a sense of familiarity with how our apps behave. Tuning gets easier, and aberrations become more evident.
Before you know it, you’ll be confidently releasing to production without thinking twice, knowing that your users are seeing high-class performance.
-
Getting the Most out of the Amazon S3 CLI
Editor’s note: this is a co-authored guest post from Scott Ward and Michael Ruiz, Solutions Architects with the APN.
Amazon Simple Storage Service (Amazon S3) makes it possible to store unlimited numbers of objects, each up to 5 TB in size. Managing resources at this scale requires quality tooling. When it comes time to upload many objects, a few large objects or a mix of both, you’ll want to find the right tool for the job. Today we will take a look at one option that is sometimes overlooked: the AWS Command Line Interface (AWS CLI) for Amazon S3.
Note: Some of the examples in this post take advantage of more advanced features of the Linux/UNIX command line environment and the bash shell. We included all of these steps for completeness, but won’t spend much time detailing the mechanics of the examples in order to keep the post at a reasonable length.
What is Amazon S3?
Amazon S3 is a global online object store and has been a core AWS service offering since 2006. Amazon S3 was designed for scale: it currently stores trillions of objects with peak load measured in millions of requests per second. The service is designed to be cost-effective (you pay only for what you use), durable, and highly available. See the Amazon S3 product page for more information about these and other features.
Data uploaded to Amazon S3 is stored as objects in containers called buckets and identified by keys. Buckets are associated with an AWS region and each bucket is identified with a globally unique name. See the S3 Getting Started guide for a typical Amazon S3 workflow.
Amazon S3 supports workloads as diverse as static website hosting, online backup, online content repositories, and big data processing, but integrating Amazon S3 into an existing on-premises or cloud environment can be challenging. While there is a rich landscape of tooling available from AWS partners and open-source communities, a great place to start your search is the AWS CLI for Amazon S3.
The AWS Command Line Interface (AWS CLI)
The AWS CLI is an open source, fully supported, unified tool that provides a consistent interface for interacting with all parts of AWS, including Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Virtual Private Cloud (Amazon VPC), and other services. General information about the AWS CLI can be found in the AWS CLI User Guide.
In this post we focus on the aws s3 command set in the AWS CLI. This command set is similar to standard network copy tools you might already be familiar with, like scp or rsync, and is used to copy, list, and delete Amazon S3 buckets and objects. This tool supports the key features required for scaled operations with Amazon S3, including multipart parallelized uploads, automatic pagination for queries that return large lists of objects, and tight integration with AWS Identity and Access Management (IAM) and Amazon S3 metadata.
The AWS CLI also provides the aws s3api command set, which exposes more of the unique features of Amazon S3 and provides access to bucket metadata, like lifecycle policies designed to migrate or delete data automatically.
There are two pieces of functionality built into the AWS CLI for Amazon S3 tool that help make large transfers (many files and large files) into Amazon S3 go as quickly as possible:
First, if the files are over a certain size, the AWS CLI automatically breaks the files into smaller parts and uploads them in parallel. This is done to improve performance and to minimize impact due to network errors. Once all the parts are uploaded, Amazon S3 assembles them into a single object. See the Multipart Upload Overview for much more data on this process, including information on managing incomplete or unfinished multipart uploads.
Second, the AWS CLI automatically uses up to 10 threads to upload files or parts to Amazon S3, which can dramatically speed up the upload.
These two pieces of functionality can support the majority of your data transfer requirements, eliminating the need to explore other tools or solutions.
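Both behaviors can be tuned in AWS CLI releases that include the S3 configuration settings. The sketch below assumes such a release. The values are illustrative rather than recommendations, and AWS_CMD is a name introduced here so that the commands are printed instead of executed unless you opt in.

```shell
# Dry run by default: each command is printed rather than executed.
# Run with AWS_CMD=aws to write the settings to the default profile.
AWS_CMD="${AWS_CMD:-echo aws}"

# Number of concurrent transfer requests (the default is 10).
$AWS_CMD configure set default.s3.max_concurrent_requests 20

# File size at which the CLI switches to multipart upload.
$AWS_CMD configure set default.s3.multipart_threshold 64MB

# Size of each part in a multipart upload.
$AWS_CMD configure set default.s3.multipart_chunksize 16MB
```

Raising concurrency uses more CPU and bandwidth on the source host, so it is worth changing one setting at a time and measuring the result.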
For more information on installation, configuration, and usage of the AWS CLI and the s3 commands, see the following AWS documentation:
AWS S3 Data Transfer Scenarios
Let’s take a look at using the AWS CLI for Amazon S3 in the following scenarios and dive into some details of the Amazon S3 mechanisms in play, including parallel copies and multipart uploads.
- Example 1: Uploading a large number of very small files to Amazon S3
- Example 2: Uploading a small number of very large files to Amazon S3
- Example 3: Periodically synchronizing a directory that contains a large number of small and large files that change over time
- Example 4: Improving data transfer performance with the AWS CLI
Environment Setup
The source server for these examples is an Amazon EC2 m3.xlarge instance located in the US West (Oregon) region. This server is well equipped with 4 vCPUs and 15 GB RAM, and we can expect a sustained throughput of about 1 Gb/sec over the network interface to Amazon S3. This instance will be running the latest Amazon Linux AMI (Amazon Linux AMI 2015.03 (HVM)).
The example data will reside in an Amazon EBS 100 GB General Purpose (SSD) volume, which is an SSD-based, network-attached block storage device attached to the instance as the root volume.
The target bucket is located in the US East (N. Virginia) region. This is the region you will specify for buckets created using default settings or when specifying us-standard as the bucket location. Buckets have no maximum size and no object-count limit.
All commands that are represented in this document are run from the bash command line. All command-line instructions will be represented by a $ as the starting point for the command.
We will be using the aws s3 command set throughout the examples. Here is an explanation for several common commands and options used in these examples:
- The cp command initiates a copy operation to or from Amazon S3.
- The --recursive option instructs the AWS CLI for Amazon S3 to descend into subdirectories on the source.
- The --quiet option instructs the AWS CLI for Amazon S3 to print only errors rather than a line for each file copied.
- The sync command instructs the AWS CLI for Amazon S3 to copy only new or changed files between a source and a destination, one of which is in Amazon S3.
- The Linux time command is used with each AWS CLI call in order to get statistics on how long the command took.
- The Linux xargs command is used to invoke other commands based on standard output or output piped to it from other commands.
Example 1 – Uploading a large number of small files
In this example we are going to simulate a fairly difficult use case: moving thousands of little files distributed across many directories to Amazon S3 for backup or redistribution. The AWS CLI can perform this task with a single command, s3 cp --recursive, but we will show the entire example protocol for clarity. This example will utilize the multithread upload functionality of the aws s3 commands.
1. Create the 26 directories named for each letter of the alphabet, then create 2048 files containing 32K of pseudo-random content in each:
$ for i in {a..z}; do
    mkdir $i
    seq -w 1 2048 | xargs -n1 -P 256 -I % dd if=/dev/urandom of=$i/% bs=32k count=1
  done
2. Confirm the number of files we created for later verification:
$ find . -type f | wc -l
53248
3. Copy the files to Amazon S3 by using aws s3 cp, and time the result with the time command:
$ time aws s3 cp --recursive --quiet . s3://test_bucket/test_smallfiles/
real 19m59.551s
user 7m6.772s
sys 1m31.336s
The time command returns the ‘real’ or ‘wall clock’ time the aws s3 cp took to complete. Based on the real output value from the time command, the example took 20 minutes to complete the copy of all directories and the files in those directories.
Notes:
- Our source is the current working directory (.) and the destination is s3://test_bucket/test_smallfiles.
- The destination bucket is s3://test_bucket.
- The destination prefix is test_smallfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
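To make the prefix behavior concrete, this sketch builds the object keys that a recursive copy from the current directory would produce. It only manipulates strings and does not call Amazon S3. The two relative paths are sample values based on the directory layout used in this example.

```shell
bucket="test_bucket"
prefix="test_smallfiles/"

# Key = prefix + path relative to the source directory (leading ./ removed).
for path in ./a/0001 ./z/2048; do
  key="${prefix}${path#./}"
  echo "s3://${bucket}/${key}"
done
```

The slashes in the key are ordinary characters in the key name. Tools display them as folders, but Amazon S3 stores a flat list of keys in the bucket.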
TIP:
In many real-world scenarios, the naming convention you use for your Amazon S3 objects will have performance implications. See this blog post and this document for details about object key naming strategies that will ensure high performance as you scale to hundreds or thousands of requests per second.
4. We used the Linux lsof command to capture the number of open connections on port 443 while the above copy (cp) command was running:
$ lsof -i tcp:443
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
aws 22223 ec2-user 5u IPv4 119954 0t0 TCP ip-10-0-0-37.us-west-2.compute.internal:48036->s3-1-w.amazonaws.com:https (ESTABLISHED)
aws 22223 ec2-user 7u IPv4 119955 0t0 TCP ip-10-0-0-37.us-west-2.compute.internal:48038->s3-1-w.amazonaws.com:https (ESTABLISHED)
<SNIP>
aws 22223 ec2-user 23u IPv4 118926 0t0 TCP ip-10-0-0-37.us-west-2.compute.internal:46508->s3-1-w.amazonaws.com:https (ESTABLISHED)
...10 open connections
You may be surprised to see there are 10 open connections to Amazon S3 even though we are only running a single instance of the copy command (we truncated the output for clarity, but there were ten connections established to the Amazon S3 endpoint ‘s3-1-w.amazonaws.com’). This demonstrates the native parallelism built into the AWS CLI.
Here is an example of a similar command that gives us the count of open connections directly:
$ lsof -i tcp:443 | tail -n +2 | wc -l
10
5. Let’s also peek at the CPU load during the copy operation:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37) 05/04/2015 _x86_64_ (4 CPU)
<SNIP>
09:43:18 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
09:43:19 PM all 6.33 0.00 1.27 0.00 0.00 0.00 0.51 0.00 91.90
09:43:19 PM 0 14.14 0.00 3.03 0.00 0.00 0.00 0.00 0.00 82.83
09:43:19 PM 1 6.06 0.00 2.02 0.00 0.00 0.00 0.00 0.00 91.92
09:43:19 PM 2 2.04 0.00 0.00 0.00 0.00 0.00 1.02 0.00 96.94
09:43:19 PM 3 2.02 0.00 0.00 0.00 0.00 0.00 1.01 0.00 96.97
The system is not seriously stressed given the small file sizes involved. Overall, the CPU is 91.90% idle. We see no %iowait and very little %sys activity, so we can assume that almost all of the CPU time in use is spent running the AWS CLI commands and handling file metadata.
6. Finally, let’s use the aws s3 ls command to list the files we moved to Amazon S3 and get a count to confirm that the copy was successful:
$ aws s3 ls --recursive s3://test_bucket/test_smallfiles/ | wc -l
53248
This is the expected result: 53,248 files were uploaded, which matches the local count in step 2.
Summary:
Example 1 took 20 minutes to move 53,248 files at a rate of 44 files/sec (53,248 files / 1,200 seconds to upload) using 10 parallel streams.
Example 2 – Uploading a small number of large files
In this example we will create five 2-GB files and upload them to Amazon S3. While the previous example stressed operations per second (both on the local system and in operating the aws s3 upload API), this example will stress throughput. Note that while Amazon S3 could store each of these files in a single part, the AWS CLI for Amazon S3 will automatically take advantage of the S3 multipart upload feature. This feature breaks each file into a set of multiple parts and parallelizes the upload of the parts to improve performance.
1. Create five files filled with 2 GB of pseudo-random content:
$ seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=bigfile.% bs=1024k count=2048
Since we are writing 10 GB to disk, this command will take some time to run.
2. List the files to verify size and number:
$ du -sk .
10485804
$ find . -type f | wc -l
5
This is showing that we have 10 GB (10,485,804 KB) of data in 5 files, which matches our goal of creating five files of 2 GB each.
3. Copy the files to Amazon S3:
$ time aws s3 cp --recursive --quiet . s3://test_bucket/test_bigfiles/
real 1m48.286s
user 1m7.692s
sys 0m26.860s
Notes:
- Our source prefix is the current working directory (.) and the destination is s3://test_bucket/test_bigfiles.
- The destination bucket is s3://test_bucket.
- The destination prefix is test_bigfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
4. We again capture the number of open connections on port 443 while the copy command is running to demonstrate the parallelism built into the AWS CLI for Amazon S3:
$ lsof -i tcp:443 | tail -n +2 | wc -l
10
Looks like we still have 10 connections open. Even though we only have 5 files, we are breaking each file into multiple parts and uploading them in 10 individual streams.
5. Capture the CPU load:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37) 05/04/2015 _x86_64_ (4 CPU)
<SNIP>
10:35:47 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:35:57 PM all 6.30 0.00 3.57 76.51 0.00 0.17 0.75 0.00 12.69
10:35:57 PM 0 8.15 0.00 4.37 75.21 0.00 0.71 1.65 0.00 9.92
10:35:57 PM 1 5.14 0.00 3.20 75.89 0.00 0.00 0.46 0.00 15.31
10:35:57 PM 2 4.56 0.00 2.85 75.17 0.00 0.00 0.46 0.00 16.97
10:35:57 PM 3 7.53 0.00 3.99 79.36 0.00 0.00 0.57 0.00 8.55
This is a much more serious piece of work for our instance: We see around 70-80% iowait (where the CPU is sitting idle, waiting for disk I/O) on every core. This hints that we are reaching the limits of our I/O subsystem, but also demonstrates a point to consider: The AWS CLI for Amazon S3, by default and working with large files, is a powerful tool that can really stress a moderately powered system.
6. Check our count of the number of files moved to Amazon S3 to confirm that the copy was successful:
$ aws s3 ls --recursive s3://test_bucket/test_bigfiles/ | wc -l
5
7. Finally, let’s use the aws s3api command to examine the object head metadata on one of the files we uploaded.
$ aws s3api head-object --bucket test_bucket --key test_bigfiles/bigfile.1
bytes 2147483648 binary/octet-stream "9d071264694b3a028a22f20ecb1ec851-256" Thu, 07 May 2015 01:54:19 GMT
- The 4th field in the command output is the ETag (opaque identifier), which contains an optional ‘-’ if the object was uploaded with multiple parts. In this case we see that the ETag ends with ‘-256’ indicating that the s3 cp command split the upload into 256 parts. Since all the parts but the last are of the same size, a little math tells us that each part is 8 MB in size.
- The AWS CLI for Amazon S3 is built to optimize upload and download operations while respecting Amazon S3 part sizing rules. The Amazon S3 minimum part size (5 MB, except for the last part which can be smaller), the maximum part size (5 GB), and the maximum number of parts (10,000) are described in the S3 Quick Facts documentation.
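The “little math” can be written out in a few lines of shell, using the object size and ETag from the head-object output above. The variable names are ours, and the calculation assumes the parts are of equal size, which holds here because 2 GB divides evenly by 256.

```shell
etag="9d071264694b3a028a22f20ecb1ec851-256"
object_bytes=2147483648

# The part count follows the final '-' in a multipart ETag.
parts="${etag##*-}"

# Equal-sized parts: object size divided by part count, in bytes and in MB.
part_bytes=$(( object_bytes / parts ))
part_mb=$(( part_bytes / 1024 / 1024 ))
echo "parts=${parts} part_size=${part_mb} MB"

# Compare with the limits quoted above: 5 MB minimum, 10,000 parts maximum.
[ "$part_mb" -ge 5 ] && [ "$parts" -le 10000 ] && echo "within S3 part limits"
```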
Summary:
In example 2, we moved five 2-GB files to Amazon S3 in 10 parallel streams. The operation took 1 minute and 48 seconds. This represents an aggregate data rate of ~758 Mb/s (85,899,706,368 bits in 108 seconds) – about 80% of the maximum bandwidth available on our host.
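The aggregate rate can be reproduced from the du output (10,485,804 KB) and the 108-second wall-clock time. The sketch below is our own arithmetic, with variable names introduced for illustration. It shows the rate both with a megabit taken as 2^20 bits, which gives the ~758 figure, and with a megabit taken as 10^6 bits, which gives about 795 and lines up with the roughly 80% of a 1 Gb/sec link.

```shell
total_kb=10485804   # from du -sk
seconds=108         # 1m48s, whole seconds

# KB -> bytes -> bits.
total_bits=$(( total_kb * 1024 * 8 ))

# Rate with a megabit taken as 2^20 bits, and as 10^6 bits.
rate_binary=$(awk -v b="$total_bits" -v s="$seconds" \
  'BEGIN { printf "%.1f", b / s / 1048576 }')
rate_decimal=$(awk -v b="$total_bits" -v s="$seconds" \
  'BEGIN { printf "%.1f", b / s / 1000000 }')

echo "bits transferred: ${total_bits}"
echo "rate: ${rate_binary} (2^20 bits/s) or ${rate_decimal} (10^6 bits/s)"
```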
Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time
In this example, we will keep the contents of a local directory synchronized with an Amazon S3 bucket using the aws s3 sync command. The rules aws s3 sync will follow when deciding when to copy a file are as follows: “A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.” See the command reference for more information about these rules and additional arguments available to modify these behaviors.
This example will use multipart upload and parallel upload threads.
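The quoted rules can be read as a small decision function. The sketch below is our paraphrase in shell, not the AWS CLI’s actual code. The function name and the sample sizes and timestamps are made up for illustration.

```shell
# Sketch of the quoted rules, not the AWS CLI's implementation.
# Arguments: local size, local mtime (epoch seconds), remote size, remote mtime.
# Pass empty strings for the remote values when the object does not exist.
needs_upload() {
  l_size=$1; l_mtime=$2; r_size=$3; r_mtime=$4
  [ -z "$r_size" ] && return 0                # no object under bucket/prefix
  [ "$l_size" -ne "$r_size" ] && return 0     # sizes differ
  [ "$l_mtime" -gt "$r_mtime" ] && return 0   # local copy is newer
  return 1                                    # unchanged: skip
}

needs_upload 67108864 1430960000 67108864 1430950000 && echo "upload: touched file"
needs_upload 5242880 1430960000 "" "" && echo "upload: new file"
needs_upload 67108864 1430940000 67108864 1430950000 || echo "skip: unchanged file"
```

Because only size and modification time are compared, a file that has been touched without any change to its content is still uploaded again.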
1. Let’s make our example files a bit more complicated and use a mix of file sizes (warning: inelegant hackery imminent):
$ i=1; while [[ $i -le 132000 ]]; do
    num=$((8192*4/$i))
    [[ $num -ge 1 ]] || num=1
    mkdir randfiles/$i
    seq -w 1 $num | xargs -n1 -P 256 -I % dd if=/dev/urandom of=randfiles/$i/file_$i.% bs=16k count=$i;
    i=$(($i*2))
  done
2. Check our work by getting file sizes and file counts:
$ du -sh randfiles/
12G randfiles/
$ find ./randfiles/ -type f | wc -l
65537
So we have 65,537 files totaling 12 GB in size to sync.
3. Upload to Amazon S3 using the aws s3 sync command:
$ time aws s3 sync --quiet . s3://test_bucket/test_randfiles/
real 26m41.194s
user 10m7.688s
sys 2m17.592s
Notes:
- Our source prefix is the current working directory (.) and the destination is s3://test_bucket/test_randfiles/.
- The destination bucket is s3://test_bucket.
- The destination prefix is test_randfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
4. We again capture the number of open connections while the sync command is running to demonstrate the parallelism built into the AWS CLI for Amazon S3:
$ lsof -i tcp:443 | tail -n +2 | wc -l
10
5. Let’s check the CPU load. We are only showing one sample interval, but the load will vary much more than in the other runs as the AWS CLI for Amazon S3 deals with files of varying sizes:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37) 05/07/2015 _x86_64_ (4 CPU)
03:08:50 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
03:09:00 AM all 6.23 0.00 1.70 1.93 0.00 0.08 0.31 0.00 89.75
03:09:00 AM 0 14.62 0.00 3.12 2.62 0.00 0.30 0.30 0.00 79.03
03:09:00 AM 1 3.15 0.00 1.22 0.41 0.00 0.00 0.31 0.00 94.91
03:09:00 AM 2 3.06 0.00 1.02 0.31 0.00 0.00 0.20 0.00 95.41
03:09:00 AM 3 4.00 0.00 1.54 4.41 0.00 0.00 0.31 0.00 89.74
6. Let’s run a quick count to verify that the synchronization is complete:
$ aws s3 ls --recursive s3://test_bucket/test_randfiles/ | wc -l
65537
Looks like all the files have been copied!
7. Now we’ll make some changes to our source directory:
With these commands we are touching eight existing files to update the modification time (mtime) and creating a directory containing five new files.
$ touch 4096/*
$ mkdir 5_more
$ seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=5_more/5_more% bs=1024k count=5
$ find . -type f -mmin -10
.
./4096/file_4096.8
./4096/file_4096.5
./4096/file_4096.3
./4096/file_4096.6
./4096/file_4096.4
./4096/file_4096.1
./4096/file_4096.7
./4096/file_4096.2
./5_more/5_more1
./5_more/5_more4
./5_more/5_more2
./5_more/5_more3
./5_more/5_more5
8. Rerun the sync command. This will compare the source and destination files and upload any changed files to Amazon S3:
$ time aws s3 sync . s3://test_bucket/test_randfiles/
upload: 4096/file_4096.1 to s3://test_bucket/test_randfiles/4096/file_4096.1
upload: 4096/file_4096.2 to s3://test_bucket/test_randfiles/4096/file_4096.2
upload: 4096/file_4096.3 to s3://test_bucket/test_randfiles/4096/file_4096.3
upload: 4096/file_4096.4 to s3://test_bucket/test_randfiles/4096/file_4096.4
upload: 4096/file_4096.5 to s3://test_bucket/test_randfiles/4096/file_4096.5
upload: 4096/file_4096.6 to s3://test_bucket/test_randfiles/4096/file_4096.6
upload: 4096/file_4096.7 to s3://test_bucket/test_randfiles/4096/file_4096.7
upload: 5_more/5_more3 to s3://test_bucket/test_randfiles/5_more/5_more3
upload: 5_more/5_more5 to s3://test_bucket/test_randfiles/5_more/5_more5
upload: 5_more/5_more4 to s3://test_bucket/test_randfiles/5_more/5_more4
upload: 5_more/5_more2 to s3://test_bucket/test_randfiles/5_more/5_more2
upload: 5_more/5_more1 to s3://test_bucket/test_randfiles/5_more/5_more1
upload: 4096/file_4096.8 to s3://test_bucket/test_randfiles/4096/file_4096.8
real 1m3.449s
user 0m31.156s
sys 0m3.620s
Notice that only the touched and new files were transferred to Amazon S3.
Summary:
This example shows the result of running the sync command to keep local and remote Amazon S3 locations synchronized over time. Synchronizing can be much faster than creating a new copy of the data in many cases.
Example 4 – Maximizing throughput
When you’re transferring data to Amazon S3, you might want to do more or go faster than we’ve shown in the three previous examples. However, there’s no need to look for another tool—there is a lot more you can do with the AWS CLI to achieve maximum data transfer rates. In our final example, we will demonstrate running multiple commands in parallel to maximize throughput.
In the first example we uploaded a large number of small files and achieved a rate of 44 files/sec. Let’s see if we can do better. What we are going to do is string together a few additional Linux commands to help influence how the aws s3 cp command runs.
1. Launch 26 copies of the aws s3 cp command, one per directory:
$ time ( find smallfiles -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --quiet {}/ s3://test_bucket/{}/ )
real 2m27.878s
user 8m58.352s
sys 0m44.572s
Note how much faster this completed compared with our original example which took 20 minutes to run.
Notes:
- The find part of the above command passes a null-terminated list of the subdirectories of the ‘smallfiles’ directory to xargs.
- xargs launches up to 30 parallel (‘-P30’) invocations of aws s3 cp. Only 26 are actually launched, based on the output of the find.
- xargs replaces the ‘{}’ argument in the aws s3 cp command with the directory name passed from the output of the find command.
- The destination here is s3://test_bucket/smallfiles/, which is slightly different from example 1.
- Note the number of open connections:
$ lsof -i tcp:443 | tail -n +2 | wc -l
260

We see 10 connections for each of the 26 invocations of the s3 cp command.
- Let’s check system load:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)  05/07/2015  _x86_64_  (4 CPU)

07:02:49 PM  CPU   %usr  %nice   %sys  %iowait   %irq  %soft  %steal  %guest  %idle
07:02:59 PM  all  91.18   0.00   5.67     0.00   0.00   1.85    0.00    0.00   1.30
07:02:59 PM    0  85.30   0.00   6.50     0.00   0.00   7.30    0.00    0.00   0.90
07:02:59 PM    1  92.61   0.00   5.79     0.00   0.00   0.00    0.00    0.00   1.60
07:02:59 PM    2  93.60   0.00   5.10     0.00   0.00   0.00    0.00    0.00   1.30
07:02:59 PM    3  93.49   0.00   5.21     0.00   0.00   0.00    0.00    0.00   1.30

The server is finally doing some useful work! Since almost all the time is spent in %usr with very little %idle or %iowait, we know that the CPU is working hard on application logic without much constraint from the storage or network subsystems. It’s likely that moving to a larger host with more CPU power would speed this process up even more.
- Verify the file count:
$ aws s3 ls --recursive s3://test_bucket/smallfiles | wc -l
53248

Summary:
Using 26 invocations of the command improved the execution time by a factor of 8: 2 minutes 27 seconds for 53,248 files vs. the original run time of 20 minutes. The file upload rate improved from 44 files/sec to 362 files/sec.
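The figures in this summary can be reproduced with a few lines of shell arithmetic. This is only a sketch: it rounds both run times down to whole seconds and uses integer division, so the results are approximate, and the variable names are ours rather than anything from the AWS CLI.

```shell
#!/usr/bin/env bash
# Figures taken from the runs above; run times rounded down to whole seconds.
files=53248
serial_secs=$((20 * 60))         # original run: about 20 minutes
parallel_secs=$((2 * 60 + 27))   # parallel run: 2m27s

# Bash arithmetic is integer-only, so each result is truncated.
serial_rate=$((files / serial_secs))
parallel_rate=$((files / parallel_secs))
speedup=$((serial_secs / parallel_secs))

echo "serial:   ${serial_rate} files/sec"
echo "parallel: ${parallel_rate} files/sec"
echo "speedup:  ${speedup}x"
```

With these inputs the script reports 44 files/sec, 362 files/sec, and a speedup of 8x, matching the numbers quoted above.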
Applying similar logic to further parallelize the large-file scenario in example 2 would easily saturate the network bandwidth on the host. Be careful when executing these examples! A well-connected host can easily overwhelm the Internet links at your source site!
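One cautious way to try the fan-out from example 4 is to preview it first. The sketch below is an illustration under our own assumptions, not part of the original test run: it builds a throwaway directory tree with three placeholder subdirectories (a, b, c), then runs the same find and xargs pipeline with echo in front of the aws command so that each invocation is printed instead of executed. The -n1 flag is left out because -I already runs one command per input item, and -P4 is an arbitrary parallelism value.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the commands the parallel fan-out would launch.
# The subdirectories a, b, c and the -P4 value are placeholders for illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/smallfiles/a" "$workdir/smallfiles/b" "$workdir/smallfiles/c"

# Same pipeline shape as example 4, but "echo" prints each command
# instead of running it, so nothing is uploaded to Amazon S3.
planned=$(
  cd "$workdir" &&
  find smallfiles -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -0 -P4 -I {} echo aws s3 cp --recursive --quiet {}/ s3://test_bucket/{}/ |
    sort
)

echo "$planned"

# Count how many invocations would be started (one per subdirectory).
planned_count=$(printf '%s\n' "$planned" | wc -l | tr -d ' ')
echo "would launch ${planned_count} invocations"

rm -rf "$workdir"
```

Once the printed commands look right, removing echo (and pointing the pipeline at the real directory) runs the actual transfers.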
Conclusion
In this post we demonstrated the use of the AWS CLI for common Amazon S3 workflows. We saw that the AWS CLI for Amazon S3 scaled to 10 parallel streams and enabled multipart uploads automatically. We also demonstrated how to accelerate the tasks with further parallelization by using common Linux CLI tools and techniques.
When using the AWS CLI to upload files to Amazon S3 from a single instance, your limiting factors are generally going to be end-to-end bandwidth to the Amazon S3 endpoint for large file transfers and host CPU when sending many small files. Depending on your particular environment, your results might be different from our example results. As demonstrated in example 4, there may be an opportunity to go faster if you have the resources to support it. AWS also provides a variety of Amazon EC2 instance types, some of which might provide better results than the m3.xlarge instance type we used in our examples. Finally, networking bandwidth to the public Amazon S3 endpoint is a key consideration for overall performance.
We hope that this post helps illustrate how powerful the AWS CLI can be when working with Amazon S3, but this is just a small part of the story: the AWS CLI can launch Amazon EC2 instances, create new Amazon VPCs, and enable many of the other features of the AWS platform with just as much power and flexibility as it can for Amazon S3. Have fun exploring!

