AWS AI Blog
Join AWS User Group Dublin for an Evening of AI & Deep Learning
Join Julien Simon, Principal Technical Evangelist at Amazon Web Services, on May 9 for an evening of AI and Deep Learning hosted by the AWS User Group Dublin. The event will feature Amazon Lex, Amazon Polly, and Amazon Rekognition. Julien will take participants on a journey through Deep Learning with AWS covering AI theory to the latest offerings from AWS. Additional speakers will dive deep on Amazon Lex and the new Alexa Skills Kit. If you're in Dublin on May 9, we hope that you can join us!
For more information, see the AWS User Group Dublin's Meetup invitation.
Create Audiobooks with Amazon Polly and AWS Batch
Amazon Polly, one of AWS's first AI services, turns text into lifelike speech. By enabling applications to speak, Amazon Polly makes it possible to develop new types of speech-enabled products. For example, many AWS customers have large documents, such as books or reports, that they'd like to convert to speech so they can listen to them when commuting. Others prefer to use audio to consume most written content.
Amazon Polly has two limitations that present challenges for large text-to-speech applications:
- The maximum size of input text for the SynthesizeSpeech API is 1500 billed characters.
- The maximum number of concurrent requests to the SynthesizeSpeech API per second is 80, with a burst limit of 100.
This post describes the polly-batch-processor application, which overcomes the challenge of processing a text document that exceeds the maximum number of characters supported by Amazon Polly. Polly-batch-processor takes a large text document, breaks it into chunks, generates an audio file for each chunk with Amazon Polly, and consolidates the chunks into a single large MP3 file. AWS Batch asynchronously processes the audio document. To jump directly to the application and configuration steps, click here.
Polly-batch-processor also works for documents that contain many short prompts that are less than 1500 characters; for example, a document containing many short phrases, such as movie titles, that you want to synthesize into a single audio file. Polly-batch-processor generates each sentence asynchronously and in parallel, reducing the time to create the audio file and overcoming any throttling issues with Amazon Polly.
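The chunking step described above can be sketched in a few lines of Python. This is a minimal illustration, not the polly-batch-processor code: the function names `split_into_chunks` and `synthesize_document` are hypothetical, plain character counts stand in for Polly's billed characters, and joining the MP3 parts byte-for-byte is an assumption about how the audio could be consolidated.

```python
import re

MAX_CHARS = 1500  # per-request input limit quoted in this post


def split_into_chunks(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_document(text, synthesize):
    """Call a caller-supplied synthesize(chunk) -> bytes per chunk and join the audio.

    Assumption: the per-chunk MP3 bytes can simply be concatenated.
    """
    return b"".join(synthesize(chunk) for chunk in split_into_chunks(text))
```

With boto3, the `synthesize` callable could wrap `polly.synthesize_speech(Text=chunk, OutputFormat="mp3", VoiceId="Joanna")["AudioStream"].read()`. A real batch job would also need to pace those calls to stay under the request limits listed earlier, and the hard split used for over-long sentences can cut a word in half.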
How it works
The following figure shows the application workflow:

Changing Lives with AI: Pollexy ("Polly" + "Lex"), A Special Needs Verbal Assistant
With the emergence of devices like the Amazon Echo and AWS services like Amazon Polly and Amazon Lex, it's easier to develop integrated voice solutions that can simplify life and make it more enjoyable. In some cases, the resulting innovations can lead to life-changing experiences because the technology facilitates communication breakthroughs that go beyond making experiences more convenient or entertaining.
Case in point: Pollexy ("Polly" + "Lex"), a special needs verbal assistant. Pollexy is a mobile application that runs on Raspberry Pi and was developed by Troy Larson. Pollexy enables caretakers to trigger audio messages on a recurring schedule and on demand using the Amazon Echo. For a special needs person, such as Troy's son Calvin, Pollexy provides not only spoken support and guidance, but also the "confidence, respect and the sense of privacy and freedom that we all want to enjoy".
If you missed it when it was originally published, please take a few minutes to read Troy Larson's full blog post about Pollexy. It's a wonderful and moving example of how advances in technology can make a meaningful difference in our lives.
"In the Research Spotlight": A New Blog Post Series
In addition to the continued contributions to the AI community through Apache MXNet and the Amazon AI services of Amazon Lex, Amazon Polly, and Amazon Rekognition, AWS also has an active team of AI researchers. These researchers have rich and interesting backgrounds in machine learning and deep learning, from academia, startups, and enterprises, all with one primary mission: to lower the barrier to AI for all AWS developers by making it more accessible and easier to use. As Swami Sivasubramanian, VP of Machine Learning at AWS, succinctly stated in a recent article in Silicon Angle, "Our goal is to basically democratize artificial intelligence, to make AI accessible to every developer."
In this Research Spotlight series, I sit down with many of these researchers for in-depth conversations about their past experiences and to provide a peek into what they are working on now at AWS. We're excited to have such a strong group of AI experts joining AWS, including the following recent hires:
Alex Smola joined AWS last July as the Director of Machine Learning and Deep Learning. He is an active member of the academic research community, authoring or contributing to 462 titles with 75,000+ citations, frequently speaking on the topic of Apache MXNet and how to design efficient algorithms at scale.
Anima Anandkumar joined Amazon Web Services in November 2016, as Principal Scientist on Deep Learning, currently on leave from the EECS Department at UC Irvine, where she has been an associate professor since August 2010. Her research interests are in the areas of large-scale machine learning, non-convex optimization and high-dimensional statistics.
Hassan Sawaf is our Director of Applied Science & Artificial Intelligence. From DARPA to eBay and now AWS, Hassan has been working in the fields of Automatic Speech Recognition, Computer Vision, Natural Language Understanding, and Machine Translation for over 20 years.
Mu Li is a senior applied scientist for machine learning at Amazon Web Services. Before joining Amazon, he was the CTO of Marianas Labs, an Artificial Intelligence startup. He also served as a principal research architect at the Institute of Deep Learning at Baidu.
Edo Liberty is a Principal Scientist and the manager of the Amazon AI Algorithms group. Edo and his group tackle some of the most interesting problems in machine learning, data science, and scalable systems.
I'll continue to highlight members of this group in future blog posts. Watch this space!
Deep Learning on AWS at NVIDIA's GPU Technology Conference, GTC 2017
This year at NVIDIA's GPU Technology Conference, AWS is hosting several tech sessions ranging from how to get started with Apache MXNet to running deep learning on IoT devices on the edge. If you're in Silicon Valley the week of May 8, we hope that you'll join us for the following sessions.
An Introduction to Using Apache MXNet (S7853) | 4 hours
Get hands-on experience using Apache MXNet with the preconfigured AWS Deep Learning AMIs and AWS CloudFormation template to speed development and quickly spin up AWS GPU clusters to train at record speed. This course will include the following:
- Background on deep learning
- An overview of how to set up AMIs, AWS CloudFormation templates, and other deep learning frameworks on AWS
- A peek under the MXNet hood (MXNet internals) and a comparison with other deep learning frameworks
- Hands-on training with Apache MXNet: NDArrays, Symbols, and the mechanics of training deep neural networks
- More hands-on training with Apache MXNet: application examples, using Jupyter notebooks, and targeting computer vision and natural language processing
Getting Started with Apache MXNet (S7565) | 50 minutes
Deep learning continues to push the state of the art in domains such as computer vision, natural language understanding, and recommendation engines. A key reason for this progress is the availability of highly flexible and developer-friendly deep learning frameworks. During this session, members of the AWS Deep Learning product team provide background on deep learning and how it's applied at AWS, and the strategy for investing in the Apache MXNet project. You'll also learn how to get started using NVIDIA GPUs in the AWS Cloud, which lets you easily scale to hundreds of GPUs in minutes.
High-Performance Deep Learning on Embedded Devices Using Apache MXNet (S7571) | 50 minutes
Learn how to compile and run an optimized version of the Apache MXNet deep learning framework for various embedded (IoT) devices. Also learn about the wide range of exciting applications running deep network inference in near real time on "edge" devices. To demonstrate the massive efficiency gains that Apache MXNet yields over comparable frameworks on embedded devices, we show performance numbers for a variety of deep learning models running on Raspberry Pis and TK1 processors. We then demo the power of real-time image processing with deep learning models by walking through an example application. Finally, we demonstrate how to use AWS IoT to significantly augment the flexibility and reliability of the models running in our example application.
Fast CNN Tuning with AWS GPU Instances and SigOpt
By Steven Tartakovsky, Michael McCourt, and Scott Clark of SigOpt
Compared with traditional machine learning models, neural networks are computationally more complex and introduce many additional parameters. This often prevents machine learning engineers and data scientists from getting the best performance from their models. In some cases, it might even dissuade data scientists from using neural networks.
In this post, we show how to tune a Convolutional Neural Network (CNN) for a Natural Language Processing (NLP) task 400 times faster than with traditional random search on a CPU. This method also achieves greater accuracy. We accomplish this by using the combined power of SigOpt and NVIDIA GPUs on AWS. To replicate the technical portions of this post, use the associated instructions and code on GitHub.
How MXNet, GPU-enabled AWS P2 instances, and SigOpt work
MXNet is a deep learning framework that machine learning engineers and data scientists can use to quickly create sophisticated deep learning models. MXNet makes it easy to use NVIDIA GPU-enabled AWS P2 instances, which significantly speed up training neural network models. In our example, we observed a 50x decrease in training time compared to training on a CPU. This reduces the average time to train the neural network in this example from 2 hours to less than 3 minutes!
In complex machine learning models and data processing pipelines, like the NLP CNN described in this post, many parameters determine how effective a predictive model will be. Choosing these parameters, fitting the model, and determining how well the model performs is a time-consuming, trial-and-error process called hyperparameter optimization or, more generally, model tuning. Black-box optimization tools like SigOpt increase the efficiency of hyperparameter optimization without introspecting the underlying model or data. SigOpt wraps the underlying pipeline and optimizes the parameters to maximize some metric, such as accuracy.
Although you need domain expertise to prepare data, generate features, and select metrics, you don't need special knowledge of the problem domain for hyperparameter tuning. SigOpt can significantly speed up and reduce the cost of this tuning step compared to standard hyperparameter tuning approaches like random search and grid search. In our example, SigOpt is able to achieve better results with 10x fewer model trainings compared to random search. Combined with the decreased training time from using NVIDIA GPU-enabled AWS P2 instances, this results in a total speedup in model tuning of over 400x.
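As a quick check on how the two gains compound, the arithmetic can be written out as a small sketch. It uses only the figures quoted in this post; the function name `combined_speedup` is hypothetical, and treating "less than 3 minutes" as exactly 3 minutes gives a conservative lower bound.

```python
def combined_speedup(per_training_speedup, fewer_trainings_factor):
    """Total tuning speedup when each training is faster AND fewer trainings are needed."""
    return per_training_speedup * fewer_trainings_factor


cpu_minutes = 120   # "2 hours" per training run on a CPU
gpu_minutes = 3     # "less than 3 minutes" on a P2 GPU instance (upper bound)

# Conservative estimate from the rounded training times: 40x per training, 10x fewer trainings.
lower_bound = combined_speedup(cpu_minutes / gpu_minutes, 10)

# Estimate using the quoted 50x per-training figure instead.
quoted = combined_speedup(50, 10)

print(lower_bound, quoted)
```

The two factors multiply because one shortens each training run and the other reduces how many runs are needed, so the conservative inputs give 400x and the quoted 50x figure gives 500x.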

What we did
To show how these tools can get you faster results, we ran them on a sentiment analysis task. We used an open dataset of 10,662 labeled movie reviews from Rotten Tomatoes to predict whether the review is positive (4 or 5) or negative (1 or 2).
We performed the following tasks:
- Randomly split the data into a training set (9,662 reviews) and a validation set (1,000 reviews).
- Embedded the vocabulary of the entire dataset (as word2vec does).
- Trained a CNN using a specific architecture and set of hyperparameters.
- Evaluated the predictive performance of the model on the validation set.
In a production setting, a more robust workflow is critical to avoid overfitting of hyperparameters. Cross-validation and adding Gaussian noise to your dataset are some common techniques for avoiding overfitting to any one dataset. To focus only on hyperparameter optimization, we keep the training and validation sets fixed. For best practices for parameter optimization, see this blog.
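For reference, the random search baseline that this post compares against can be sketched as follows. This is an illustration under stated assumptions, not the code from the associated GitHub repository: `toy_validation_accuracy` is a hypothetical stand-in for "train the CNN, then score it on the fixed validation set", and the hyperparameter names and ranges (`log_lr`, `dropout`) are made up for the example.

```python
import random


def random_search(evaluate, space, n_trials, seed=0):
    """Sample hyperparameters uniformly at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = evaluate(params)  # in practice: one full model training + validation
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_accuracy(params):
    """Hypothetical objective; peaks at 0.9 when log_lr = -3 and dropout = 0.5."""
    return 0.9 - 0.05 * (params["log_lr"] + 3) ** 2 - 0.4 * (params["dropout"] - 0.5) ** 2


# Hypothetical search space: log10 of the learning rate, and the dropout rate.
space = {"log_lr": (-5.0, -1.0), "dropout": (0.0, 0.9)}
best_params, best_score = random_search(toy_validation_accuracy, space, n_trials=50)
print(best_params, best_score)
```

Each call to `evaluate` corresponds to one model training, which is why reducing the number of trials matters as much as making each training faster.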
Amazon and Facebook Collaborate to Optimize Caffe2 for the AWS Cloud
From Apache MXNet to Torch, there is no shortage of frameworks for deep learners to leverage. The various offerings each excel at different aspects of the deep learning pipeline and each meets different developer needs. The research-centric community tends to gravitate toward frameworks such as Theano, Torch and most recently PyTorch, while many in the industry have Caffe, TensorFlow or Apache MXNet deployed at scale for production applications. Given the heterogeneity in usage and users, AWS supports a range of frameworks as part of its developer tool offerings and, as a result, supports a broad spectrum of users.
AWS provides an open environment for developers to conduct deep learning. As we announced on April 18th, we are excited to further increase developer choice by offering support for Facebook's newly launched Caffe2 project in the Ubuntu version of the AWS Deep Learning AMI (and coming soon in the Amazon Linux version, too).
What is Caffe2?
Caffe2 (architected by Yangqing Jia, the original developer of Caffe) is a lightweight, modular, and scalable deep learning framework. Facebook deployed Caffe2 internally to help researchers train large machine learning models and deliver AI on mobile devices.
Now, all developers have access to many of the same tools for running large-scale distributed training and building machine learning applications for mobile. This allows the machine learning community to rapidly experiment with more complex models and deploy machine learning applications and services for mobile scenarios.
Caffe2 features include:
- Easy implementation of a variety of models, including CNNs (convolutional neural networks), RNNs (recurrent neural networks), and conventional MLPs (multi-layer perceptrons)
- Native distributed training interfaces
- Mixed-precision and reduced-precision computations
- Graph-based computation patterns that facilitate easy heterogeneous computation across multiple devices
- Modularity, allowing the addition of custom recipes and hardware without risking codebase collisions
- Strong support for mobile and embedded platforms in addition to conventional desktops and server environments
Why "yet another" deep learning framework?
The original Caffe framework, with unparalleled performance and a well-tested C++ codebase, is useful for large-scale conventional CNN applications. However, as new computation patterns emerged (especially distributed computation, mobile, reduced-precision computation, and more non-vision use cases), Caffe's design limitations became apparent.
By early 2016, the Facebook team had developed an early version of Caffe2 that improved Caffe by implementing a modern computation graph design, minimalist modularity, and the flexibility to easily port to multiple platforms. In the last year, Facebook has fully embraced Caffe2 as a multipurpose, deep learning framework, and has begun using it in Facebook products.
The Facebook team is very excited about Caffe2's ability to support a wide range of machine learning use cases, and is equally excited to contribute Caffe2 to the open source community. The team is also looking forward to working with partners like AWS and the open source software community to push the state of the art in machine learning systems.
Running BigDL, Deep Learning for Apache Spark, on AWS
In recent years, deep learning has significantly improved several AI applications, such as recommendation engines, voice and speech recognition, and image and video recognition. Many customers process the massive amounts of data that feed these deep neural networks in Apache Spark, only to later feed it into a separate infrastructure to train models using popular frameworks, such as Apache MXNet and TensorFlow. Because of Apache Spark's popularity and its more than a thousand contributors, the developer community has expressed interest in uniting the big data infrastructure and deep learning into a single workflow under Apache Spark.
Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which maintains it. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.
BigDL is a distributed deep learning framework for Apache Spark that was developed by Intel and contributed to the open source community for the purposes of uniting big data processing and deep learning. BigDL helps make deep learning more accessible to the big data community by allowing developers to continue using familiar tools and infrastructure to build deep learning applications. BigDL is licensed under the Apache 2.0 license.
As the following diagram shows, BigDL is implemented as a library on top of Spark, so that users can write their deep learning applications as standard Spark programs. As a result, BigDL can be seamlessly integrated with other libraries on top of Spark (Spark SQL and DataFrames, Spark ML pipelines, Spark Streaming, Structured Streaming, etc.) and can run directly on top of existing Spark or Hadoop clusters.

Deep Learning AMI for Ubuntu v1.3_Apr2017 Now Supports Caffe2
We are excited to announce that the AWS Deep Learning AMI for Ubuntu now supports the newly launched Caffe2 project led by Facebook. AWS is the best and most open place for developers to run deep learning, and the addition of Caffe2 adds yet another choice. To learn more about Caffe2, check out the Caffe2 developer site or the GitHub repository.

The Deep Learning AMI v1.3_Apr2017 for Ubuntu provides a stable, secure, and high-performance execution environment for deep learning applications running on Amazon EC2. This AMI includes the following framework versions:
- MXNet v0.9.3
- Caffe2 v0.6.0 (new)
- TensorFlow v1.0.1 (updated)
- Caffe rc5
- Theano rel-0.8.2
- Keras 1.2.2
- CNTK v2.0 RC1 (updated)
- Torch master branch
AI Tech Talk: An Overview of AI on the AWS Platform

AWS offers a family of intelligent services that provide cloud-native machine learning and deep learning technologies to address your different use cases and needs. For developers looking to add managed AI services to their applications, AWS brings natural language understanding (NLU) and automatic speech recognition (ASR) with Amazon Lex, visual search and image recognition with Amazon Rekognition, text-to-speech (TTS) with Amazon Polly, and developer-focused machine learning with Amazon Machine Learning.
For more in-depth deep learning applications, the AWS Deep Learning AMI lets you run deep learning in the cloud, at any scale. Launch instances of the AMI, pre-installed with open source deep learning frameworks (Apache MXNet, TensorFlow, Caffe, Theano, Torch and Keras), to train sophisticated, custom AI models, experiment with new algorithms, and learn new deep learning skills and techniques, all backed by auto-scaling clusters of GPU-based instances.
Whether you're just getting started with AI or you're a deep learning expert, this session will provide a meaningful overview of the managed AI services, the AI Platform offerings, and the AI Frameworks you can run on the AWS Cloud.


