AWS Database Blog

How to Build a Chat Application with Amazon ElastiCache for Redis

by Sam Dengler | in ElastiCache, Redis

Sam Dengler is a Solutions Architect at Amazon Web Services.

In this blog post, we review concepts and architectural patterns relevant to a chat application. We also discuss implementation details for a chat client and server, and instructions to deploy a sample chat application into your AWS account.

Background information

Building a chat application requires a communication channel over which a client can send messages that are redistributed to other participants in the chat room. This communication is popularly implemented using the publish-subscribe pattern (PubSub), where a message is sent to a centralized topic channel. Interested parties can subscribe to this channel to be notified of updates. This pattern decouples the publisher and subscribers, so that the set of subscribers can grow or shrink without the knowledge of the publisher.
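To make the decoupling concrete, here is a minimal in-memory sketch of the publish-subscribe pattern. It is not the sample application's code and it does not use Redis; the class name `InMemoryPubSub` and the channel name `room-1` are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class InMemoryPubSub:
    """Toy topic registry: publishers and subscribers share only a channel name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def unsubscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].remove(handler)

    def publish(self, channel: str, message: str) -> int:
        """Deliver the message to every current subscriber; return the delivery count."""
        handlers = list(self._subscribers[channel])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = InMemoryPubSub()
    bus.subscribe("room-1", lambda msg: print("alice received:", msg))
    bus.subscribe("room-1", lambda msg: print("bob received:", msg))
    bus.publish("room-1", "hello, room")
```

The publisher calls `publish` with a channel name only. It never sees the subscriber list, which is why subscribers can join or leave without any change on the publishing side.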

PubSub is implemented on a backend server, with which clients communicate using WebSockets. A WebSocket is a persistent TCP connection that provides a channel for data to be streamed bidirectionally between the client and server. With a single-server architecture, one PubSub application can manage the state of publishers and subscribers, and also the message redistribution to clients over WebSockets. The following diagram illustrates the path that messages travel over WebSockets between two clients on a single-server PubSub architecture.

Single-Server PubSub Architecture

(more…)

Migrating Oracle Database from On-Premises or Amazon EC2 Instances to Amazon Redshift

by Ballu Singh and Pubali Sen | in DMS, Migration, Redshift, Schema Conversion Tool (SCT)

Ballu Singh and Pubali Sen are solutions architects at Amazon Web Services.

AWS Database Migration Service (AWS DMS) helps you migrate databases to AWS easily and securely. AWS DMS can migrate your data to and from most widely used commercial and open-source databases. The service supports homogeneous migrations such as Oracle to Oracle. It also supports heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora or Microsoft SQL Server to MySQL. The source database remains fully operational during the migration, minimizing downtime for applications that rely on the database.

Data replication with AWS Database Migration Service integrates tightly with the AWS Schema Conversion Tool (AWS SCT), simplifying heterogeneous database migration projects. You can use AWS SCT for heterogeneous migrations. For homogeneous migrations, you can use the schema export tools native to the source engine.

In this blog post, we focus on migrating the data from Oracle Data Warehouse to Amazon Redshift.

In the past, AWS SCT couldn’t convert custom code, such as views and functions, from Oracle Data Warehouse to a format compatible with Amazon Redshift. To migrate views and functions, you had to first convert the Oracle Data Warehouse schema to PostgreSQL. Then you’d apply a script to extract views and functions that are compatible with Amazon Redshift.

After an update based on customer feedback, we’re happy to let you know that with AWS SCT and AWS DMS, you can now migrate Oracle Data Warehouse to Amazon Redshift along with views and functions.

The following diagram illustrates the migration process.

(more…)

How to Stream Data from Amazon DynamoDB to Amazon Aurora using AWS Lambda and Amazon Kinesis Firehose

by Aravind Kodandaramaiah | in Aurora, DynamoDB, Kinesis, Lambda

Aravind Kodandaramaiah is a partner solutions architect with the AWS Partner Program.

Introduction

We find that customers running AWS workloads often use both Amazon DynamoDB and Amazon Aurora. Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora provides up to five times better performance than MySQL with the security, availability, and reliability of a commercial database at one-tenth the cost.

To put these together, imagine you have built a custom web analytics engine, with millions of web clicks registered within DynamoDB every second. Amazon DynamoDB operates at this scale and can ingest high-velocity data. Now imagine needing to replicate this clickstream data into a relational database management system (RDBMS), such as Amazon Aurora. Suppose that you want to slice and dice this data, project it in various ways, or use it for other transactional purposes using the power of SQL within stored procedures or functions.

To effectively replicate data from DynamoDB to Aurora, a reliable, scalable data replication (ETL) process needs to be built. In this post, I show you how to build such a process using a serverless architecture with AWS Lambda and Amazon Kinesis Firehose.

Solution overview

The following diagram shows the solution architecture. The motivations behind this architecture are the following:

  1. Serverless – By offloading infrastructure management to AWS, you achieve zero-maintenance infrastructure. You also simplify security management for the solution, because there is no need to use keys or passwords, and you optimize cost. In addition, you automate scaling with concurrent Lambda function executions based on shard iterators in DynamoDB Streams.
  2. Ability to retry failures – Because the data movement process needs to be highly reliable, the process needs to handle failures in each step and provide an ability to retry. This architecture does that.
  3. Optimization of concurrent database connections – By buffering records based on interval or buffer size, you can reduce the number of concurrent connections to Amazon Aurora. This approach helps avoid connection timeouts.
  4. Separation of concerns – Using AWS Lambda, you can separate each concern of the data replication process. For example, you can separate the extract phase as processing DynamoDB streams, the transform phase as Firehose-Lambda transformation, and the load phase as bulk insert into Aurora.
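The buffering idea in item 3 can be sketched as follows. This is an illustration of flushing by record count or by age, not Firehose's implementation; `RecordBuffer`, its thresholds, and the `flush` callback (which would stand in for one bulk insert over one database connection) are assumptions made for this example.

```python
import time
from typing import Any, Callable, List


class RecordBuffer:
    """Collect records and hand them off in batches.

    A batch is flushed when it reaches max_records, or when the oldest
    buffered record is at least max_age_seconds old. The age check runs
    only when a new record is added (no background timer in this sketch).
    """

    def __init__(
        self,
        flush: Callable[[List[Any]], None],
        max_records: int = 500,
        max_age_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock
        self._records: List[Any] = []
        self._first_at = 0.0

    def add(self, record: Any) -> bool:
        """Buffer one record; return True if this call triggered a flush."""
        if not self._records:
            self._first_at = self._clock()
        self._records.append(record)
        too_many = len(self._records) >= self._max_records
        too_old = self._clock() - self._first_at >= self._max_age
        if too_many or too_old:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Hand the current batch to the callback, if there is one."""
        if self._records:
            batch, self._records = self._records, []
            self._flush(batch)
```

With a buffer like this, a thousand incoming records become a handful of bulk writes rather than a thousand separate connections.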

(more…)

Events and Notifications in AWS Database Migration Service

by Eran Schitzer | in DMS

Eran Schitzer is a product manager at Amazon Web Services.

We’ve recently added a new feature in AWS Database Migration Service (AWS DMS): the ability to receive DMS event notifications, such as email messages, text messages, or calls to HTTP endpoints, through Amazon Simple Notification Service (Amazon SNS).

You now can subscribe and receive notifications for two types of events—events related to DMS instances and events related to replication tasks. Events related to DMS instances include those for availability, configuration change, creation, deletion, and maintenance. For example, when a DMS instance goes down for maintenance, a notification is triggered.

Events related to replication tasks include start, pause, finish, Full Load completed, CDC started, and many more. For example, when a migration task finishes migrating all of the data, it triggers a “Full Load completed” notification. If the task is configured to follow Full Load mode with CDC mode (that is, to replicate the changes made to the data since the Full Load began), a “CDC started” notification is triggered next.

In addition, AWS DMS groups events into categories that you can subscribe to using the AWS DMS console or the AWS DMS API. This subscription means you can be notified when an event occurs in the category you subscribed to. For example, if you subscribe to the creation category for a given replication instance, you are notified whenever a creation-related event occurs that affects your replication instance, such as when a replication instance is being created.

The following list represents the possible categories for subscription for the DMS replication instance at this time:

  • Configuration change: a replication instance configuration is being changed
  • Creation: a replication instance is being created
  • Deletion: a replication instance is being deleted
  • Maintenance: offline maintenance of the replication instance is taking place
  • Low storage: free storage for the replication instance is running low
  • Failover: failover for a Multi-AZ instance, when enabled, has begun or finished
  • Failure: the replication instance has gone into storage failure or has failed due to an incompatible network

The following list represents the possible categories for subscription for the DMS replication task at this time:

  • State change: the replication task has started or stopped
  • Creation: the replication task has been created
  • Deletion: the replication task has been deleted
  • Failure: the replication task has failed

For a list of the events and event categories provided by AWS DMS, see AWS DMS Event Categories and Event Messages in the documentation.

To subscribe to AWS DMS events, do the following:

  1. Create an Amazon SNS topic. In the topic, you specify what type of notification you want to receive and what address or number the notification will go to.
  2. Create an AWS DMS event notification subscription using the AWS Management Console, AWS CLI, or AWS DMS API.
  3. When you receive an AWS DMS approval email or SMS message to the address you submitted with your subscription, click the link in the approval email or SMS message to confirm your subscription.

When you have confirmed the subscription, the status of your subscription is updated in the AWS DMS console’s Event Subscriptions section.

You then begin to receive event notifications.
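The first two steps can also be scripted. The sketch below assumes boto3-style SNS and DMS clients are passed in, and uses the `create_topic`, `subscribe`, and `create_event_subscription` operations. The helper name, topic name, email address, and category strings are placeholders, so check the category names against the AWS DMS documentation before using them.

```python
from typing import Any, List


def subscribe_to_dms_events(
    sns_client: Any,
    dms_client: Any,
    topic_name: str,
    email: str,
    subscription_name: str,
    source_type: str,
    categories: List[str],
) -> str:
    """Create an SNS topic, add an email endpoint, and register a DMS event subscription.

    source_type is expected to be "replication-instance" or "replication-task".
    Returns the ARN of the SNS topic.
    """
    # Step 1: the topic and the address that notifications go to.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    # Step 2: the DMS event notification subscription that publishes to the topic.
    dms_client.create_event_subscription(
        SubscriptionName=subscription_name,
        SnsTopicArn=topic_arn,
        SourceType=source_type,
        EventCategories=categories,
        Enabled=True,
    )
    return topic_arn
```

In a real account you might call it as `subscribe_to_dms_events(boto3.client("sns"), boto3.client("dms"), "dms-events", "ops@example.com", "task-failures", "replication-task", ["failure"])`. Step 3, confirming the subscription from the approval message, still has to be done by the recipient.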

For more information about working with events and notifications, see the DMS documentation.

For more information about AWS Database Migration Service in general, see our website.

Send Apache Web Logs to Amazon Elasticsearch Service with Kinesis Firehose

by Jon Handler | in Elasticsearch, Kinesis

Jon Handler (@_searchgeek) is an AWS solutions architect specializing in search technologies.

We have many customers who own and operate Elasticsearch, Logstash, and Kibana (ELK) stacks to load and visualize Apache web logs, among other log types. Amazon Elasticsearch Service provides Elasticsearch and Kibana in the AWS Cloud in a way that’s easy to set up and operate. Amazon Kinesis Firehose provides reliable, serverless delivery of Apache web logs (or other log data) to Amazon Elasticsearch Service.

With Firehose, you can add an automatic call to an AWS Lambda function to transform records within Firehose. With these two technologies, you have an effective, easy-to-manage replacement for your existing ELK stack.

In this post, we show you first how to set up an Amazon Elasticsearch Service domain. Then we show how to create and connect a Firehose stream that employs a prebuilt Lambda function to parse Apache web logs. Finally, we show how to load data with Amazon Kinesis Agent and visualize with Kibana.
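To give a feel for what a Firehose transformation function does, here is a rough sketch of a Lambda handler that parses Apache log lines into JSON. It is not the prebuilt function the post refers to. The regular expression and the output field names are assumptions for illustration; the input and output record shape (`recordId`, `result`, base64-encoded `data`) follows the Firehose data transformation contract.

```python
import base64
import json
import re

# Apache common log format, with any trailing fields (referrer, agent) ignored.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def lambda_handler(event, context):
    """Turn each raw log line into a JSON document, or mark it as failed."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        match = LOG_PATTERN.match(line)
        if match is None:
            # Return the original data unchanged so the failure can be inspected.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue
        doc = match.groupdict()
        doc["status"] = int(doc["status"])
        doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
        payload = (json.dumps(doc) + "\n").encode("utf-8")
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload).decode("utf-8"),
        })
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is what lets Firehose match transformed records back to the originals.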

(more…)

How to Configure a Private Network Environment for Amazon DynamoDB Using VPC Endpoints

by Sangpill Kim and Gisung Lim | in DynamoDB

Gisung Lim is a security solutions architect at Amazon Web Services Korea and Sangpill Kim is an enterprise solutions architect at Amazon Web Services Korea.

This blog post explains how to enhance the privacy and security of data transfers between Amazon DynamoDB and your corporate network using the new Amazon VPC Endpoints for DynamoDB (currently in public preview). With VPC Endpoints for DynamoDB, you can access your DynamoDB tables using private connection endpoints from within your VPC. We also explore how to prevent access to your data from unauthorized locations by using VPC endpoints (for example, preventing use of the AWS Management Console to access your DynamoDB tables from outside of your company network). Although we don’t talk about this in this blog post, you can also use VPC Endpoints for DynamoDB to help resolve regulatory issues regarding authorization and auditability of DynamoDB for confidential user data.

VPC Endpoints for DynamoDB enables Amazon EC2 instances in your VPC to access DynamoDB using their private IP addresses, without any exposure to the public Internet. This new DynamoDB feature ensures that traffic between your VPC and DynamoDB doesn’t leave the Amazon network. In this configuration, your EC2 instances don’t require public IP addresses, and you don’t need an Internet gateway, a NAT device, or a virtual private gateway in your VPC. Furthermore, you can use endpoint policies to control access to VPC endpoints.

Solution overview

Let’s assume that developers’ and administrators’ PCs in your private corporate network don’t have public Internet connectivity, and that you are using either a virtual private network (VPN) connection or AWS Direct Connect to connect between your corporate network and your VPC. This setup helps mitigate the risk of losing or disclosing personally identifiable information (PII) from the PCs. You can use a VPN to route all DynamoDB network traffic through your corporate network infrastructure to help address concerns about the privacy and security of data transfers. However, using a VPN can introduce bandwidth and availability challenges.

To resolve those challenges, we propose a new architecture based on VPC Endpoints for DynamoDB. First, we define the following objectives for our new design:

  • All access to DynamoDB should be internal private communications, not using the public Internet.
  • Access to DynamoDB using the AWS Management Console should be prohibited.
  • All access to DynamoDB should be restricted to the permitted locations or devices, and should be logged.
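One way the first two objectives might be expressed is an IAM policy that denies DynamoDB actions unless the request arrives through a specific VPC endpoint. The sketch below builds such a policy document using the `aws:sourceVpce` condition key. The function name and the endpoint ID are placeholders, and this is not necessarily the exact policy used later in the post.

```python
import json
from typing import Any, Dict


def build_vpce_only_policy(vpc_endpoint_id: str) -> Dict[str, Any]:
    """Deny every DynamoDB action that does not come through the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDynamoDBOutsideVpcEndpoint",
                "Effect": "Deny",
                "Action": "dynamodb:*",
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:sourceVpce": vpc_endpoint_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # "vpce-1a2b3c4d" is a made-up endpoint ID.
    print(json.dumps(build_vpce_only_policy("vpce-1a2b3c4d"), indent=2))
```

Requests made from the AWS Management Console over the public Internet do not pass through the endpoint, so a policy of this shape would reject them as well. The logging objective is not covered by this sketch.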

To satisfy our objectives, we define four control factors, shown in the following architecture diagram.

(more…)

Preventing Accidental Table Deletion in DynamoDB

by Edin Zulich | in DynamoDB

Edin Zulich, AWS NoSQL solutions architect

It’s easy to delete a table in Amazon DynamoDB, and that means that it’s easy to delete one by accident, too. Fortunately, you can minimize the risk of accidentally deleting a table using AWS Identity and Access Management (IAM), which provides authentication and access control for DynamoDB.

As a managed service, DynamoDB is fully integrated with IAM. You can use IAM to tailor access control to resources and operations in DynamoDB. Using IAM roles, policies, and groups, you can implement a security configuration to ensure that DynamoDB resources are only accessed and modified the way you want. This approach includes preventing accidental table deletion.

For our purposes—preventing accidental table deletion—we will use IAM roles to control access to the DynamoDB DeleteTable operation. This approach requires the user to take an extra step to delete a table: Switch to a special IAM role. In other words, users won’t be able to delete a table by simply invoking the DeleteTable operation. They will have to assume a different IAM role first. This step will prevent accidental deletion of a table.

In this blog post, we show how to disable the DeleteTable operation, create a special “delete table” role, attach it to an IAM group that should have DeleteTable permissions, and then use the role. We also show how to add a requirement to the role to use multi-factor authentication (MFA), and how to prevent role switching in the console.
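As a rough sketch of the policy documents this approach involves, the following builds three of them: a deny statement for everyday use, the permission carried by the special role, and a trust policy that optionally requires MFA to assume the role. The function names and account ID are illustrative, and the full post may structure its policies differently.

```python
import json
from typing import Any, Dict

Policy = Dict[str, Any]


def deny_delete_table_policy() -> Policy:
    """For ordinary users and groups: blocks direct DeleteTable calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Deny", "Action": "dynamodb:DeleteTable", "Resource": "*"}
        ],
    }


def delete_table_role_policy() -> Policy:
    """Permissions policy for the special role: the only place DeleteTable is allowed."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "dynamodb:DeleteTable", "Resource": "*"}
        ],
    }


def delete_table_trust_policy(account_id: str, require_mfa: bool = True) -> Policy:
    """Trust policy for the special role, optionally requiring an MFA-authenticated session."""
    statement: Dict[str, Any] = {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
        "Action": "sts:AssumeRole",
    }
    if require_mfa:
        statement["Condition"] = {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(json.dumps(delete_table_trust_policy("123456789012"), indent=2))
```

Because the deny applies to a user's everyday identity while the allow lives on a separate role, deleting a table requires a deliberate switch of roles rather than a single API call.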

(more…)

Wave: A Private Location App Running on Amazon RDS

by Yoav Eilat | in RDS PostgreSQL

By Pablo Clemente, CTO, Wave

How many times have you endured the painful process of meeting up with someone and having to constantly text or call them to find out where exactly they are? Doing this is not only inefficient and frustrating, but also a potential danger for anyone who is driving or moving. Recently, several smartphone apps have appeared that are designed to locate your contacts, such as Facebook’s Find My Friends and Google’s Latitude. The Wave app locates your contacts and improves on earlier solutions in a few ways.

(more…)

Eduphoria Uses Amazon Aurora to Give Educators Superpowers

by Sirish Chandrasekaran | in Aurora

Eduphoria is making great use of Amazon Aurora and other AWS database services. In this guest post, you can learn about how they use Amazon Aurora to scale their read workload while saving costs. This post was written by Aaron Dulaney, director of infrastructure operations at Eduphoria.

At Eduphoria, we provide K–12 educators the tools that they need to become superheroes within the classroom. Our integrated apps assist in every aspect of the school day, from lesson planning to monitoring student progress, to streamlining administrative duties, to providing a collaborative platform for education professionals. In short, we’ve got teachers’ backs—and more importantly, Amazon Aurora has ours!

When Eduphoria first started, we had 30 on-premises servers, each running its own instance of Microsoft Internet Information Services (IIS) and MySQL. As we grew, we needed to step back and rethink our infrastructure: first, to remove single points of failure; and second, to get better monitoring and performance insights on that infrastructure.

The decision to choose AWS was easy, because AWS met both those needs. Databases, applications, session states, and storage services are all fault-tolerant with automatic failover or scaling. The insight we have into our infrastructure is far better than we could ever provide for on-premises installs.

In addition, we chose Amazon EC2 for our front-end web server architecture. We chose it because it can easily be scaled and spread across multiple data centers with minimal configuration. It is extremely flexible, and we love that we only pay for the capacity we need, when needed.

For our database layer, we first chose Amazon RDS for MySQL in a high availability (Multi-AZ) configuration. Our experience with RDS MySQL was a positive one. It offers a number of useful management features to help offload routine database administration tasks. Having tasted the benefits of a managed database service, we checked whether RDS offered an option where we could easily add replicas beyond two AZs, and where the replicas serve a read workload in addition to acting as a failover target.

Aurora has been the solution. It promised MySQL compatibility, better performance, and access to the read servers. We have been able to take advantage of its scalability, high availability, fast failover, and high throughput with separate read servers to increase performance by 25 percent. Adding additional read capacity is fast and easy, and we were pleasantly surprised that we no longer have to allocate storage on the backend! The best part: It’s more cost-effective than our previous setup. Since moving to AWS, we’ve cut the cost of hosting a customer in half.

In addition to Aurora, RDS, and EC2, we use a number of other AWS services to fulfill our infrastructure needs. These include AWS Lambda, Amazon S3, Amazon ElastiCache, Amazon VPC, Amazon CloudFront, Amazon Route 53, AWS CodePipeline, Amazon CloudWatch, AWS CloudFormation, AWS Identity and Access Management (IAM), and Amazon SNS.

As we grow our application portfolio and serve more educators, we are excited about our partnership with AWS and Amazon Aurora. Together, we are doing great things.

ETL Job Orchestration with Matillion, Amazon DynamoDB, and AWS Lambda

by Wendy Neu | in DynamoDB, Lambda

Wendy Neu is a big data architect at Amazon Web Services.

Traditional ETL tools often run on a schedule. If you are due to receive a file from a vendor between 1 A.M. and 4 A.M. on the third Wednesday of the month, you likely have a job scheduled to look for the file at 4 A.M. But what if the file arrives late? Or what if the file arrives a day early and is accidentally swept into an archive before the scheduler has a chance to run? The delays or miscommunications could have an adverse impact on critical business metrics.

If you are using Matillion ETL for Redshift for your ETL/ELT processing, there is another way to manage job executions with native AWS tools like Amazon Simple Queue Service, Amazon DynamoDB and AWS Lambda.

In this post, I will show you how to build an orchestration engine that will not only execute your job as your file arrives in Amazon S3, but will extend to manage thousands of jobs, as needed.
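The event-driven idea can be sketched as a small function that turns an Amazon S3 event notification into one job request per arriving file. The S3 event fields used here (`Records`, `s3.bucket.name`, `s3.object.key`) are the standard notification shape. The message layout and the `job_name` parameter are hypothetical and are not the format Matillion expects.

```python
import json
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def build_job_messages(event: Dict[str, Any], job_name: str) -> List[str]:
    """Return one JSON job request per object named in an S3 event notification."""
    messages = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        messages.append(json.dumps({"job": job_name, "bucket": bucket, "key": key}))
    return messages
```

A Lambda function subscribed to the bucket could pass each returned string to Amazon SQS as a message body, for example with the boto3 call `send_message(QueueUrl=..., MessageBody=message)`, so the job starts when the file lands rather than when a scheduler happens to run.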

(more…)