AWS Database Blog

Migrate PostgreSQL Databases and Perform Ongoing Replication with the AWS Database Migration Service

by Jeff Levine | on 14 SEP 2016 | in DMS, Migration, RDS PostgreSQL | Permalink | Comments

Jeff Levine is a solutions architect for Amazon Web Services.

As a solutions architect for Amazon Web Services, I help our customers to migrate workloads to the AWS Cloud, a key part of which includes their databases. The AWS Database Migration Service (AWS DMS) enables organizations to migrate data to and from a variety of databases located in Amazon Relational Database Service (Amazon RDS), those running on Amazon EC2, and also those running on-premises. The service supports homogeneous migrations such as Oracle to Oracle, and also heterogeneous migrations such as Oracle to Amazon Aurora. You can also customize table mappings and perform translations. In this post, I am going to talk about two things, first about using an RDS PostgreSQL source database for DMS and then about using the recently announced continuous data replication feature, which can keep the target database up to date after the initial migration.

Using an RDS PostgreSQL Source
You might think that migration means just moving data from one location to another, but the process involves far more. The process must also keep up with transactions that take place during the migration, so that the database is left in a consistent state upon completion. Accordingly, when we look at migrating from an Amazon RDS PostgreSQL source database, we need to take into account both how data is moved and also environmental factors that can affect the ongoing processing of transactions while the migration is happening.

Let’s take the case of a PostgreSQL to MySQL migration. I’ll begin with a simple table definition.

CREATE TABLE subscribers (
subscriber_id SERIAL NOT NULL PRIMARY KEY, subscriber_info json NOT NULL
);

I’m going to use pgAdmin, a popular open source administration tool for PostgreSQL, to create the table and display the specification in the source database.

After I run the DMS migration task, here’s how the newly created subscribers table appears in the target database using the program DbVisualizer.

Notice how the serial column is now an integer and the subscriber_info column has been converted to longtext. To understand what happened, we need to consider how DMS operates. DMS functions as an intermediate broker to a variety of different database engines. To make this conversion happen, DMS first maps the source data type (JSON in this case) to an intermediate DMS type and then maps that type to the target data type (longtext). It’s very important to take into account these mappings when designing your migration strategy. Note that I used the default mapping. DMS also allows you to customize the mappings if you want.

We also need to look at source-related performance factors such as replication activity and the use of write-ahead logging (WAL) that can affect the migration. Using Amazon RDS, you can change database parameters using parameter groups. One adjustment we suggest is changing the wal_sender_timeout parameter to 0. To do this, create a parameter group for your RDS PostgreSQL database and adjust the parameter accordingly. Because this is a dynamic parameter, the adjustment will take effect immediately. You can read about all the nuances of using PostgreSQL DMS sources in the RDS documentation.

Using Continuous Data Replication
Additionally, AWS recently announced continuous data replication for DMS, which enables you to keep your database up to date after the initial migration. With this feature, you can now have high availability during the replication process by specifying Multi-AZ when you create your replication instances.

When Multi-AZ is selected, two replication hosts will be created, each of which has its own IP address. Remember to adjust the underlying source and target security groups (or their on-premises equivalents) to grant both instances the appropriate level of access.

The AWS Database Migration Service offers a powerful set of migration capabilities. It is important to consider how the service works behind the scenes as part of the development of an overall migration plan. With continuous data replication, you can now keep your databases up to date after the initial load. It’s easy to get started. Follow our Getting Started Guide to migrate your database using AWS Database Migration Service with just a few clicks.

Creating a Simple Autocompletion Service with Redis: Part One of Two

by Darin Briskman | on 12 SEP 2016 | in ElastiCache, Redis | Permalink | Comments

Darin Briskman (@briskmad) is a developer evangelist for Amazon Web Services.

So you’ve figured out how to use a cool framework to make a great-looking web form. You’ve made it super-easy for people to understand each field and how to fill it out. But there is still a lot of stuff to type. How can we make it even easier, especially for mobile devices where a lot of data entry is a challenge?

One easy thing to do is add an autocomplete function, which detects when characters are entered into a field and suggests which words or phrases are desired. All you need is a list of phrases, Amazon ElastiCache for Redis, and a little bit o’ code.

If you’re not a Redis expert, don’t worry: For the purposes of autocompletion, everything you’ll need to know about Redis is in this blog post. Because you’re using Amazon ElastiCache, all the rest of the work (installation, updates, data management, maintenance, and so forth) is automated and done for you—all you need to do is provide the data and fetch the data.

Using Redis Sorted Sets
What makes this fast and easy is the Redis Sorted Sets data type. This type is simply a nonrepeating collection of strings, with a score associated with each one. The strings are always stored in descending order of scores. When more than one string has the same score, the set is ordered lexicographically. Because the set is always kept in order, retrieving data from the beginning or the end or the middle is easy, using the Redis ZRANGE command.

If we put the names of a few countries into a Redis sorted set, we’ll get the following.
redis> zadd mylist 0 "United States" (integer) 1 redis> zadd mylist 0 "Sweden" (integer) 1 redis> zadd mylist 0 "Union of South Africa" (integer) 1 redis> zadd mylist 0 "United Kingdom" (integer) 1 redis> zrange mylist 0 -1 1. "Sweden" 2. "Union of South Africa" 3. "United Kingdom" 4. "United States"

The other Redis command we need is ZRANK, which can tell us where a string lies inside an ordered set.
redis> zrank mylist "United Kingdom" (integer) 2

(Indexes in Redis always start at zero, so 2 is the third item on the list.)

And now that we know that United Kingdom is item 2 in the sorted set, we can easily find what comes before and after it.
redis> zrange mylist 1 2 1) "Union of South Africa" 2) "United Kingdom" redis> zrange mylist 2 3 1) "United Kingdom" 2) "United States"

For autocompletion, we need a sorted set that includes the letters that someone would type. We also need a character to show that we’ve reached the end of the phrase. I’m going to use %.
redis> zadd mylist2 0 "United States%" (integer) 1 redis> zadd mylist2 0 "United State" (integer) 1 redis> zadd mylist2 0 "United Stat" (integer) 1 redis> zadd mylist2 0 "United Sta" (integer) 1 redis> zadd mylist2 0 "United St" (integer) 1 redis> zadd mylist2 0 "United S" (integer) 1 redis> zadd mylist2 0 "United " (integer) 1 redis> zadd mylist2 0 "United" (integer) 1 redis> zadd mylist2 0 "Unite" (integer) 1 redis> zadd mylist2 0 "Unit" (integer) 1 redis> zadd mylist2 0 "Uni" (integer) 1 redis> zadd mylist2 0 "Un" (integer) 1 redis> zadd mylist2 0 "U" (integer) 1 redis> zrange mylist2 0 -1 1. "U" 2. "Un" 3. "Uni" 4. "Unit" 5. "Unite" 6. "United 7. "United " 8. "United S" 9. "United St" 10. "United Sta" 11. "United Stat" 12. "United State" 13. "United States%"

Now we can use code to loop through the entries to find the complete entry (ending with %).

Putting the pieces together
Let’s put it all together. First, we’ll set up an ElastiCache Redis node. If there are no repeating prefixes, this algorithm will create (Number of Items) x (Average Length of Each Item + 1) x (Average Length of each item) ÷ 2 entries in our sorted set. Unless you have a very weird set of items, there will be repetition among the prefixes, so the actual amount of memory needed will be smaller. For this example, countries.txt has 195 items that are an average of 10.4 characters each, including spaces (and the file’s encoded in UTF-8, so it’s one byte per character). So we’ll need no more than 11,560 bytes to store all the possible prefixes for this list. For more information on how to size your Redis node, see Selecting Your Redis Node Size in the ElastiCache User Guide.

This isn’t a lot of data, so we can use the very smallest cache.t2.micro node, which can run using the AWS Free Tier.

Start by going to the Amazon ElastiCache for Redis page and choosing Get Started or by signing on to your AWS Management Console and choosing ElastiCache.

Console

Then create a Redis cluster. For simplicity, we’ll avoid replication (though for a production instance, replication is a good idea!). Uncheck Enable Replication, name the cluster redis-autocomplete, and choose cache.t2.micro for node type. You can leave all the other settings at their defaults, then launch the cluster.

Redis Param
It can be handy to have redis-cli (the Redis command line interface) where you can test and debug your redis commands. If you’re anything like me, writing code is easy… debugging it to make it actually work, not so easy (sigh).

For the CLI, create a new Amazon EC2 instance (t2.micro will be enough) and log in to it. If you haven’t done this before, see Getting Started with Amazon EC2 Linux Instances for a guide on creating your EC2 instance and the needed security groups.

In theory, all you need to install is redis-cli, but there are a lot of dependencies. Redis is pretty small, so it’s easiest to just install the whole package.
sudo yum --enablerepo=epel install redis
To actually use the CLI, you’ll need to know the name of your ElastiCache instance. On your AWS Management Console, choose ElastiCache from the Services menu, then choose Cache Clusters.

CacheCluster

Then choose your node.

node

Look for the DNS name of your cache instance under Endpoint.

endpoint

Now from your EC2 instance, you can use redis-cli.
$ redis-cli -h <your ElastiCache endpoint DNS> <Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379>

Try using the INFO command at the redis> prompt to validate that all is working as it should.
When you’re done, use CTRL-D to end the interactive redis> prompt and return to the system shell.
In a future post, we’ll use AWS Lambda to make this completely serverless. For now, we’ll keep it simple, using Python 2.7 (the version included with most Linux distributions) on our EC2 node.

Testing our simple autocompletion service
Let’s do a test run. First, we need some code that reads our text file and parses it into prefixes. We will need to grab a file with a list of items for autocompletion—you can use your own or you can use the list of UN member countries I’ve posted online.
$ wget http://autocomplete.s3.amazonaws.com/Countries.txt $ cp Countries.txt autocomplete.txt

We also need to install the Redis SDK for Python.
$ sudo pip install redis

Now that we have all of the prerequisites, we can use some code to load and parse the file. Use your favorite text editor (I’m old school, so I use vi) to create a file called ac_setup.py.

import redis
r = redis.StrictRedis(host='<your ElastiCache endpoint DNS>', port=6379, db=0)
f = open(‘autocomplete.txt',"r")
  for line in f:
    n = line.strip()
    for l in range(1,len(n)):
      prefix = n[0:l]
      r.zadd('autocomplete',0,prefix)
    r.zadd('autocomplete',0,n+"%")
else:
  exit

This code should pull in countries.txt and parse it into an ordered set of every unique prefix, with the full names identified by ending with %. Let’s execute the code.
$ python ac_setup.py

To check that this code worked, we can go back into the redis> interface and see what we have.
$ redis-cli -h <your ElastiCache endpoint DNS> <Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379> zrange autocomplete 0 -1 1) "A" 2) "Af" 3) "Afg" 4) "Afgh" 5) "Afgha" 6) "Afghan" 7) "Afghani" 8) "Afghanis" 9) "Afghanist" 10) "Afghanista" 11) "Afghanistan%" 12) "Al" 13) "Alb" 14) "Alba" 15) "Alban" ...

Now, we need the code to pull the potential autocompletion list from Redis. Again, use your favorite text editor (I’m still using vi!) to create a file named ac.py.

import sys
import redis
r = redis.StrictRedis(host=<your ElastiCache endpoint DNS>..cache.amazonaws.com', port=6379, db=0)
def complete(r,prefix,count):
  results = []
  grab = 42
  start = r.zrank('autocomplete',prefix)
  if not start:
    return []
  while (len(results) != count):
    range = r.zrange('autocomplete',start,start+grab-1)
    start += grab
    if not range or len(range) == 0:
      break
    for entry in range:
      minlen = min(len(entry),len(prefix))
      if entry[0:minlen] != prefix[0:minlen]:
        count = len(results)
        break
      if entry[-1] == "%" and len(results) != count:
        results.append(entry[0:-1])
  return results

def autoComplete():
  print complete(r,sys.argv[1],50)

if __name__ == "__main__":
  autoComplete()

This program needs one argument, which is the prefix we want to search. Try it out!
$ python ac.py U ['Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan']

$ python ac.py Uni ['United Arab Emirates', 'United Kingdom', 'United States']

Congratulations! We’ve now built a working recommendation engine. If you want to try this with a larger dataset, let’s use the list of all of the cities and settlements in the US (provided by the US Census Bureau).

Flush the data out of Redis.
$ redis-cli -h <your ElastiCache endpoint DNS> <Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379>FLUSHDB OK

Then double-check to make sure it’s gone.
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379> zcard autocomplete (integer) 0

Use CTRL-D to exit the Redis shell.
re<your ElastiCache Endpoint DNS>.cache.amazonaws.com:6379>^D $

Then load the cities data.
$ wget http://autocomplete.s3.amazonaws.com/Cities.txt $ cp Cities.txt autocomplete.txt $ python ac_setup.py

This test will take a bit longer to run, because there are 15,637 items on the list to be parsed, which will take a few minutes on our micro family instance. Once done, we can now search for cities.
$ python ac.py Chi ['Chicago Heights, Illinois', 'Chicago Park, California', 'Chicago Ridge, Illinois', 'Chicago, Illinois', 'Chichester, New Hampshire', 'Chickamauga, Georgia', 'Chickasha, Oklahoma', 'Chico, California', 'Chicopee, Massachusetts', 'Chiefland, Florida', 'Childersburg, Alabama', 'Childress, Texas', 'Chilhowee, Missouri', 'Chillicothe, Illinois', 'Chillicothe, Missouri', 'Chillicothe, Ohio', 'Chilmark, Massachusetts', 'Chiloquin, Oregon', 'Chilton, Texas', 'Chilton, Wisconsin', 'Chimacum, Washington', 'China Spring, Texas', 'Chincoteague Island, Virginia', 'Chinese Camp, California', 'Chinle, Arizona', 'Chino Hills, California', 'Chino Valley, Arizona', 'Chino, California', 'Chinook, Montana', 'Chipley, Florida', 'Chippewa Falls, Wisconsin', 'Chisago City, Minnesota', 'Chisholm, Minnesota', 'Chittenango, New York', 'Chittenden, Vermont']

$ python ac.py Chic ['Chicago Heights, Illinois', 'Chicago Park, California', 'Chicago Ridge, Illinois', 'Chicago, Illinois', 'Chichester, New Hampshire', 'Chickamauga, Georgia', 'Chickasha, Oklahoma', 'Chico, California', 'Chicopee, Massachusetts']

In part two of this post, we’ll explore using AWS Lambda and Amazon S3 to automate loading the data and running the code, plus look at ways to integrate this autocompletion service into a webpage.

I hope you found this fun and useful! If you’d like more on this Redis pattern, this blog post was inspired by Salvatore “antirez” Sanfilippo, Redis’ creator, who wrote about Redis autocompletion on his blog. Amazon ElastiCache for Redis is a quickly evolving service, and you can always get the latest updates at the Amazon ElastiCache for Redis marketing website.

Welcome to the AWS Database Blog!

by Silvia Doomra | on 12 SEP 2016 | Permalink | Comments

The AWS Database Blog will feature information for customers interested in running databases on AWS. You’ll see content from many AWS team members covering a range of topics, including the following:

·         Announcements about the new launches
·         Deep-dive and how-to guides
·         Best practices
·         Customer and partner stories
·         And more!

To see future updates, check back often or subscribe to our blog using the RSS feed button at the top of the page.
If you have requests to cover specific topics, let us know in the comments.

Silvia Doomra

Senior Product Manager, AWS

Aug	SEP	Oct
	16
2015	2016	2017