AWS Database Blog
Migrate PostgreSQL Databases and Perform Ongoing Replication with the AWS Database Migration Service
Jeff Levine is a solutions architect for Amazon Web Services.
As a solutions architect for Amazon Web Services, I help our customers to migrate workloads to the AWS Cloud, a key part of which includes their databases. The AWS Database Migration Service (AWS DMS) enables organizations to migrate data to and from a variety of databases located in Amazon Relational Database Service (Amazon RDS), those running on Amazon EC2, and also those running on-premises. The service supports homogeneous migrations such as Oracle to Oracle, and also heterogeneous migrations such as Oracle to Amazon Aurora. You can also customize table mappings and perform translations. In this post, I am going to talk about two things, first about using an RDS PostgreSQL source database for DMS and then about using the recently announced continuous data replication feature, which can keep the target database up to date after the initial migration.
Using an RDS PostgreSQL Source
You might think that migration means just moving data from one location to another, but the process involves far more. The process must also keep up with transactions that take place during the migration, so that the database is left in a consistent state upon completion. Accordingly, when we look at migrating from an Amazon RDS PostgreSQL source database, we need to take into account both how data is moved and also environmental factors that can affect the ongoing processing of transactions while the migration is happening.
Letās take the case of a PostgreSQL to MySQL migration. Iāll begin with a simple table definition.
CREATE TABLE subscribers (
subscriber_id SERIAL NOT NULL PRIMARY KEY, subscriber_info json NOT NULL
);
Iām going to use pgAdmin, a popular open source administration tool for PostgreSQL, to create the table and display the specification in the source database.

After I run the DMS migration task, hereās how the newly created subscribers table appears in the target database using the program DbVisualizer.

Notice how the serial column is now an integer and the subscriber_info column has been converted to longtext. To understand what happened, we need to consider how DMS operates. DMS functions as an intermediate broker to a variety of different database engines. To make this conversion happen, DMS first maps the source data type (JSON in this case) to an intermediate DMS type and then maps that type to the target data type (longtext). Itās very important to take into account these mappings when designing your migration strategy. Note that I used the default mapping. DMS also allows you to customize the mappings if you want.
We also need to look at source-related performance factors such as replication activity and the use of write-ahead logging (WAL) that can affect the migration. Using Amazon RDS, you can change database parameters using parameter groups. One adjustment we suggest is changing the wal_sender_timeout parameter to 0. To do this, create a parameter group for your RDS PostgreSQL database and adjust the parameter accordingly. Because this is a dynamic parameter, the adjustment will take effect immediately. You can read about all the nuances of using PostgreSQL DMS sources in the RDS documentation.
Using Continuous Data Replication
Additionally, AWS recently announced continuous data replication for DMS, which enables you to keep your database up to date after the initial migration. With this feature, you can now have high availability during the replication process by specifying Multi-AZ when you create your replication instances.

When Multi-AZ is selected, two replication hosts will be created, each of which has its own IP address. Remember to adjust the underlying source and target security groups (or their on-premises equivalents) to grant both instances the appropriate level of access.
The AWS Database Migration Service offers a powerful set of migration capabilities. It is important to consider how the service works behind the scenes as part of the development of an overall migration plan. With continuous data replication, you can now keep your databases up to date after the initial load. Itās easy to get started. Follow our Getting Started Guide to migrate your database using AWS Database Migration Service with just a few clicks.
Creating a Simple Autocompletion Service with Redis: Part One of Two
Darin Briskman (@briskmad) is a developer evangelist for Amazon Web Services.
So youāve figured out how to use a cool framework to make a great-looking web form. Youāve made it super-easy for people to understand each field and how to fill it out. But there is still a lot of stuff to type. How can we make it even easier, especially for mobile devices where a lot of data entry is a challenge?
One easy thing to do is add an autocomplete function, which detects when characters are entered into a field and suggests which words or phrases are desired. All you need is a list of phrases, Amazon ElastiCache for Redis, and a little bit oā code.
If youāre not a Redis expert, donāt worry: For the purposes of autocompletion, everything youāll need to know about Redis is in this blog post. Because youāre using Amazon ElastiCache, all the rest of the work (installation, updates, data management, maintenance, and so forth) is automated and done for youāall you need to do is provide the data and fetch the data.
Using Redis Sorted Sets
What makes this fast and easy is the Redis Sorted Sets data type. This type is simply a nonrepeating collection of strings, with a score associated with each one. The strings are always stored in descending order of scores. When more than one string has the same score, the set is ordered lexicographically. Because the set is always kept in order, retrieving data from the beginning or the end or the middle is easy, using the Redis ZRANGE command.
If we put the names of a few countries into a Redis sorted set, weāll get the following.
redis> zadd mylist 0 "United States"
(integer) 1
redis> zadd mylist 0 "Sweden"
(integer) 1
redis> zadd mylist 0 "Union of South Africa"
(integer) 1
redis> zadd mylist 0 "United Kingdom"
(integer) 1
redis> zrange mylist 0 -1
1. "Sweden"
2. "Union of South Africa"
3. "United Kingdom"
4. "United States"
The other Redis command we need is ZRANK, which can tell us where a string lies inside an ordered set.
redis> zrank mylist "United Kingdom"
(integer) 2
(Indexes in Redis always start at zero, so 2 is the third item on the list.)
And now that we know that United Kingdom is item 2 in the sorted set, we can easily find what comes before and after it.
redis> zrange mylist 1 2
1) "Union of South Africa"
2) "United Kingdom"
redis> zrange mylist 2 3
1) "United Kingdom"
2) "United States"
For autocompletion, we need a sorted set that includes the letters that someone would type. We also need a character to show that weāve reached the end of the phrase. Iām going to use %.
redis> zadd mylist2 0 "United States%"
(integer) 1
redis> zadd mylist2 0 "United State"
(integer) 1
redis> zadd mylist2 0 "United Stat"
(integer) 1
redis> zadd mylist2 0 "United Sta"
(integer) 1
redis> zadd mylist2 0 "United St"
(integer) 1
redis> zadd mylist2 0 "United S"
(integer) 1
redis> zadd mylist2 0 "United "
(integer) 1
redis> zadd mylist2 0 "United"
(integer) 1
redis> zadd mylist2 0 "Unite"
(integer) 1
redis> zadd mylist2 0 "Unit"
(integer) 1
redis> zadd mylist2 0 "Uni"
(integer) 1
redis> zadd mylist2 0 "Un"
(integer) 1
redis> zadd mylist2 0 "U"
(integer) 1
redis> zrange mylist2 0 -1
1. "U"
2. "Un"
3. "Uni"
4. "Unit"
5. "Unite"
6. "United
7. "United "
8. "United S"
9. "United St"
10. "United Sta"
11. "United Stat"
12. "United State"
13. "United States%"
Now we can use code to loop through the entries to find the complete entry (ending with %).
Putting the pieces together
Letās put it all together. First, weāll set up an ElastiCache Redis node. If there are no repeating prefixes, this algorithm will create (Number of Items) x (Average Length of Each Item + 1) x (Average Length of each item) ÷ 2 entries in our sorted set. Unless you have a very weird set of items, there will be repetition among the prefixes, so the actual amount of memory needed will be smaller. For this example, countries.txt has 195 items that are an average of 10.4 characters each, including spaces (and the fileās encoded in UTF-8, so itās one byte per character). So weāll need no more than 11,560 bytes to store all the possible prefixes for this list. For more information on how to size your Redis node, see Selecting Your Redis Node Size in the ElastiCache User Guide.
This isnāt a lot of data, so we can use the very smallest cache.t2.micro node, which can run using the AWS Free Tier.
Start by going to the Amazon ElastiCache for Redis page and choosing Get Started or by signing on to your AWS Management Console and choosing ElastiCache.

Then create a Redis cluster. For simplicity, weāll avoid replication (though for a production instance, replication is a good idea!). Uncheck Enable Replication, name the cluster redis-autocomplete, and choose cache.t2.micro for node type. You can leave all the other settings at their defaults, then launch the cluster.


It can be handy to have redis-cli (the Redis command line interface) where you can test and debug your redis commands. If youāre anything like me, writing code is easy⦠debugging it to make it actually work, not so easy (sigh).
For the CLI, create a new Amazon EC2 instance (t2.micro will be enough) and log in to it. If you havenāt done this before, see Getting Started with Amazon EC2 Linux Instances for a guide on creating your EC2 instance and the needed security groups.
In theory, all you need to install is redis-cli, but there are a lot of dependencies. Redis is pretty small, so itās easiest to just install the whole package.
sudo yum --enablerepo=epel install redis
To actually use the CLI, youāll need to know the name of your ElastiCache instance. On your AWS Management Console, choose ElastiCache from the Services menu, then choose Cache Clusters.

Then choose your node.

Look for the DNS name of your cache instance under Endpoint.

Now from your EC2 instance, you can use redis-cli.
$ redis-cli -h <your ElastiCache endpoint DNS>
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379>
Try using the INFO command at the redis> prompt to validate that all is working as it should.
When youāre done, use CTRL-D to end the interactive redis> prompt and return to the system shell.
In a future post, weāll use AWS Lambda to make this completely serverless. For now, weāll keep it simple, using Python 2.7 (the version included with most Linux distributions) on our EC2 node.
Testing our simple autocompletion service
Letās do a test run. First, we need some code that reads our text file and parses it into prefixes. We will need to grab a file with a list of items for autocompletionāyou can use your own or you can use the list of UN member countries Iāve posted online.
$ wget http://autocomplete.s3.amazonaws.com/Countries.txt
$ cp Countries.txt autocomplete.txt
We also need to install the Redis SDK for Python.
$ sudo pip install redis
Now that we have all of the prerequisites, we can use some code to load and parse the file. Use your favorite text editor (Iām old school, so I use vi) to create a file called ac_setup.py.
import redis
r = redis.StrictRedis(host='<your ElastiCache endpoint DNS>', port=6379, db=0)
f = open(āautocomplete.txt',"r")
for line in f:
n = line.strip()
for l in range(1,len(n)):
prefix = n[0:l]
r.zadd('autocomplete',0,prefix)
r.zadd('autocomplete',0,n+"%")
else:
exit
This code should pull in countries.txt and parse it into an ordered set of every unique prefix, with the full names identified by ending with %. Letās execute the code.
$ python ac_setup.py
To check that this code worked, we can go back into the redis> interface and see what we have.
$ redis-cli -h <your ElastiCache endpoint DNS>
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379> zrange autocomplete 0 -1
1) "A"
2) "Af"
3) "Afg"
4) "Afgh"
5) "Afgha"
6) "Afghan"
7) "Afghani"
8) "Afghanis"
9) "Afghanist"
10) "Afghanista"
11) "Afghanistan%"
12) "Al"
13) "Alb"
14) "Alba"
15) "Alban"
...
Now, we need the code to pull the potential autocompletion list from Redis. Again, use your favorite text editor (Iām still using vi!) to create a file named ac.py.
import sys
import redis
r = redis.StrictRedis(host=<your ElastiCache endpoint DNS>..cache.amazonaws.com', port=6379, db=0)
def complete(r,prefix,count):
results = []
grab = 42
start = r.zrank('autocomplete',prefix)
if not start:
return []
while (len(results) != count):
range = r.zrange('autocomplete',start,start+grab-1)
start += grab
if not range or len(range) == 0:
break
for entry in range:
minlen = min(len(entry),len(prefix))
if entry[0:minlen] != prefix[0:minlen]:
count = len(results)
break
if entry[-1] == "%" and len(results) != count:
results.append(entry[0:-1])
return results
def autoComplete():
print complete(r,sys.argv[1],50)
if __name__ == "__main__":
autoComplete()
This program needs one argument, which is the prefix we want to search. Try it out!
$ python ac.py U
['Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan']
$ python ac.py Uni
['United Arab Emirates', 'United Kingdom', 'United States']
Congratulations! Weāve now built a working recommendation engine. If you want to try this with a larger dataset, letās use the list of all of the cities and settlements in the US (provided by the US Census Bureau).
Flush the data out of Redis.
$ redis-cli -h <your ElastiCache endpoint DNS>
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379>FLUSHDB
OK
Then double-check to make sure itās gone.
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379> zcard autocomplete
(integer) 0
Use CTRL-D to exit the Redis shell.
re<your ElastiCache Endpoint DNS>.cache.amazonaws.com:6379>^D
$
Then load the cities data.
$ wget http://autocomplete.s3.amazonaws.com/Cities.txt
$ cp Cities.txt autocomplete.txt
$ python ac_setup.py
This test will take a bit longer to run, because there are 15,637 items on the list to be parsed, which will take a few minutes on our micro family instance. Once done, we can now search for cities.
$ python ac.py Chi
['Chicago Heights, Illinois', 'Chicago Park, California', 'Chicago Ridge, Illinois', 'Chicago, Illinois', 'Chichester, New Hampshire', 'Chickamauga, Georgia', 'Chickasha, Oklahoma', 'Chico, California', 'Chicopee, Massachusetts', 'Chiefland, Florida', 'Childersburg, Alabama', 'Childress, Texas', 'Chilhowee, Missouri', 'Chillicothe, Illinois', 'Chillicothe, Missouri', 'Chillicothe, Ohio', 'Chilmark, Massachusetts', 'Chiloquin, Oregon', 'Chilton, Texas', 'Chilton, Wisconsin', 'Chimacum, Washington', 'China Spring, Texas', 'Chincoteague Island, Virginia', 'Chinese Camp, California', 'Chinle, Arizona', 'Chino Hills, California', 'Chino Valley, Arizona', 'Chino, California', 'Chinook, Montana', 'Chipley, Florida', 'Chippewa Falls, Wisconsin', 'Chisago City, Minnesota', 'Chisholm, Minnesota', 'Chittenango, New York', 'Chittenden, Vermont']
$ python ac.py Chic
['Chicago Heights, Illinois', 'Chicago Park, California', 'Chicago Ridge, Illinois', 'Chicago, Illinois', 'Chichester, New Hampshire', 'Chickamauga, Georgia', 'Chickasha, Oklahoma', 'Chico, California', 'Chicopee, Massachusetts']
In part two of this post, weāll explore using AWS Lambda and Amazon S3 to automate loading the data and running the code, plus look at ways to integrate this autocompletion service into a webpage.
I hope you found this fun and useful! If youād like more on this Redis pattern, this blog post was inspired by Salvatore āantirezā Sanfilippo, Redisā creator, who wrote about Redis autocompletion on his blog. Amazon ElastiCache for Redis is a quickly evolving service, and you can always get the latest updates at the Amazon ElastiCache for Redis marketing website.
Welcome to the AWS Database Blog!
The AWS Database Blog will feature information for customers interested in running databases on AWS. Youāll see content from many AWS team members covering a range of topics, including the following:
· Announcements about the new launches
· Deep-dive and how-to guides
· Best practices
· Customer and partner stories
· And more!
To see future updates, check back often or subscribe to our blog using the RSS feed button at the top of the page.
If you have requests to cover specific topics, let us know in the comments.
Silvia Doomra
Senior Product Manager, AWS

