AWS Database Blog
Configuring the AWS Schema Conversion Tool
Wendy Neu is a big data architect at Amazon Web Services.
The AWS Schema Conversion Tool (AWS SCT) is a project-style Java application that can be installed on a local computer or an Amazon EC2 instance. The tool can be downloaded from AWS and is supported on MS Windows, Mac OSX, and Ubuntu and Fedora Linux.
You can use the AWS Schema Conversion Tool to convert your existing database schema from one database engine to another. You can convert relational OLTP schema, or data warehouse schema. Your converted schema is suitable for an Amazon RDS MySQL DB instance, an Amazon Aurora DB cluster, an Amazon RDS PostgreSQL database instance, or an Amazon Redshift cluster.
SCT is simple to use and takes only minutes to get started after you install the tool locally on your machine. It requires only connectivity to the source and target databases. SCT uses JDBC drivers to connect to the source and target databases. When you set up your first DB connection to a new source, the wizard interface will prompt you for a driver. All supported drivers are listed by database engine in the Installing and Updating section of the documentation.
When both the source and target database connections are configured, simply right-click on the schema you want to analyze and choose Create Report.

SCT will examine in detail all of the objects in that schema, including the schema itself, tables, constraints, types, sequences, synonyms, collection types, views, procedures, functions, and packages. It will convert as much as possible automatically and provide detailed information about items it couldnāt convert. The report can be saved as a .csv file or a .pdf file for review.
Navigate to the Action Items tab in the report to see the items that the tool couldnāt convert and understand how much work you have to do.

When youāve looked at the report, choose Actions, Convert schema from the menu bar to generate the data definition language (DDL) statements for the target database.

DDL for the target database is displayed in the project console, and you can edit it before applying it with SCT or simply saving it as an .sql file for application later.

In development, itās a good idea to apply the changes directly from within SCT to a sandbox version of your database to understand what objects will be automatically converted, how many objects are involved, and what level of development effort is ahead. Beyond the sandbox and early development phases, itās best practice to adhere to normal software lifecycle procedures within your organization. Scripted conversions will facilitate key processes like code reviews, automated builds, and automated deployments.

Working with Global Settings
The AWS Schema Conversion Tool is very easy to use and lets a DBA or developer move very quickly to analysis and deployment. However, sometimes the default settings might need to be modified. For example, such a change might be necessary if the wrong driver was entered through the wizard for a database engine, a single schema includes thousands of objects and places the Java virtual machine (JVM) under memory pressure, or verbose logging must be enabled. AWS Schema Conversion Tool exposes fine control over all these key areas through the Global settings dialog box. The sections following offer some suggestions on how to use global settings for fine control over these areas.
All sections following are accessed through the SCT console by choosing Settings and then Global Settings from the top menu bar.

Driver settings
You must have the required JDBC drivers installed for the AWS Schema Conversion Tool to connect to your source or target database. When a driver is downloaded, you can upload the driver into the tool through the wizard or through the Global settings dialog box. The advantage of uploading the drivers through the Global settings dialog box is that you can enter or update all of the drivers at one time.
Choose Drivers from the side navigation panel in the Global settings dialog box, and set the target database driver by choosing Browse, navigating to the local machine, and choosing the .jar file to associate with SCT for a particular database engine.

All projects within SCT for that database engine will use the selected driver.
Performance and memory settings
SCT has menu options that control how aggressively the tool will consume JVM memory:
- Fast conversion, but large memory consumption ā This option optimizes for a fast conversion but might require more memory for the object reference cache.
- Slower conversion, lower memory consumption ā This option minimizes the amount of memory used but results in a slower conversion. Use this option if your computer has a limited amount of memory.
- Balance speed with memory consumption ā This option optimizes provides a balance between memory use and conversion speed.
To set these options, choose Performance and Memory from the side navigation panel in the Global settings dialog box, and change Performance and Memory Management Policy to the desired setting.

JVM size settings
The default amount of memory allocated to SCT is just less than a gigabyte. This memory size is usually sufficient for small databases of less than 1000 objects. For many databases, it is necessary to reconfigure the SCT JVM to use more memory even when the JVM is configured in the low memory consumption configuration.
The screenshot that follows shows an SCT JVM under duress, with the tasks allocated hitting the maximum boundary of JVM memory. The setting was Slower conversion, lower memory consumption and failed due to a too-small JVM memory setting.

You can increase (or decrease) the maximum and minimum memory allocated to the SCT JVM through configuration files.
Configuration files for SCT are located in the AWS Schema Conversion Tool installation directory under a subdirectory called app. The name of the main configuration file is AWS Schema Conversion Tool.cfg. You can modify the file using any text editorāhowever, donāt use a word processing tool (like Word or WordPad) because it might change certain characters in the file and prevent the file from being read correctly by the JVM.
The image following shows an AWS Schema Conversion Tool installation directory under C:\Program Files as an example.

The default file that is installed with SCT has a few parameters for the tool itself and a single JVM option. None of the existing parameters in the file should be changed without consulting AWS Support.

You can modify the memory settings of the JVM by adding two parameters under the [JVMUserOptions] section. However, use caution when configuring the JVM size. If your maximum size exceeds your systemās available memory, it might crash your machine.
The first of the two JVM memory parameters is the maximum JVM memory size, Xmx, which is recorded in megabytes and should be in standard recognized memory values. To set a maximum JVM size of 4 GB, set this value to āXmx4096m with no spaces on a single line.
The second parameter is the minimum JVM memory size Xms, which is recorded in megabytes and should be in standard recognized memory values. To set a minimum JVM size of 2 GB, set this value to āXms2048m with no spaces on a single line.

When youāre done editing, save the file and restart AWS SCT. You should see the values in the status bar change to reflect your new settings.
The screenshot that follows shows an SCT JVM running smoothly because the tasks allocated are operating comfortably within the maximum boundary of the JVM. The setting was Slower conversion, lower memory consumption, and the tool succeeded in converting nearly 5000 objects in about 10 minutes.

Logging
In addition to managing the memory settings of the AWS Schema Conversion Tool, it can be a good idea to increase the logging information produced by the tool when converting new projects. Although increasing logging information might slow conversion slightly, this change will give robust information to AWS Support if any unforeseen errors arise.
To change the logging settings, choose Settings and then Global Settings. In the left navigation pane, choose Logging. See the section above Memory Management for help.
In the logging pane, change Debug mode to true and choose a few key items to increase the logging information for. Setting the Parser, Type Mapping, and User Interface to TRACE can help with key problem areas during conversion.

You can also configure the log location. If the information becomes too verbose for the current file system where the logs are streaming, change the value to a location with sufficient space to capture the logs.

To transmit logs to AWS Support, navigate to the directory where the logs are stored, zip up all the files into a manageable single file, and upload the .zip file with your support case.
When the initial analysis is completed and ongoing development is taking place, simply return Debug mode to false to eliminate the verbose logging and increase speed.
Tip: To manage the log size and streamline reporting issues, remove the logs or move them to another location after a successful conversion. Doing this ensures that only the relevant errors and information are transmitted to AWS Support and keeps the log file system from filling.
All references in this document refer to Version 1 of the AWS SCT and were tested on build 401 successfully.
Using Amazon RDS for Oracle as the Oracle SOA Suite Database
This post is written by Fabio Douek, an architect at AWS Partner Rubicon Red.
When we started to provision Oracle Fusion Middleware platforms with AWS, we took advantage of the ability to create AWS infrastructure within minutes. We can also use MyST to provision complex platforms that are EDG-compliant with Oracle Fusion Middleware in less than an hour.
One challenge that we faced was that we couldnāt use Amazon RDS for Oracle as the database for our Oracle Fusion Middleware installations. Our inability to do so was primarily because the RDS master user didnāt have the database privileges required to run the Oracle Repository Creation Utility (RCU). As a result, we implemented our own automation for provisioning the Oracle Database running on Amazon EC2 instances.
Although this approach works for running development and test workloads on AWS, when it comes to implementing production workloads, Amazon RDS for Oracle provides a number of additional benefits. These benefits include simplified administration tasks, including backups, software patching, monitoring, and hardware scaling.
In addition, Multi-AZ deployment with Amazon RDS for Oracle simplifies the implementation of a highly available architecture, because it contains built-in support for automated failover from your primary database to a synchronously replicated secondary database in an alternative Availability Zone in case of a failure.
Our approach to this challenge started to change late last year, with a number of our customers looking at running Oracle SOA workloads in production on AWS. Being an AWS Technology Partner, we provided this feedback to AWS, who in return invited us to collaborate with the RDS team.
We spent the last four months working with the Amazon RDS team to test the RCU capability within MyST. This process went extremely well, and the Amazon RDS team worked closely with us to support the go-live of our first customer on Oracle SOA 12.2.1 on AWS using Amazon RDS for Oracle āwhat we believe to be a world first!
The following diagram illustrates the type of architecture required.
![]()
More recently, Amazon has now announced that RCU is officially supported by RDS Oracle. Thatās great news, because it means that we can provision an Oracle Fusion Middleware environment that is EDG-compliant and highly available within minutes. This way, we can take advantage of RDS to simplify ongoing operations.
To find out more detail on the steps required to enable Oracle SOA with RDS Oracle, see our blog post on the Rubicon Red website.
Migrate PostgreSQL Databases and Perform Ongoing Replication with the AWS Database Migration Service
Jeff Levine is a solutions architect for Amazon Web Services.
As a solutions architect for Amazon Web Services, I help our customers to migrate workloads to the AWS Cloud, a key part of which includes their databases. The AWS Database Migration Service (AWS DMS) enables organizations to migrate data to and from a variety of databases located in Amazon Relational Database Service (Amazon RDS), those running on Amazon EC2, and also those running on-premises. The service supports homogeneous migrations such as Oracle to Oracle, and also heterogeneous migrations such as Oracle to Amazon Aurora. You can also customize table mappings and perform translations. In this post, I am going to talk about two things, first about using an RDS PostgreSQL source database for DMS and then about using the recently announced continuous data replication feature, which can keep the target database up to date after the initial migration.
Using an RDS PostgreSQL Source
You might think that migration means just moving data from one location to another, but the process involves far more. The process must also keep up with transactions that take place during the migration, so that the database is left in a consistent state upon completion. Accordingly, when we look at migrating from an Amazon RDS PostgreSQL source database, we need to take into account both how data is moved and also environmental factors that can affect the ongoing processing of transactions while the migration is happening.
Letās take the case of a PostgreSQL to MySQL migration. Iāll begin with a simple table definition.
CREATE TABLE subscribers (
subscriber_id SERIAL NOT NULL PRIMARY KEY, subscriber_info json NOT NULL
);
Iām going to use pgAdmin, a popular open source administration tool for PostgreSQL, to create the table and display the specification in the source database.

After I run the DMS migration task, hereās how the newly created subscribers table appears in the target database using the program DbVisualizer.

Notice how the serial column is now an integer and the subscriber_info column has been converted to longtext. To understand what happened, we need to consider how DMS operates. DMS functions as an intermediate broker to a variety of different database engines. To make this conversion happen, DMS first maps the source data type (JSON in this case) to an intermediate DMS type and then maps that type to the target data type (longtext). Itās very important to take into account these mappings when designing your migration strategy. Note that I used the default mapping. DMS also allows you to customize the mappings if you want.
We also need to look at source-related performance factors such as replication activity and the use of write-ahead logging (WAL) that can affect the migration. Using Amazon RDS, you can change database parameters using parameter groups. One adjustment we suggest is changing the wal_sender_timeout parameter to 0. To do this, create a parameter group for your RDS PostgreSQL database and adjust the parameter accordingly. Because this is a dynamic parameter, the adjustment will take effect immediately. You can read about all the nuances of using PostgreSQL DMS sources in the RDS documentation.
Using Continuous Data Replication
Additionally, AWS recently announced continuous data replication for DMS, which enables you to keep your database up to date after the initial migration. With this feature, you can now have high availability during the replication process by specifying Multi-AZ when you create your replication instances.

When Multi-AZ is selected, two replication hosts will be created, each of which has its own IP address. Remember to adjust the underlying source and target security groups (or their on-premises equivalents) to grant both instances the appropriate level of access.
The AWS Database Migration Service offers a powerful set of migration capabilities. It is important to consider how the service works behind the scenes as part of the development of an overall migration plan. With continuous data replication, you can now keep your databases up to date after the initial load. Itās easy to get started. Follow our Getting Started Guide to migrate your database using AWS Database Migration Service with just a few clicks.
Creating a Simple Autocompletion Service with Redis: Part One of Two
Darin Briskman (@briskmad) is a developer evangelist for Amazon Web Services.
So youāve figured out how to use a cool framework to make a great-looking web form. Youāve made it super-easy for people to understand each field and how to fill it out. But there is still a lot of stuff to type. How can we make it even easier, especially for mobile devices where a lot of data entry is a challenge?
One easy thing to do is add an autocomplete function, which detects when characters are entered into a field and suggests which words or phrases are desired. All you need is a list of phrases, Amazon ElastiCache for Redis, and a little bit oā code.
If youāre not a Redis expert, donāt worry: For the purposes of autocompletion, everything youāll need to know about Redis is in this blog post. Because youāre using Amazon ElastiCache, all the rest of the work (installation, updates, data management, maintenance, and so forth) is automated and done for youāall you need to do is provide the data and fetch the data.
Using Redis Sorted Sets
What makes this fast and easy is the Redis Sorted Sets data type. This type is simply a nonrepeating collection of strings, with a score associated with each one. The strings are always stored in descending order of scores. When more than one string has the same score, the set is ordered lexicographically. Because the set is always kept in order, retrieving data from the beginning or the end or the middle is easy, using the Redis ZRANGE command.
If we put the names of a few countries into a Redis sorted set, weāll get the following.
redis> zadd mylist 0 "United States"
(integer) 1
redis> zadd mylist 0 "Sweden"
(integer) 1
redis> zadd mylist 0 "Union of South Africa"
(integer) 1
redis> zadd mylist 0 "United Kingdom"
(integer) 1
redis> zrange mylist 0 -1
1. "Sweden"
2. "Union of South Africa"
3. "United Kingdom"
4. "United States"
The other Redis command we need is ZRANK, which can tell us where a string lies inside an ordered set.
redis> zrank mylist "United Kingdom"
(integer) 2
(Indexes in Redis always start at zero, so 2 is the third item on the list.)
And now that we know that United Kingdom is item 2 in the sorted set, we can easily find what comes before and after it.
redis> zrange mylist 1 2
1) "Union of South Africa"
2) "United Kingdom"
redis> zrange mylist 2 3
1) "United Kingdom"
2) "United States"
For autocompletion, we need a sorted set that includes the letters that someone would type. We also need a character to show that weāve reached the end of the phrase. Iām going to use %.
redis> zadd mylist2 0 "United States%"
(integer) 1
redis> zadd mylist2 0 "United State"
(integer) 1
redis> zadd mylist2 0 "United Stat"
(integer) 1
redis> zadd mylist2 0 "United Sta"
(integer) 1
redis> zadd mylist2 0 "United St"
(integer) 1
redis> zadd mylist2 0 "United S"
(integer) 1
redis> zadd mylist2 0 "United "
(integer) 1
redis> zadd mylist2 0 "United"
(integer) 1
redis> zadd mylist2 0 "Unite"
(integer) 1
redis> zadd mylist2 0 "Unit"
(integer) 1
redis> zadd mylist2 0 "Uni"
(integer) 1
redis> zadd mylist2 0 "Un"
(integer) 1
redis> zadd mylist2 0 "U"
(integer) 1
redis> zrange mylist2 0 -1
1. "U"
2. "Un"
3. "Uni"
4. "Unit"
5. "Unite"
6. "United
7. "United "
8. "United S"
9. "United St"
10. "United Sta"
11. "United Stat"
12. "United State"
13. "United States%"
Now we can use code to loop through the entries to find the complete entry (ending with %).
Putting the pieces together
Letās put it all together. First, weāll set up an ElastiCache Redis node. If there are no repeating prefixes, this algorithm will create (Number of Items) x (Average Length of Each Item + 1) x (Average Length of each item) ÷ 2 entries in our sorted set. Unless you have a very weird set of items, there will be repetition among the prefixes, so the actual amount of memory needed will be smaller. For this example, countries.txt has 195 items that are an average of 10.4 characters each, including spaces (and the fileās encoded in UTF-8, so itās one byte per character). So weāll need no more than 11,560 bytes to store all the possible prefixes for this list. For more information on how to size your Redis node, see Selecting Your Redis Node Size in the ElastiCache User Guide.
This isnāt a lot of data, so we can use the very smallest cache.t2.micro node, which can run using the AWS Free Tier.
Start by going to the Amazon ElastiCache for Redis page and choosing Get Started or by signing on to your AWS Management Console and choosing ElastiCache.

Then create a Redis cluster. For simplicity, weāll avoid replication (though for a production instance, replication is a good idea!). Uncheck Enable Replication, name the cluster redis-autocomplete, and choose cache.t2.micro for node type. You can leave all the other settings at their defaults, then launch the cluster.


It can be handy to have redis-cli (the Redis command line interface) where you can test and debug your redis commands. If youāre anything like me, writing code is easy⦠debugging it to make it actually work, not so easy (sigh).
For the CLI, create a new Amazon EC2 instance (t2.micro will be enough) and log in to it. If you havenāt done this before, see Getting Started with Amazon EC2 Linux Instances for a guide on creating your EC2 instance and the needed security groups.
In theory, all you need to install is redis-cli, but there are a lot of dependencies. Redis is pretty small, so itās easiest to just install the whole package.
sudo yum --enablerepo=epel install redis
To actually use the CLI, youāll need to know the name of your ElastiCache instance. On your AWS Management Console, choose ElastiCache from the Services menu, then choose Cache Clusters.

Then choose your node.

Look for the DNS name of your cache instance under Endpoint.

Now from your EC2 instance, you can use redis-cli.
$ redis-cli -h <your ElastiCache endpoint DNS>
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379>
Try using the INFO command at the redis> prompt to validate that all is working as it should.
When youāre done, use CTRL-D to end the interactive redis> prompt and return to the system shell.
In a future post, weāll use AWS Lambda to make this completely serverless. For now, weāll keep it simple, using Python 2.7 (the version included with most Linux distributions) on our EC2 node.
Testing our simple autocompletion service
Letās do a test run. First, we need some code that reads our text file and parses it into prefixes. We will need to grab a file with a list of items for autocompletionāyou can use your own or you can use the list of UN member countries Iāve posted online.
$ wget http://autocomplete.s3.amazonaws.com/Countries.txt
$ cp Countries.txt autocomplete.txt
We also need to install the Redis SDK for Python.
$ sudo pip install redis
Now that we have all of the prerequisites, we can use some code to load and parse the file. Use your favorite text editor (Iām old school, so I use vi) to create a file called ac_setup.py.
import redis
r = redis.StrictRedis(host='<your ElastiCache endpoint DNS>', port=6379, db=0)
f = open(āautocomplete.txt',"r")
for line in f:
n = line.strip()
for l in range(1,len(n)):
prefix = n[0:l]
r.zadd('autocomplete',0,prefix)
r.zadd('autocomplete',0,n+"%")
else:
exit
This code should pull in countries.txt and parse it into an ordered set of every unique prefix, with the full names identified by ending with %. Letās execute the code.
$ python ac_setup.py
To check that this code worked, we can go back into the redis> interface and see what we have.
$ redis-cli -h <your ElastiCache endpoint DNS>
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379> zrange autocomplete 0 -1
1) "A"
2) "Af"
3) "Afg"
4) "Afgh"
5) "Afgha"
6) "Afghan"
7) "Afghani"
8) "Afghanis"
9) "Afghanist"
10) "Afghanista"
11) "Afghanistan%"
12) "Al"
13) "Alb"
14) "Alba"
15) "Alban"
...
Now, we need the code to pull the potential autocompletion list from Redis. Again, use your favorite text editor (Iām still using vi!) to create a file named ac.py.
import sys
import redis
r = redis.StrictRedis(host=<your ElastiCache endpoint DNS>..cache.amazonaws.com', port=6379, db=0)
def complete(r,prefix,count):
results = []
grab = 42
start = r.zrank('autocomplete',prefix)
if not start:
return []
while (len(results) != count):
range = r.zrange('autocomplete',start,start+grab-1)
start += grab
if not range or len(range) == 0:
break
for entry in range:
minlen = min(len(entry),len(prefix))
if entry[0:minlen] != prefix[0:minlen]:
count = len(results)
break
if entry[-1] == "%" and len(results) != count:
results.append(entry[0:-1])
return results
def autoComplete():
print complete(r,sys.argv[1],50)
if __name__ == "__main__":
autoComplete()
This program needs one argument, which is the prefix we want to search. Try it out!
$ python ac.py U
['Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 'Uzbekistan']
$ python ac.py Uni
['United Arab Emirates', 'United Kingdom', 'United States']
Congratulations! Weāve now built a working recommendation engine. If you want to try this with a larger dataset, letās use the list of all of the cities and settlements in the US (provided by the US Census Bureau).
Flush the data out of Redis.
$ redis-cli -h <your ElastiCache endpoint DNS>
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379>FLUSHDB
OK
Then double-check to make sure itās gone.
<Your ElastiCache endpoint DNS>.cache.amazonaws.com:6379> zcard autocomplete
(integer) 0
Use CTRL-D to exit the Redis shell.
re<your ElastiCache Endpoint DNS>.cache.amazonaws.com:6379>^D
$
Then load the cities data.
$ wget http://autocomplete.s3.amazonaws.com/Cities.txt
$ cp Cities.txt autocomplete.txt
$ python ac_setup.py
This test will take a bit longer to run, because there are 15,637 items on the list to be parsed, which will take a few minutes on our micro family instance. Once done, we can now search for cities.
$ python ac.py Chi
['Chicago Heights, Illinois', 'Chicago Park, California', 'Chicago Ridge, Illinois', 'Chicago, Illinois', 'Chichester, New Hampshire', 'Chickamauga, Georgia', 'Chickasha, Oklahoma', 'Chico, California', 'Chicopee, Massachusetts', 'Chiefland, Florida', 'Childersburg, Alabama', 'Childress, Texas', 'Chilhowee, Missouri', 'Chillicothe, Illinois', 'Chillicothe, Missouri', 'Chillicothe, Ohio', 'Chilmark, Massachusetts', 'Chiloquin, Oregon', 'Chilton, Texas', 'Chilton, Wisconsin', 'Chimacum, Washington', 'China Spring, Texas', 'Chincoteague Island, Virginia', 'Chinese Camp, California', 'Chinle, Arizona', 'Chino Hills, California', 'Chino Valley, Arizona', 'Chino, California', 'Chinook, Montana', 'Chipley, Florida', 'Chippewa Falls, Wisconsin', 'Chisago City, Minnesota', 'Chisholm, Minnesota', 'Chittenango, New York', 'Chittenden, Vermont']
$ python ac.py Chic
['Chicago Heights, Illinois', 'Chicago Park, California', 'Chicago Ridge, Illinois', 'Chicago, Illinois', 'Chichester, New Hampshire', 'Chickamauga, Georgia', 'Chickasha, Oklahoma', 'Chico, California', 'Chicopee, Massachusetts']
In part two of this post, weāll explore using AWS Lambda and Amazon S3 to automate loading the data and running the code, plus look at ways to integrate this autocompletion service into a webpage.
I hope you found this fun and useful! If youād like more on this Redis pattern, this blog post was inspired by Salvatore āantirezā Sanfilippo, Redisā creator, who wrote about Redis autocompletion on his blog. Amazon ElastiCache for Redis is a quickly evolving service, and you can always get the latest updates at the Amazon ElastiCache for Redis marketing website.
Welcome to the AWS Database Blog!
The AWS Database Blog will feature information for customers interested in running databases on AWS. Youāll see content from many AWS team members covering a range of topics, including the following:
· Announcements about the new launches
· Deep-dive and how-to guides
· Best practices
· Customer and partner stories
· And more!
To see future updates, check back often or subscribe to our blog using the RSS feed button at the top of the page.
If you have requests to cover specific topics, let us know in the comments.
Silvia Doomra
Senior Product Manager, AWS

