Relational DB Migration to S3 Data Lake Via AWS DMS, Part I

This article discusses the challenges faced during relational database migration to AWS using DMS, including source data, logging, and network bandwidth issues.

By Vijay Bhosale · Feb. 07, 25 · Analysis

AWS Database Migration Service (DMS) is a cloud service that migrates relational databases, NoSQL databases, data warehouses, and other types of data stores into the AWS Cloud, or between cloud and on-premises setups, efficiently and securely. DMS supports several types of source and target data stores, such as Oracle, MS SQL Server, MySQL, PostgreSQL, Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3.

Observations During the Data Migration

We designed and built an AWS S3 data lake and a data warehouse in AWS Redshift, with data sourced from on-premises Oracle, MS SQL Server, MySQL, PostgreSQL, and MongoDB databases. We used AWS DMS for the initial full load and for the daily incremental data transfer from these sources into AWS S3.

In this series of posts, I explain the challenges we faced during the actual data migration from the different relational databases.

1. Modified Date Not Populated Properly at the Source

AWS DMS is used for the full load and change data capture (CDC) from the source databases. DMS captures changed records based on the transaction logs, but a properly maintained modified-date column helps apply deduplication logic and extract the latest version of a given row on the S3 target.

If a modified-date column is not available for a table, or is not updated reliably, AWS DMS transformation rules can add a new column while extracting data from the source database. Here, the AR_H_CHANGE_SEQ header populates the new column with a unique, incrementing value from the source, composed of a timestamp and an auto-incrementing number.

The code example below adds a new column named DMS_CHANGE_SEQ to the target, holding this unique incrementing number from the source. It is a 35-digit unique value: the first 16 digits are the timestamp, and the next 19 digits are the record ID number incremented by the database.

JSON
 
{
    "rule-type": "transformation",
    "rule-id": "2",
    "rule-name": "2",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "%",
        "table-name": "%"
    },
    "rule-action": "add-column",
    "value": "DMS_CHANGE_SEQ",
    "expression": "$AR_H_CHANGE_SEQ",
    "data-type": {
        "type": "string",
        "length": 100
    }
}
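
As an illustration of how this column can be used downstream, the sketch below deduplicates CDC records landed in S3 and keeps only the latest version of each row. It assumes Spark is available and that the raw CDC files are readable as a DataFrame; the S3 paths and the primary-key column name (id) are placeholders, not part of the original setup.

Python
 
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dms-cdc-dedup").getOrCreate()

# Raw CDC output written by DMS to S3 (path is a placeholder).
cdc = spark.read.parquet("s3://my-data-lake/raw/sales/orders/")

# Keep only the most recent change per primary key, ordered by the
# DMS_CHANGE_SEQ column added by the transformation rule above.
latest = Window.partitionBy("id").orderBy(F.col("DMS_CHANGE_SEQ").desc())

deduped = (
    cdc.withColumn("rn", F.row_number().over(latest))
       .filter(F.col("rn") == 1)
       .drop("rn")
)

deduped.write.mode("overwrite").parquet("s3://my-data-lake/curated/sales/orders/")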


2. Enabling Supplemental Logging for Oracle as a Source

To capture ongoing changes from an Oracle source database, AWS DMS requires at least minimal supplemental logging to be enabled on the source. Supplemental logging adds extra information and columns to the redo logs so that the changes can be identified at the source.

Supplemental logging can be enabled for primary keys, unique keys, specific sets of columns, or all columns. Supplemental logging of all columns captures every column of the source tables and makes it possible to overwrite complete records in the target AWS S3 layer.

Supplemental logging of all columns increases the redo log size, since every column of the table is written to the logs. Redo and archive logs need to be configured accordingly to accommodate this additional information.
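
As a minimal sketch, the statements below enable database-level minimal supplemental logging and all-column logging for one table. Running them through the python-oracledb driver, the connection details, and the sales.orders table are assumptions made for this example; a DBA would typically run the equivalent SQL directly.

Python
 
# Minimal sketch: enable supplemental logging on an Oracle source for DMS CDC.
# Assumes the python-oracledb driver and an account with ALTER DATABASE/ALTER TABLE
# privileges; the connection details and table name are placeholders.
import oracledb

conn = oracledb.connect(user="dms_admin", password="***", dsn="onprem-db:1521/ORCLPDB1")
cur = conn.cursor()

# Database-level minimal supplemental logging (required by DMS for Oracle CDC).
cur.execute("ALTER DATABASE ADD SUPPLEMENTAL LOG DATA")

# All-column supplemental logging for a table whose full row image is needed on the target.
cur.execute("ALTER TABLE sales.orders ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS")

cur.close()
conn.close()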

3. Network Bandwidth Between Source and Target Databases

The initial full load from the on-premises sources (Oracle, MS SQL Server, etc.) worked fine, and so did change data capture most of the time.

Transaction volume was moderate for most of the day throughout the month, except during the daily end-of-business-day process, post-midnight activities, and month-end activities. During these windows, we observed DMS migration tasks going out of sync or failing.

We reviewed the source, target, and replication instance metrics and logs, and made the following observations:

  • CDCIncomingChanges – the total number of change events at a point in time that are waiting to be applied to the target. This increased from zero to thousands during the early-morning reconciliation activities.
  • CDCLatencySource – the gap, in seconds, between the last event captured from the source endpoint and the current system timestamp of the AWS DMS instance. This increased from zero to a few thousand, up to 10-12K seconds, during the daily post-midnight reconciliation activities, and reached up to 40K seconds during month-end activities.

Upon further log analysis and review of other metrics, we observed the following:

The DMS NetworkReceiveThroughput metric shows the incoming traffic on the DMS replication instance, covering both customer database and DMS traffic. It helps identify network-related issues, if any, between the source database and the DMS replication instance.
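
For reference, the sketch below pulls these metrics from CloudWatch for the last 24 hours so that latency spikes around the reconciliation windows can be correlated with network throughput. It assumes boto3 with valid credentials; the region, replication instance identifier, and task identifier are placeholders.

Python
 
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=24)

# Task-level CDC latency from the source endpoint, in seconds.
latency = cloudwatch.get_metric_statistics(
    Namespace="AWS/DMS",
    MetricName="CDCLatencySource",
    Dimensions=[
        {"Name": "ReplicationInstanceIdentifier", "Value": "my-dms-instance"},
        {"Name": "ReplicationTaskIdentifier", "Value": "my-dms-task"},
    ],
    StartTime=start, EndTime=end, Period=300, Statistics=["Maximum"],
)

# Instance-level incoming network throughput, in bytes per second.
throughput = cloudwatch.get_metric_statistics(
    Namespace="AWS/DMS",
    MetricName="NetworkReceiveThroughput",
    Dimensions=[{"Name": "ReplicationInstanceIdentifier", "Value": "my-dms-instance"}],
    StartTime=start, EndTime=end, Period=300, Statistics=["Average"],
)

for point in sorted(latency["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])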

(Figure: NetworkReceiveThroughput metric for the DMS replication instance)

We observed that the network receive throughput peaked at about 30 MB/s (i.e., roughly 250 Mb/s), limited by the VPN connection between the source and AWS, which was also shared by other applications.

The conclusion from this issue is that connectivity between the source and target environments is critical for successful data migration. Ensure sufficient bandwidth is set up between on-premises or other cloud source databases and the AWS environment before the actual data migration; a rough sizing check is sketched after the list below.

  1. A VPN tunnel, such as AWS Site-to-Site VPN or Oracle Cloud Infrastructure (OCI) Site-to-Site VPN between Oracle and AWS, can provide a throughput of up to 1.25 Gbps. This is sufficient for migrating small tables or tables with light DML traffic.
  2. For large data migrations with heavy transactions per second on the tables, consider AWS Direct Connect. It provides a dedicated private connection with bandwidth options such as 1 Gbps and 10 Gbps.
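
As a rough illustration of why the link size matters, the sketch below estimates how long an assumed full-load volume would take over the observed shared VPN throughput versus the VPN and Direct Connect options above. All the numbers are placeholders, not measurements from the migration.

Python
 
# Back-of-the-envelope transfer-time estimate (illustrative numbers only).
full_load_gb = 500  # assumed size of the initial full load

# Throughput options in Mb/s: observed shared VPN, Site-to-Site VPN, Direct Connect.
links_mbps = {
    "Shared VPN (observed)": 250,
    "Site-to-Site VPN": 1250,
    "Direct Connect 10 Gbps": 10000,
}

for name, mbps in links_mbps.items():
    seconds = (full_load_gb * 8 * 1000) / mbps  # GB -> megabits, divided by Mb/s
    print(f"{name}: ~{seconds / 3600:.1f} hours")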

Conclusion

This is Part I of a multi-part series on relational database migration challenges with AWS DMS and the solutions we implemented. Most of the challenges described in this series can occur during any database migration, and these solutions can be used as a reference.
