Data Synchronization - Choosing the Right Tool for the Job
Data synchronization can be challenging, but it doesn't have to be...
Keeping data synchronized across multiple Teradata systems is typically driven by one or more business initiatives: Disaster Recovery, High Availability/Failover, populating independent data marts for workload optimization, or simply refreshing a Test/QA environment. Whatever the business need, a requirement exists to have the same data in two or more locations, and oftentimes in different data centers.
Choosing the right tool to move data within an analytical ecosystem is a critical decision, and there are several options available. Synchronization methods include Table Copy operations and a newer concept called SQL Multicast. Selecting the correct method may seem like a daunting task. So how does one narrow it down and make sure the right tool is chosen for the job? Look at the business requirements.
Business requirements will always lead to the answer. After reviewing them, the decision can be narrowed down to two key variables: Latency and Batch Size. How much latency, or data lag, can the business requirement endure? Do you need the data to be synchronized in hours, minutes, seconds, or sub-seconds? Now cross-reference that information against the size of the loads needed to meet the requirement. Are the loads small trickle feeds, or are they large batch/bulk loads? Answering these questions around Latency and Batch Size leads you to the right synchronization method and, ultimately, the right synchronization tool.
The graphic below details when each method/tool should be leveraged given a certain Latency and Batch Size requirement:
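The two-variable decision described above can be sketched as a small helper function. This is purely illustrative: the function name and the numeric thresholds are assumptions chosen for the example, not values defined by Teradata, and real cutoffs depend on the workload.

```python
from enum import Enum

class Method(Enum):
    TABLE_COPY = "Table Copy (Unity Data Mover)"
    SQL_MULTICAST = "SQL Multicast (Unity Director/Loader)"

def choose_method(latency_seconds: float, batch_rows: int) -> Method:
    """Pick a synchronization method from latency tolerance and batch size.

    Thresholds here are illustrative assumptions only.
    """
    # Sub-minute latency with small, trickle-feed batches points to SQL Multicast.
    if latency_seconds < 60 and batch_rows < 10_000:
        return Method.SQL_MULTICAST
    # Relaxed latency (hours or more) with mini-batch to bulk loads points to Table Copy.
    return Method.TABLE_COPY
```

A near-real-time trickle feed such as `choose_method(1, 500)` lands on SQL Multicast, while a nightly bulk load such as `choose_method(3600, 5_000_000)` lands on Table Copy.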
Table Copy operations are designed to support periodic loads and ad-hoc data movement between two different Teradata systems. Batch sizes for Table Copy range from small loads (mini-batch) to large bulk loads. Latency requirements for Table Copy operations are typically less demanding than those of SQL Multicast. If hourly, nightly, weekly, or even monthly synchronization is required, then Table Copy might be a good option to consider. Table Copy is flexible in that it can synchronize both full and partial tables. The Teradata product that supports this method of synchronization is Unity Data Mover. Common use cases for Unity Data Mover include moving data between two production environments or data movement between a production and test/development environment.
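The full-versus-partial distinction mentioned above can be illustrated with plain SQL. Note this is not how Unity Data Mover defines its jobs; it is just a hypothetical helper showing that a partial copy is a full copy plus a row-restricting predicate.

```python
from typing import Optional

def build_copy_sql(table: str, source_db: str, target_db: str,
                   predicate: Optional[str] = None) -> str:
    """Compose an INSERT-SELECT that copies a table fully or partially.

    Illustrative only: a real Table Copy is managed by Unity Data Mover,
    not hand-written SQL. The function and parameter names are assumptions.
    """
    sql = f"INSERT INTO {target_db}.{table} SELECT * FROM {source_db}.{table}"
    if predicate:
        # Partial copy: restrict rows, e.g. to yesterday's partition.
        sql += f" WHERE {predicate}"
    return sql
```

A nightly refresh of a test system might copy only recent rows by passing a date predicate, while omitting the predicate performs a full-table copy.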
SQL Multicast is a near real-time synchronization method to support mission-critical applications and data. This method synchronizes data across multiple Teradata systems by dynamically sending, or broadcasting, SQL statements to multiple systems in different locations or data centers. A very different approach from that of Table Copy! With this method, SQL statements are applied to multiple target systems based on the location of the SQL objects, and writes are applied to as many Teradata systems as the user would like to keep in sync. In the event of an outage, SQL Multicast automatically captures and logs any updates applied during the time a system is unavailable. Once the system recovers and becomes available again, it is dynamically resynchronized and caught back up to the other available systems (all updates not already applied are replayed). This makes any outage completely transparent to the user or application, and failover is handled automatically. The associated Teradata products that leverage SQL Multicast are Unity Director and Unity Loader.
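The broadcast-and-replay behavior described above can be modeled in a few lines. This is a toy sketch of the concept, not the Unity Director API: the class, its callables, and the use of `ConnectionError` as the failure signal are all assumptions made for the example.

```python
class MulticastRouter:
    """Broadcast writes to several systems; log and replay for any that are down.

    A toy model of the SQL Multicast behavior, not a Teradata product API.
    'systems' maps a system name to a callable that applies one SQL statement.
    """
    def __init__(self, systems):
        self.systems = systems
        # Per-system log of statements missed while that system was unavailable.
        self.pending = {name: [] for name in systems}

    def write(self, sql):
        for name, apply in self.systems.items():
            try:
                # Resynchronize first: replay anything logged during an outage.
                while self.pending[name]:
                    apply(self.pending[name][0])
                    self.pending[name].pop(0)
                apply(sql)
            except ConnectionError:
                # System is down: capture the statement for later replay.
                self.pending[name].append(sql)
```

When a target raises `ConnectionError`, its statements accumulate in the log; on the next successful write they are replayed in order before the new statement, so the recovered system catches back up transparently.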
An investment in either data synchronization methodology is safe. The methods and their associated products are designed to work together and coexist within a given environment. Neither Table Copy nor SQL Multicast alone will meet all synchronization needs; together, they address the different use cases and requirements that span the entire business environment.