A Day in the Life of a Unity Data Mover Job
Unity Data Mover (DM) is built to handle multiple requests to move objects concurrently. Understanding how DM processes these requests is benefitial to those looking to get the most out of their DM installations.
The easiest way to explain how DM handles work is to take a user request to move data and follow it through its life cycle. Along the way, we'll see how various user controlled settings affect the request and provide suggestions for how one might improve work throughput.
Tip: Look for special Tips below as well as for green boxes highlighting user controllable settings.
Phase 1: Submitting a Request to Move Data
Requests to move data are submitted through either the DM Viewpoint Portlet or through the DM Command Line Interface. The DM Viewpoint Portlet provides a dynamic graphical interface wherein users are visually aided in crafting their request. The DM Command Line Interface provides two commands (create and move) where users can submit their request: either directly on the command line or in XML format. In either case, the user's request is captured by the interface and sent to the DM Daemon for processing using JMS.
The user's request to move data is passed from the interface to the DM Daemon in a create job request. DM is designed to handle multiple create job requests at the same time however there are limitations. The create job requests shares a queue on the JMS bus with all other incoming requests including requests to start jobs (which we'll get to in a minute), requests to show a list of existing jobs, requests to show job status, requests to edit configuration information, etc. The DM Daemon has a thread pool dedicated to consuming these messages from the queue. However, this JMS thread pool has a limited number of threads available so requests are subject to a wait time if lots of requests are being submitted around the same time. There is currently no method for users to modify DM's capacity in this area. If users are expected a high volume of requests, they are encourage to consider throttling the request submission rate to avoid a traffic jam in this area.
Phase 2: Request Becomes a Job
Once the DM Daemon receives the request, the request is passed off from a thread in the DM Daemon's JMS thread pool to another internal pool of threads referred to as the Command Thread Pool. In this case, the thread from the Command Thread Pool is responsible for taking the create job request and using that to create a job. The Command Thread Pool is limited in size (currently 20) so the create job request may have to wait here until a Command Thread Pool thread becomes available. There is no current method for the user to modify the size of the Command Thread Pool.
Internally, creating a DM job involves the following:
- Storing the original user provided job intent in the DM repository for future reference. With a few exceptions, this user intent will be referenced every time the job is run to make sure that DM is performing the work that the user wants.
- Querying metadata information from the source and target database. This information is required for DM to create a job that will accomplish the requested work properly and in the most efficient manner possible.
- Crafting the steps needed to perform the work into a job plan.
The job plan is crafted in the following manner:
- The work need to perform the requested data move is separated into a series of job steps. Job steps are executed sequentially with the requirement that all work must be successfully completed for the current job step before moving onto the next. This allows units of work that have a dependency on other work being done first to be executed in proper order.
- The work assigned to each job step is in turn translated into one or more job tasks. Job tasks contain work that is to be done by a specific utility (JDBC, ARC, or TPTAPI). All job tasks in a job step is safe to execute in parallel and will be executed in that manner if the proper settings and resources are available.
Phase 3: Job Submitted for Execution
Once the create job request is completed, the newly created DM job is executed by a start job request. The DM portlet separates the create job request and start job request by requiring the user to initiate each of these actions independently. With the DM Command Line Interface, users can similarly send the requests separately using the create and start commands. Additionally, the DM Command Line Interface's move command provides a method for having the start job request sent automatically as soon as the create job request finishes. In all cases, under the covers, the start job request is sent separately and is subject to the same JMS queue limitations as the original create job request. The start job request must wait for a DM Daemon JMS processing thread to become available to consume the message and then it must wait for an available thread from the Command Thread Pool.
For start requests, the thread from the Command Thread Pool quickly puts the request in a waiting list and exits back to the pool. The waiting list provides a method to keep track of large numbers of start requests with minimal memory commitment. Freeing up the thread from the Command Thread Pool at this point also allows the DM Daemon to quickly process large number of incoming start requests. An internal managing thread periodically awakens and checks to see if there are start requests in the waiting list. If so, the managing thread then checks to see if the start request can be passed on the next process.
From the waiting list, the job will either be directly executed or staged and placed in the job queue. The DM Daemon's max concurrent job limit prevents too many DM jobs from running at the same time. This limit is set in the DM Daemon's properties file or through the DM Admin Viewpoint Portlet and controls how many DM jobs are allowed to run at any given time. If the current running job count is below the max concurrent job limit, then the job will be directly executed from the waiting list.
If the max concurrent job limit has been reached, the next option is to put the job in the job queue. Jobs in the job queue are fully staged and ready to run which requires a large amount of memory. Having too many jobs at this stage can lead to memory issues so the number of jobs that can be in the job queue is limited by the DM Daemon's max queued jobs limit. Similar to the DM Daemon's max concurrent job limit, the max queued job limit is set in the DM Daemon's properties file or through the DM Admin Viewpoint Portlet. If the current queued job count is below the max queued job limit, the start request will be taken from the waiting list, staged, and put on the job queue. If the max queued job limit has been reached, then the start request must remain in the waiting list until there is room.
One would think that increasing the max concurrent job limit would increase DM's job through put but this is not always the case. Increasing the max concurrent job limit only allows more jobs to become runnable, it does not mean work for those jobs will actually be performed right away. As we'll see in the next section, the work for a job is broken into tasks and those tasks require room on the DM Agent to run. When a job moves to the "RUNNING" state, its tasks become eligible to be executed however they must wait in line with the tasks from all other running jobs before they can actually be executed on the DM Agent. If the DM Agent is already running at capacity, then increasing the max concurrent job limit will only increase the memory usage on the DM Daemon as each running job comes with a certain amount of memory overhead.
Tip: To check if increasing the max concurrent job limit would be helpful, run the "list_agents" command several times during your peak job submission period. If the DM Agent(s) report they aren't consistently running at the max capacity, then increasing the max concurrent job limit might provide a benefit. Otherwise, if the DM Agent(s) are consistently running at max capacity, its better to avoid the extra memory overhead.
Tip: To check if increasing the max queued job limit would be helpful, monitor the number of jobs in the "RUNNING" state during your peak job submission period. If the "RUNNING" jobs count is consistently below the max concurrent job limit but there are plenty of jobs in the "QUEUED" state, then increasing the max queued job limit might provide a benefit as moving jobs from the internal waiting list to the "RUNNING" state takes longer than executing jobs from the job queue. However, if the "RUNNING" jobs count is consistently at the max concurrent job limit, it is best not to increase the max queued job limit as having too many jobs in the job queue can lead to memory issues.
Phase 4: Job Executed
When it comes time to finally execute the job, the job plan is loaded and executed step by step. As previously stated, each job step contains one or more job tasks. Job tasks are units of work to be performed by one of DM's underlying utilities (ARC, TPTAPI, JDBC). When a job step is executed, all of its job tasks are submitted for execution.
Before a job task can be executed, it must first pass through a series of queues to ensure that the proper resources are available to do the work. Resources required include:
- Agent task slot. Each available DM Agent has a limited number of spots available to execute job tasks. This limit is controlled by the DM Agent's max concurrent job limit (it's misnomer - should have been labeled max concurrent task limit) which can be modified by the user in the DM Agent's property file.
- Target load slot. TPTAPI tasks using Load and Update are subject to the target load slot limit which limits the number of this types of tasks that can run concurrently against the same target database system. This limit can be modified by the user via the DM Command Line Interface's edit_configuration and save_configuration commands or via the DM Admin Viewpoint Portlet.
- User from target user pool. When the target user pool feature is in use, ARC tasks are required to obtain a unique user from this pool before they can be executed. If lots of DM jobs using ARC are running against the same target database system concurrently, this can become a bottleneck. Users can bypass by ensuring sufficient number of available users are in the target user pool.
Tip: When looking to improve DM's overall job throughput, play with the DM Agent's max concurrent job limit to find your setup's sweet spot. Increasing this limit allows more tasks to be executed concurrently which in turn can speed up individual job execution if a job has lots of tasks or allow more jobs to be worked on at the same time. This can in theory allow you to make the most of your network and database performance. However, allowing too many tasks to run at the same time can also lower individual task performance. Use the "list_agents" command to check to see if you are hitting the DM Agent max concurrent job limit during your peak job submission rate. If so, note the performance of your jobs, then increase the limit and see if this results in an improvement.
Tip: Set the "target.system.load.slots" limit to be slightly below the Teradata Database's max load limit. When the Teradata Database's actual max load limit is it, the database makes the utility job wait until there is capacity to run again. This delay is not efficiently performed. In-house testing has shown it is better to let DM uphold the limit rather than actually hitting the database's limit.
Phase 5: Job Completed
When all job tasks from all job steps are completed, the DM job is done.