This “Teradata Basics” posting describes the current dimensions of parallelism in the Teradata Database, and how they work together.

What is Query Parallelism?

Executing a single SQL statement in parallel means breaking the request into components, and working on all components at the same time, with one single answer delivered. Parallel execution can incorporate all or part of the operations within a query, and can significantly reduce the response time of an SQL statement, particularly if the query reads and analyzes a large amount of data.

With a design goal of eliminating single-threaded operations, the original architects of the Teradata Database parallelized everything, from the entry of SQL statements to the smallest detail of their execution. The database’s entire foundation was constructed around the idea of giving each component in the system many look-alike counterparts. Not knowing where the future bottlenecks might spring up, early developers weeded out all possible single points of control and effectively eliminated the conditions that breed gridlock in a system.

Limitless interconnect pathways, and multiple optimizers, host channel connections, gateways, and units of parallelism are supported in Teradata, increasing flexibility and control over performance that is crucial to large-scale data analytics today.

Teradata’s Unit of Parallelism

As you probably know, Teradata’s basic unit of parallelism is the AMP. From system configuration time forward all queries, data loads, backups, index builds, in fact everything that happens in a Teradata system, is shared across a pre-defined number of AMPs. The parallelism is predictable and understandable.

Each AMP acts like a microcosm of the database, supporting such things as data loading, reading, writing, journaling and recovery for all the data that it owns. The parallel units also have knowledge of each other and work cooperatively together behind the scenes.  This teamwork among parallel units is an unusual strength of the Teradata Database, driving higher performance with minimal overhead.

Teradata’s Dimensions of Query Parallelism

While the AMP is the fundamental unit of apportionment, and delivers basic query parallelism to the work in the system, there are two additional parallel dimensions woven into the Teradata Database, specifically for query performance. These are referred to here as “within-a-step” parallelism, and “multi-step” parallelism. A description of all three dimensions of parallelism that Teradata applies to a query follows:

Query Parallelism. Query parallelism is usually enabled in Teradata by hash-partitioning the data across all the AMPs defined in the system.  One exception is No Primary Index tables, which use other mechanisms for assigning data to AMPs.  Once data is assigned to an AMP, the AMP provides all the database services on its allocation of data blocks. All relational operations such as table scans, index scans, projections, selections, joins, aggregations, and sorts execute in parallel across the AMPs simultaneously. Each operation is performed on an AMP’s data independently of the data associated with the other AMPs.

Within-a-Step Parallelism. A second dimension of parallelism that will naturally unfold during query execution is an overlapping of selected database operations referred to here as within-a-step parallelism. The optimizer splits an SQL query into a small number of high level database operations called “steps” and dispatches these distinct steps for execution to the AMPs, one after another.

A step can be a small piece or a large chunk of work.  It can be simple, such as "scan a table and return the result" or complex, such as "scan two tables and apply predicates to each, join the two tables, redistribute the join result on specified columns, sort the redistributed rows, and place the redistributed rows in an intermediate table".

Within each of these potentially large chunks of work that we call steps, multiple relational operations can be processed in parallel by pipelining.  While a table scan is taking place, rows that are selected can be pipelined into a join process immediately.  Pipelining is the ability to begin one task before its predecessor task has completed and will take place whenever possible within each distinct step.

This dynamic execution technique, in which a second operation jumps off of a first one to perform portions of the step in parallel, is key to increasing the basic query parallelism. The relational-operator mix of a step is carefully chosen by the Teradata optimizer to avoid stalls within the pipeline.

Multi-Step Parallelism. Multi-step parallelism is enabled by executing multiple “steps” of a query simultaneously, across all the participating units of parallelism in the system. One or more tasks are invoked for each step on each AMP to perform the actual database operation. Multiple steps for the same query can be executing at the same time to the extent that they are not dependent on results of previous steps.

Below is a representation of how all of these three types of parallelism might appear in a query’s execution.

The above figure shows four AMPs supporting a single query’s execution, and the query has been optimized into 7 steps.  Step 1.2 and Step 2.2 demonstrate within-a-step parallelism, where two different tables are scanned and joined together (three different operations are performed).  The result of those three operations are pipelined into a sort and then a redistribution, all in one step. Step 1.1 and 1.2 together (as well as 2.1 and 2.2 together) demonstrate multi-step parallelism, as two distinct steps are chosen to execute at the same time, within each AMP.

And Even More Parallel Possibilities

In addition to the three dimensions of parallelism shown above, the Teradata Database offers an SQL extension called a “multi-statement request” that allows several distinct statements to be bundled together and sent from the client to the database as if they were one. These SQL statements will then be executed in parallel.

When the multi-statement request feature is used, any sub-expressions that the different SQL statements have in common will be executed once and the results shared among them.  Known as “common sub-expression elimination,” this means that if six select statements were bundled together and all contained the same subquery, that subquery would only be executed once. Even though these SQL statements are executed in an inter-dependent, overlapping fashion, each query in a multi-statement request will return its own distinct answer set.

This multi-faceted parallelism is not easy to choreograph unless it is planned for in the early stages of product evolution. An optimizer that generates three dimensions of parallelism for one query such as described here must be intimately familiar with all the parallel capabilities that are available and know how and when to use them.  But most importantly, the Teradata Database applies these multiple dimensions of parallelism automatically, without user intervention or special setup.

Discussion
mukuljain015 1 comment Joined 03/15
01 Sep 2015

Hey Carrie,
Thanks for putting useful contents in the blog.
my question is: how to disable throttle on TD-14.10 sles-11 System. I'm trying to deactivate throttle by several means but it's not working. I'm checking the throttle status on the system by typing the commands " tdwmdmp" and "tdwmdmp -a | grep HCATA ". It's demonstrating that the throttle is active on the system. Can you please assist me?
 
Thanks and Regards
Mukul Jain 

carrie 595 comments Joined 04/08
03 Sep 2015

Mukal,
 
Could I ask in the future that you post questions and/or comments under blog postings that are relevant to the topic you are asking about?   It can get confusing to anyone reading a blog when there are comments such as this one that are unrelated to the blog posting itself.   
 
To answer your question, if this is a system throttle, go to the Viewpoint Workload Designer throttles screen and uncheck the "enable" button.  Save the ruleset, then Activate the ruleset.
 
 If this is a workload throttle, go to the workload screen and select the throttle tab and for each planned environment make the concurrency limit "unlimited" and that will essentially disable the throttle.  Save changes and activate the rule set.
 
There are other ways to manage throttle activity and limits using APIs, but I am not familiar with those.  You could post a question on various Teradata Forums and ask if anyone has used those APIs to disable/enable throttles and how they used them.
 
However, using APIs, such as TDWMRuleControl,  could also put Viewpoint reporting out of sync with the rule state in the database, so you have to be very careful when using them.
 
If the usual way of disabling throttles is not working for you, I would suggest you open an incident with the support center.
 
Thanks, -Carrie

LUCAS 17 comments Joined 06/09
11 Feb 2016

Hi Carrie,
late reading but learning much, thank you for this article.
About multi-statement you said "any sub-expressions that the different SQL statements have in common will be executed once and the results shared among them (AMPs)".
How to consider the impact of a single execution and common result shared among AMPs in term of AWT consumption ? an always positive impact because of less work, a possible negative impact in some situations when a lot of AMPs are waiting for the results ?
Is there a guideline to evaluate the impact of a multistatement, as a 15 "select ... join from table1, table2 ... ;select ... join from table1, table2 ... ; ...." ?
Pierre
 

carrie 595 comments Joined 04/08
12 Feb 2016

Hi Pierre,
 
I am concerned that you may have misiinterpreted one of the details in the posting.
 
This statement that you quote:  "any sub-expressions that the different SQL statements have in common will be executed once and the results shared among them (AMPs)" does not mean "shared among AMPs"  but "shared among the different statements within the multi-statement request".   I should have stated that more explicitly.  
 
If two or more of the statements in the request contain the same subquery, instead of executing the sub-query two or more times (1 per statement that contains it), the subquery is only executed once and it's spool is re-used in all the statements that need it.  So in this case the multi-statement request does slightly less work than if each statement were executed serially.
 
In terms of AWT usage, the multi-statement request will be broken down into individual steps just like any query, so while it may run longer than any single-statement request, it shouldn't use any more AWTs at any point in time than a serial execution of all statements would.  The only exception is if for some reason the multi-statement request plan had more parallel steps.    Under normal circumstances a request is not allowed to use more than 4 AWTs concurrently, whether it is a single statement request or a multi-statement request.   So I don't expect more stress on AWTs by using multi-statement requests.
 
Thanks, -Carrie

preyesdiaz 1 comment Joined 04/16
27 Apr 2016

I am not sure if this is the right place to post this questin , but as is tittled "Basics", here I go.
 
I need a very simple definition of what a semantic view is. The target population is non-technical users. can anybody give me a hand?
 
Thanks

carrie 595 comments Joined 04/08
28 Apr 2016

Teradata Forum has a category called "Data Modeling" which describes itself as:   Discussion topics on Logical Data Models, Physical Data Models, and Semantic Modeling Building Blocks. 
 
You can find that discussion group at:  http://forums.teradata.com/forum/data-modeling
 
I'd suggest you ask your question there.  Sorry, but  I am not able to help you with this.
 
Thanks, -Carrie

You must sign in to leave a comment.