Have you ever had what looks like a great query plan turn into life in the slow lane at run time?

You may have blamed it on your table having very skewed values in the join column. And you were probably right. But the optimizer is getting better at recognizing skew, starting in Teradata 12. This skew detection happens WHILE it’s building up the query plan, not after the query begins to execute. One initial step towards this but-what-will-it-be-like-at-run-time smarts happens when the optimizer is pondering a row redistribution step for a large table in preparation for a join. I thought you might like to see an illustration of how this works.

By identifying skew before the row redistribution is already nailed into the plan, the optimizer has a chance to consider other join options. When skew is detected in the join column of a table-to-be-redistributed, the estimated cost of row redistribution will be adjusted upwards, causing the optimizer to see it as a more expensive option (and let’s face it, with skew present, it may be).

I’ve played around with this on my Teradata 12 system, and I like how it works. So if you want to peek over my shoulder for a moment, I’ll share my findings with you.

Yes, You’ll Need Statistics Collected on that Skewed Join Column

Each detailed interval in the statistics histogram has a field called Mode Frequency. Mode frequency is the number of rows that carry the most popular value in that interval. In interval zero (which contains mostly summary information), the mode frequency field holds the row count of the most popular value across all the detail intervals; that value is the most highly-skewed value in the table.
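
If you want to look at these fields on one of your own tables, here is a quick sketch (the table and column names are just placeholders):

COLLECT STATISTICS ON Mytable COLUMN (Join_Col);

HELP STATISTICS Mytable COLUMN Join_Col;

The second statement displays the detail inside the histogram, including the Mode Value and Mode Frequency fields discussed here.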

Here’s a subset of columns and a few intervals from a histogram for a column. This histogram was built after I updated the table in a way that resulted in about 5 million rows sharing the same value in this column. In other words, I consciously introduced exaggerated skew into the mix. 

[Histogram excerpt: Interval, Mode Value, and Mode Frequency columns for the skewed column]

Interval 1 (the 2nd row) contains the most frequent value for this column, 5. There are 5,001,159 rows in this table that carry the value 5 in this column, out of about 45 million rows in the table. The non-skewed values all have 19 or 20 rows per value.

Because 5 is the most frequent value for this column in the table, it ends up in the Mode Value column of Interval 0 (the 1st row), and its row count ends up in Interval 0’s Mode Frequency column. Interval 0’s mode frequency is referred to as the “high mode frequency” because it represents the most frequent value across the entire table.

The histogram information is used to calculate the highest number of rows that are likely to go to one AMP for the redistribution and join activity. The optimizer uses the number of AMPs (40 in this case) and this high mode frequency to make a worst-case assessment of the cost of row redistribution.

The Optimizer Rethinks Its Estimates in the Face of Skew

Without this new skew awareness, the optimizer in this case would have assumed that each AMP would do the work of joining about 1.1 million rows (44,991,725 rows / 40 AMPs). With this new intelligence, however, the optimizer understands that all the rows carrying the high mode frequency value will be redistributed to one AMP, and that one AMP will be doing a disproportionate amount of the work.

Once it assesses what the worst-case AMP is likely to be doing, the optimizer further assumes that the remaining rows will be split evenly across all AMPs. Let’s break that math apart.

Considering skew, the worst-case AMP will get 5,001,159 rows (the high mode frequency count). That same AMP will also get an additional 999,764 rows (the 39,990,566 rows carrying non-skewed values divided across the 40 AMPs). This results in an estimate of slightly over 6 million rows targeting the worst-case AMP, more than 5 times larger than the optimizer’s estimate would have been in the blind-to-skew case.
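
If you want to see that arithmetic in one place, here is a back-of-the-envelope version using the numbers from this example (illustrative only; the optimizer’s real costing involves more than row counts):

SELECT
   44991725 / 40                        AS per_amp_if_no_skew     /* roughly 1.1 million rows */
 , 5001159                              AS high_mode_frequency    /* rows carrying the skewed value */
 , (44991725 - 5001159) / 40            AS per_amp_non_skewed     /* 999,764 rows */
 , 5001159 + (44991725 - 5001159) / 40  AS worst_case_amp_rows;   /* just over 6 million rows */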

Instead of assuming each AMP will receive 1.1 million rows during row redistribution, the Teradata 12 optimizer assumes a worst-case redistribution of 6 million rows for at least one AMP. It uses that worst-case number of rows to adjust the estimated cost of the row redistribution choice.

My Experiment

So here’s what I did. I started with a baseline join between the Lineitem and the Parttbl tables where there was no skew on L_Partkey (the larger table’s join column). This particular join required that both Lineitem and Parttbl be redistributed, since in my database neither table’s join column was its primary index.

Next, I changed the underlying values in the Lineitem table’s join column, L_Partkey, to gradually introduce skew. After each skew-increasing change, I recollected statistics on L_Partkey, so the optimizer would have the information needed to detect the creeping skew. Then I ran an explain to see what plan I got after increasing the skew and collecting stats, hoping to see a change in join geography that would avoid redistribution of the Lineitem table.
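
Each round of the test looked roughly like the sketch below. The UPDATE predicate, the Parttbl join column name (P_Partkey), and the L_Orderkey helper column are placeholders for illustration only:

/* Pile more rows onto a single L_Partkey value to widen the skew */
UPDATE Lineitem
SET L_Partkey = 5
WHERE L_Orderkey MOD 100 = 0;

/* Recollect statistics so the optimizer can see the new high mode frequency */
COLLECT STATISTICS ON Lineitem COLUMN (L_Partkey);

/* Check whether the join geography has changed */
EXPLAIN
SELECT *
FROM Lineitem L
INNER JOIN Parttbl P ON L.L_Partkey = P.P_Partkey;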

The following table shows my results from this experiment, where I went from no skew (high mode frequency of 118 rows) up to moderately strong skew (high mode frequency of over 100,000 rows). In this table I capture the high mode frequency values taken from the histogram for each test (remember, I recollected stats after each change in the skew). I also capture the estimated processing time that showed up for the row redistribution step in each query plan. 

[Results table: high mode frequency, estimated processing time for the redistribution step, and resulting join geography for each test]

As can be seen in the table above, the join geography changed when the high mode frequency reached over 100,000 rows for L_Partkey. Seeing those changed demographics, the optimizer chose to leave the large table local, and duplicate the smaller table.

Notice that the estimated processing time taken from the explain text increases a small amount as skew is gradually introduced, even though the join geography in the plan does not change for the small skew and the smallish skew cases. That increase in the estimated processing time is evidence that skew detection is working. The estimated cost of performing the row redistribution increases gradually, until such time as the duplication of the smaller table becomes a more cost-effective choice.

Conclusion

Of course, optimizer decisions such as changing join geography depend on many factors, including processing power of the platform, number of rows in the tables on each side of the join, and other demographic information available to the optimizer. And the costing performed by the optimizer is always taken down to the AMP level, so the number of AMPs in the configuration can make a difference in the plan you get. This is a simple example to step you through the process.

Just a reminder, if there are multiple join predicates, multi-column statistics may be required.
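
For example, if a join were on two columns, a multi-column statistic along these lines (the second column name is a placeholder) gives the optimizer the combined demographics for the pair:

COLLECT STATISTICS ON Lineitem COLUMN (L_Partkey, L_Suppkey);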

What I have demonstrated to you is that skew-awareness is alive and well in the Teradata optimizer, and it’s here to stay. So your life as a Teradata DBA just got a little more boring.
 

Discussion
GeorgeXiong 5 comments Joined 07/09
14 Jul 2009

In such a case, as you mention, whether the smaller table gets duplicated depends on many factors. Sometimes the optimizer will not duplicate the smaller table even when that would be a much lower-cost option.

Is there any "tricky" way to force duplication of the smaller table?

carrie 595 comments Joined 04/08
17 Jul 2009

I would not suggest that you attempt to force table duplication. It is better to leave join geography and join sequence decisions to the optimizer. But what you can do is make sure that you have provided adequate statistics information, so the best join plans can be produced.

With small tables in particular, it is critical that statistics be thoroughly collected and kept current. Random AMP sampling does not provide as good default statistics for very small tables as it does for moderate or larger tables. If you are not getting table duplication when you think you should, then examine what statistics you are collecting on the table, and make sure you have covered all columns on the table that might be contributing to a better table estimate. The optimizer's costing algorithms rely on those statistics in making the decision to duplicate or not duplicate a table.

Quonos 1 comment Joined 09/10
20 Sep 2010

I am having a very similar issue: the bigger table has more than 250 million rows and the tiny table about 2K rows, and the optimizer wants to redistribute both tables :( The problem is I do not have control over the big table and cannot collect stats on the join column. I was wondering if you could suggest something that would force duplication of the smaller table? I know it would be better to leave it to the optimizer, but in this case it generates a really bad explain.

Chusa01 2 comments Joined 05/09
20 Sep 2010

How can I see the statistical column histogram data that you have shown above? I'm guessing it's somewhere in DBC, but I have no idea where.

Also, how has skew awareness advanced in Teradata 13? (& 13.10?)

What are the best ways to identify what's causing a query to spool out? I'm having a challenge now with a large query that ran fine last month, but now only spools out. If you give it enough spool (2 terabytes), it will run, but it takes an extremely long time. I'm struggling to figure out which specific piece (or pieces) of the data has changed and is causing the issue. Additionally, I don't have a previous copy (or history) of the data, so it's difficult to see what's changed. Do you have any ideas or pointers that you can offer?

many thanks in advance.

carrie 595 comments Joined 04/08
21 Sep 2010

In response to Quonos...

I don't know of any method of forcing a table duplication plan that is different from what the optimizer produces. The key demographics used by the optimizer come from collected statistics. Not having statistics collected on the join column is most likely the source of the bad plan, so it's pretty important to get that addressed at your site. If you don't have the privileges, then find out who does and have them collect the stats. Or they can put the collect statistics statement for the join column in a macro, and then give you exec privileges on the macro.
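
As a sketch of that macro approach (all of the object names here are placeholders):

CREATE MACRO Collect_JoinCol_Stats AS (
  COLLECT STATISTICS ON BigTable COLUMN (Join_Col);
);

GRANT EXEC ON Collect_JoinCol_Stats TO restricted_user;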

In some cases, redistributing the large table makes the join operation run quicker. If the small table is duplicated without touching the large table, only product/dynamic hash joins are possible, which may not be the optimal join for this query. All the operations involved (not just the redistribution) are costed to make the final decision. The accuracy of the decision depends on the estimates the optimizer makes, and the optimizer relies on appropriate statistics to make good estimates. In addition, as the posting explains, skew detection relies on statistics as well.

Thanks, -Carrie

carrie 595 comments Joined 04/08
21 Sep 2010

In response to Chusa01...

To see the detail inside the histogram you issue this command:

HELP STATISTICS table-name COLUMN column-name;

You must have some access right (any access right) on the table in order to view the histogram detail.

There are no major changes to skew awareness in T13 or in T13.10.

In terms of what is causing spooling aborts, it could be that the data volume has increased, causing more rows to be selected on tables that feed into the large spool file, or more rows are being joined before the spool file has been created. It is also possible that selection criteria on some of the tables feeding the spool file have changed; for example, a user is selecting a larger range of dates, or maybe is doing analysis on the entire year of 2010 and, since it's near the end of the year, there's just more data involved in performing that analysis.

It is also possible that the selection criteria are selecting data values that produce more skew. The spool allocation for a user is divided by the number of AMPs in the configuration. If any one AMP exceeds its portion of the spool allocation, the query will get an error. Skew is one of the biggest reasons that queries run out of spool.

It won't help you with this query, but if you aren't doing it now, collecting DBQL or EXPLAIN information on non-tactical queries going forward would give you the opportunity to go back and look at previous query plans and compare them to the current query plan. This may provide a clue to identifying the problem.
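
A sketch of the kind of logging that makes that comparison possible (the scope and options would need to be tailored to your site; ON ALL is just for illustration):

BEGIN QUERY LOGGING WITH SQL, EXPLAIN ON ALL;

With that in place, the SQL text and the plan text are captured in DBQL, so an earlier plan for the same query can be compared against the current one.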

Thanks, -Carrie

10 Apr 2012

Hi Carrie,
I have a query skew of 500. The step which is taking all the CPU time has a join between two tables (a PI column join). The tables are very well distributed (almost 0 skew). Stats are not updated, and yes, there is a large gap between the estimated row count and the actual row count in the DBQL step table for this step.
But the data is not redistributed or duplicated. How, in this situation, would stats collection help with the query skew?

Meenakshi Swami

carrie 595 comments Joined 04/08
11 Apr 2012

Without stats collected on the join columns, the optimizer assumes that the table with a small number of distinct values on the join columns will have a complete match. If there is not a complete match to that table, then the estimated rows could be quite different from the actual rows in that step.

Another reason there could be a gap between actual and estimated has to do with the WHERE clause selection that the query might specify for each table. The optimizer will have to make a decision whether to apply the selection criteria before or after the join. Without statistics collected on the join column, and also on the selection columns, this costing may be inaccurate, and a less expensive join order might be missed.

For those reasons, it is probably a good idea to collect statistics on the join columns in this case. The more information the optimizer has, the better it can help to find a better plan.

12 Apr 2012

Thanks Carrie... but how do I justify that the skew is because of missing stats? I get that it's good to have stats on join columns (even for a PI join), but how missing stats contribute to query skew is still not very clear to me. Could you please help?

Meenakshi Swami

carrie 595 comments Joined 04/08
16 Apr 2012

I cannot tell you for sure that missing stats are contributing to the skewed processing you are experiencing. Under some circumstances, collecting statistics more fully provides the optimizer with more information, which may allow a more optimal plan to be selected.

In the example I illustrated above, additional statistics allowed the optimizer to avoid a skewed redistribution. That is one example. You could try more complete statistics collections and see if the skew is reduced.

Thanks, -Carrie

raghu.juvvadi 1 comment Joined 02/08
13 May 2012

Hi Carrie,

First, I want to say thanks for providing detailed and very useful info which makes our jobs a lot easier. Your blog is a gold mine.

I have a question related to skew. Are there any alerting mechanisms that notify us if a large chunk of data is getting redistributed? That would give us a chance to revisit the query, modify the indexes if required based on the query's join pattern, rewrite the query, set up the stats, etc.

Also, we have defined workloads in TASM and allocated high priority to critical apps. Though we allocated very low priority to the ad hoc workload, our critical apps are still getting impacted by it. I see that it is not an issue with CPU allocation; it is the ad hoc queries consuming a lot of delta IO that I noticed. Is there a way we can control the IO consumption too? We greatly appreciate your help.

Thanks in advance.

carrie 595 comments Joined 04/08
15 May 2012

Thank you for your kind words!

As you probably already know, you can always look at explain text after a query runs (or before), seeking out steps with large row redistributions. DBQLStepTbl output can indicate if there was skew going on during that step, and how bad it was. The problem you are describing might not show up in the plan, however.
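
If step-level logging is enabled, something along these lines can be used to compare estimated and actual rows step by step (a sketch only; I'm quoting the table and column names from memory, so verify them against the DBQL documentation for your release):

BEGIN QUERY LOGGING WITH STEPINFO ON ALL;

/* Later, for a suspect query, compare estimates against actuals per step */
SELECT StepLev1Num, StepName, EstRowCount, RowCount
FROM DBC.DBQLStepTbl
WHERE QueryID = 123456789012345   /* placeholder QueryID */
ORDER BY StepLev1Num;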

The closest thing I can think of that might help you while the query is running is to use a workload exception on the workload where these types of queries run. If you create a workload exception on skew, skew will then be calculated for that query at the end of each exception interval, and it will consider CPU and I/O consumed only during that interval. If skew is detected it must persist for the qualification CPU time specified.

If the exception catches skew on a query, you can choose the action you want to take place as a result. Typical action in a case like this is to notify the DBA or do some kind of alert. However, you could also consider moving such a query to a low-weighted workload with a low CPU limit, which would allow it to complete, but not to consume very much CPU while it is doing so.

There is no way you can place a limit on I/O consumption currently. Sometimes placing a CPU limit on an allocation group that is a high I/O consumer can act to slow down I/O consumption as a secondary effect.

Thanks, -Carrie

asimibm 2 comments Joined 05/13
23 May 2013

Hi Carrie,
This post was amazing. I am in trouble and need your help. Below is a description of my problem. The description is a little long; kindly don't ignore it.
There is a table TAB1 which has COL1 (a well-distributed, non-unique column with no NULL values) and COL2 (a skewed column: 75% of the values are NULL), apart from other columns. Neither COL1 nor COL2 is the primary index. I am writing the SQL below:
SELECT A.X, A.Y, A.Z
FROM TAB1 A LEFT OUTER JOIN TAB1 B ON A.COL1=B.COL2
WHERE A.COL3 BETWEEN DATE-7 AND DATE;
In case of TD12
The optimizer does an all-AMP retrieve from A (TAB1) with the residual condition COL3 BETWEEN DATE-7 AND DATE, then a data redistribution on COL1, and then a sort on the hash value. Similarly, as a parallel step, it does an all-AMP retrieve from B (TAB1) with the residual condition COL2 IS NOT NULL, then a data redistribution on COL2, and then a sort on the hash value.
In the next step it is doing a MERGE join and giving the result.
In case of TD14
The optimizer does an all-AMP retrieve from A (TAB1) with the residual condition COL3 BETWEEN DATE-7 AND DATE, fanned out into some number of hash join partitions, and then a redistribution on COL1. In the same step, as a parallel activity, it does an all-AMP retrieve from B (TAB1) with no residual condition, fanned out into some number of hash join partitions, and then a data redistribution on COL2.
In the next step it is doing HASH join with the COL1=COL2 condition and COL2 is NOT NULL.
My Question and Analysis
The stats and data volume are the same on both the TD12 and TD14 servers. From the join plans it is quite clear that Teradata is not picking up the skew that comes from the NULL values; otherwise it would not do the data redistribution in either TD12 or TD14. I checked the histogram in TD12 and DBC.COLUMNSTATSV in TD14 and found that the high mode frequency does not consider the NULL value: for interval 0 it holds the second highest count for COL2 (NULL has the highest count). This can also be shown by putting COALESCE in the join condition to map NULL to 0 (zero). Once I put the COALESCE in, I found the optimizer behaving as you describe: it stopped redistributing the rows on the join column. Now my question: is it a known thing that the TD optimizer will not consider NULL skew?
If you look at the explain plan in TD12, it does the AMP retrieve with a residual condition COL2 IS NOT NULL even though I have not put any WHERE condition on it. I guess the optimizer is doing the right thing by removing the NULL values before the redistribution. But in TD14 the same check is applied later, as a join condition, and no residual removal is done. This is causing our end users' spool space issue in the data redistribution step. The data is skewed, all the NULLs go to one AMP, so they are complaining about spool space on our new TD14 box, since the same query was working on the TD12 box.
I have done more experiments on the TD14 box by putting "DIAGNOSTIC NOHASHJOIN ON FOR SESSION". Now the tables are joined using a MERGE join, but the residual condition is still not picked up. Again we ran out of spool space.
I have tried another method by cheating the optimizer. I dropped the stats on COL2 (the skewed column), and now the optimizer picks up the residual condition and there is no more spool space problem. But I know this is not the correct method.
Please advise and help me understand why TD14 is not applying the COL2 IS NOT NULL condition as a residual condition while TD12 is.
Thanks in advance, and awaiting your light in this dark room.

I am a big fan of TD Optimizer.....

carrie 595 comments Joined 04/08
24 May 2013

If you are getting a worse plan for this query after upgrading to 14.0, and it looks like you are, I suggest you open an incident with the support center.
 
Thanks, -Carrie

asimibm 2 comments Joined 05/13
26 May 2013

Hi Carrie,
 
We have already contacted the support centre, but they are asking us to increase the spool space. I am finding that after collecting the stats, the skew sensitivity has decreased a lot. I am also finding this happens only in the join case where one table has a lot of NULL values in the joining column and the other table has no NULL values in that column. Can you please guess at something? That would help us a lot.
Thanks a lot.....

I am a big fan of TD Optimizer.....

carrie 595 comments Joined 04/08
30 May 2013

Please continue to work with the support center on this issue.  They are the preferred point of contact if you believe you are experiencing a product deficiency.  I am not able to provide that level of support or expertise.
 
Thanks, -Carrie

bhavesh_09 1 comment Joined 06/12
17 Oct 2013

Hi Carrie,
Your post is really informative.
I have one question. If multiple stats are available on a table, how does the PE choose which stats to consider when generating the query plan?
For example, for an employee table it is possible to have stats on Emp_id, emp_name, PARTITION, (PARTITION, Join_date), etc., and it is also possible that a few stats are stale. So which stats will Teradata choose to generate an efficient plan, or does Teradata use all of these stats to come to a conclusion?
-Bhavesh

carrie 595 comments Joined 04/08
18 Oct 2013

Bhavesh,
 
The optimizer looks at the query and sees what columns are used for selection criteria and for joins, and then attempts to find statistics for just those columns.  If it finds statistics, it uses them.  It will ignore any other statistics that are collected on the table for columns that are not referenced in the query.
 
If the statistics it seeks are stale, the optimizer will attempt to extrapolate what the statistics should look like based on current row counts and other saved information, and use the results of the extrapolation as input into the plan creation.
 
In your example, if you have a query with a WHERE clause like "...WHERE emp_name LIKE 'K%'", then the optimizer will look for statistics on emp_name, but will not look for statistics on Emp_id.
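 
A related trick, if you want to see which statistics the optimizer would like to have for a particular query, is the HELPSTATS diagnostic; with it turned on, the explain output ends with a list of recommended statistics (treat these as candidates to evaluate, not as a must-collect list). Using the example above:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT * FROM employee WHERE emp_name LIKE 'K%';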
 
There are several other blog postings from me on statistics collection and statistics recommendations which can provide more information.
 
Thanks, -Carrie

vasudev 35 comments Joined 12/12
10 Nov 2013

Hi Carrie,
I am using TD12 and am going to add 2 new columns to a big table of 200 million rows. Since I have enough space, I am planning to do it with an INSERT-SELECT: I will create a new table with the two additional columns and populate it using INSERT-SELECT. In the explain plan I saw this:
We do an all-AMPs RETRIEVE step from
     dbname.Table name by way of an all-rows scan
     with no residual conditions into Spool 1 (all_amps) (compressed
     columns allowed), which is built locally on the AMPs.  The input
     table will not be cached in memory, but it is eligible for
     synchronized scanning.  The result spool file will not be cached
     in memory.  The size of Spool 1 is estimated with high confidence
     to be 46,278,508,639 rows (4,118,787,268,871 bytes).  The
     estimated time for this step is 40 minutes and 5 seconds. 
 
What is the benefit I am going to gain here? In the explain plan I can see the table is not cached in memory, and I am not able to understand what benefit I will get because of this. In the performance management book it is mentioned that loading empty tables using INSERT-SELECT is good since the TJ (transient journal) is not used. Is this INSERT-SELECT efficient in terms of CPU or I/O? Please advise.

carrie 595 comments Joined 04/08
13 Nov 2013

Vasudev,
 
Unfortunately, I am not able to provide personalized responses to all the various questions/problems that you have posted as comments across my blog in the last few months (three questions from you over the last several days are still in my inbox).  Many of your questions are unrelated, or only loosely related, to the topic of the blog posting, and some are beyond my area of expertise.  
 
The Developer Exchange is intended for an interchange of ideas from various people in the Teradata community.  Unfortunately, it is not equipped to (and I am not able to) provide the level of support that you are seeking. 
 
There are several alternatives for you to get answers to your questions.  May I suggest that you post specific problems that you experience (such as this one), or questions that you have, on the Teradata Forum, when the questions are not directly tied to the content of a particular blog posting?  Teradata Forum is a question-answer forum designed for practitioners to help each other with their day-to-day problems and questions. 
 
http://forums.teradata.com/forum
 
Formal support channels using Teradata Customer Support can also be pursued when you are facing performance issues.  In addition, Teradata Professional Services are available to come on site to do performance evaluations, make recommendations, and help establish best practices at your site.  There are also multiple training opportunities within the Teradata Learning establishment, including the Teradata Education Network (TEN) series of in depth, expert-led online presentations.  Contact your account team for help in getting started pursuing any of these opportunities.
 
I wish you well working with Teradata going forward.
 
Thanks, -Carrie

vasudev 35 comments Joined 12/12
14 Nov 2013

Hi Carrie,
Thank you very much for your response.
Will post the questions in the particular forum post. 
Again, thanks for your response.

Santanu84 12 comments Joined 04/13
26 Dec 2013

Hi Carrie,

Here at one point you have mentioned:

"Instead of assuming each AMP will receive 1.1 million rows during row redistribution, the Teradata 12 optimizer assumes a worst-case redistribution of 6 million rows for at least one AMP. It uses that worst-case number of rows to adjust the estimated cost of the row redistribution choice."

Assume that TableA's PI column is joined with TableB's non-PI column. TableB will definitely be redistributed across all AMPs. Now, TableB is a very large table.

As per my understanding, in V2R6 the optimizer would have assumed that each AMP received those rows evenly, and the processing across the AMPs would have been skewed (the performance of all AMPs is equivalent to the slowest AMP).

Does that not happen any more with the TD12 optimizer?

I mean, currently (from TD12 onwards), will the optimizer be able to pinpoint the worst-case scenario of row redistribution and accordingly change its strategy so that the processing is similar on each AMP?

Please do let me know your response.

Thanking You

Santanu Ghosh

carrie 595 comments Joined 04/08
30 Dec 2013

Santanu,
 
I cannot tell you for sure what the optimizer will choose to do. There are many considerations involved.   And even if adjustments for skew can be made when building the query plan, that does not mean that there will be totally even processing across all AMPs.  
 
However, as the posting above describes, there is new intelligence in the optimizer to attempt to detect skew that should help in many situations in Teradata 12 and above releases.  To see how that applies to specific situations, it is best to run an explain and look at the plan.   That is a very simple and easy way to observe optimizer decisions without running the query itself.
 
Thanks, -Carrie

Santanu84 12 comments Joined 04/13
30 Dec 2013

Thanks Carrie for your reply.
Actually I tried testing from my end and got a specific result. I have already posted my observation in this forum.
http://forums.teradata.com/forum/database/query-on-skew-sensitivity-in-the-td12-above-optimizer#comment-128093
But I have restrictions on system availability (as I am using my office's dev env :-D), because of which I could not recreate very high skew. However, my findings suggest that the optimizer is smart enough to change the join plan.
In the above link, during skew it changed to a "table duplicate" and single partition hash join. But that does not mean even processing across all AMPs.
So I was wondering from which perspective the optimizer is calculating the cost factor? Anyway, my understanding is now clearer to some extent.
;-D
 
Thanks
Santanu

carrie 595 comments Joined 04/08
02 Jan 2014

Santanu,
 
I am glad to hear you are getting a join plan that is better.   Unfortunately, I do not know all the internals about how the optimizer does its costing.  It's quite complex.
 
Thanks, -Carrie

rakzrakz 1 comment Joined 03/14
19 Mar 2014

Carrie,
I have a table which has a BLOB column and only one data row. This table shows me 93% skew. I tried an index, compression, and stats on the LOB, which are not allowed in the TD 13 version I use. My question: is it OK to have the table with skew above 90%, since it has only one row? Or should I try some way of reducing the skew? Appreciate your help.
Thanks - Rakz

carrie 595 comments Joined 04/08
24 Mar 2014

If you only have one row in your table, then it will appear skewed because only that one AMP will have data.  That is a normal situation. I don't see a problem with that.  There is no way to reduce that skew without deleting the row or adding more rows in the table.    Neither probably makes business sense.   
 
LOBs are stored on the same AMP as the base row they are associated with, so the existence of a LOB column will not increase or decrease skew.
 
But if the intent is to have only one row in the table, there is no problem that needs solving here.
 
Thanks, -Carrie

sieger007 1 comment Joined 09/15
23 Oct 2015

Howdy Carrie
There's not been much water under the bridge on this wonderful article, so I thought I'd help break some ice with my situation.
I am joining a star schema's fact and dimension tables (some 40+ dimensions) using a VT that distributes on the dimension keys, so the optimizer will duplicate / PJ / merge those dimensions before doing a row hash match. ALL stats that the optimizer wanted, it got. Even column combinations in the VTs, it got them, though I could see that some column combinations weren't going to change the explain. Admittedly, some dimension PIs are 40+ skewed, and WITHOUT using DNPJ the report will SPOOL OUT in 15 mins (with DNPJ it will do some 15+ more hash joins). Each of these steps churns out the same number of actual rows, e.g.:
step 15, hash join ..... actual rows 1871234
step 16 ............... actual rows 1871234
step 40 ............... actual rows 1871234
It kind of makes me feel some of these joins aren't doing any good, and they are ALL INTRA-SPOOL JOINS.

Over here they haven't turned on step-level logging, so I have to take the run-time explain (Viewpoint) and then backtrack the spool. The step it spools out on is this last step, and 11934 is 11861 + a badly skewed table:
 

182823952.00
42.861526
We do an All-AMPs JOIN step from Spool 11932 (Last Use) by way of an all-rows scan, which is joined to Spool 11933. Spool 11932 and Spool 11933 are joined using a merge join. The result goes into Spool 11935, which is built locally on the AMPs. This step ends a parallel block of steps.

11.86962
We do an All-AMPs JOIN step from Spool 11934 (Last Use) by way of an all-rows scan, which is joined to Spool 11935. Spool 11934 and Spool 11935 are joined using a single partition hash join. The result goes into Spool 11855, which is built locally on the AMPs.

So is the spool out (which I presume is a local AMP spool out) being caused by the skew in the last badly skewed table, OR is there something going on higher up?
If I change the PI of all the 40+ skewed tables, will this query run to completion using the originally intended plan with PJs?
I don't follow why the PJ plan spools out while the hash join plan (which takes less spool but a lot higher impact CPU) runs to completion. What's going on under the covers?
TY
Sam

carrie 595 comments Joined 04/08
26 Oct 2015

Hi Sam,
 
My best guess about why you are spooling out is that you have a large skew issue somewhere in one of the spool files lower in the plan.   When there are large numbers of tables being joined in one query and subsequently lots of spool files, even if you have collected statistics thoroughly on the base tables that are accessed, by the time you get to the final joins inaccuracy in spool file estimates is possible and skew is difficult to detect.  You should check that statistics are current, and that large updates have not been performed since the relevant stats were last collected.
 
In the examples of skew detection by the optimizer that I show in the blog posting above, the high mode frequency values are taken from the statistic histogram and that's what allows the optimizer to improve the plan to avoid a large skewed redistribution.   Spool files do not have statistics collected on them and there is no histogram to represent them, so skew detection is more challenging at the bottom of a plan where you are joining spool files, not base tables.
 
I think you will need to find a way to get step level DBQL data and then go step by step in the plan and find out where the skew problem begins.   Or try to simplify the query.   You could also turn to Teradata Professional Services for query tuning expertise and see if they can assist  you.  If you have evidence to believe that the optimizer is producing a non-optimal plan, it's always a good idea to open an incident with the support center.  
 
Regards, -Carrie
