As you approach full use of your AMP worker tasks, monitoring in-use counts becomes a more important exercise.  The ResUsageSAWT (SAWT) table provides a wealth of information about AWT usage when you need to dive into details.

ResUsageSAWT provides you with several categories of information, including AMP worker task usage and high-water marks broken out by work type, as well as message queue depth and flow control frequency metrics.

The SAWT table is organized by AMP, with one row for every AMP in the system for the logging interval date and time.  Only AMP vprocs are reported; other vprocs are ignored.

AMP worker task in-use counts

The SAWT table carries AMP worker task in-use counts, as well as a maximum in-use count across the logging interval for each message work type on each AMP.  The values for the in-use counts come from a snapshot taken at the end of the logging interval.  Maximum in-use counts for each of the message types represent the maximum counts across the entire logging interval.

Work messages that arrive on the AMP are put into categories called work types.  Work types suggest the importance of the work contained in the message.  Work00 represents a new message from a user request coming from the dispatcher, Work01 represents spawned work from a user request, and so forth.  Both of these work types are less important than the internal work types, such as Work12, which represents abort processing.

Each work type reports a maximum in-use count metric.  Do not attempt to add these maximum in-use count fields across all work types on an AMP. If you do that, the sum will give an inflated number of AWTs for that AMP.

The inflated sum from adding in-use maximum counts occurs because maximum values are selected independently for each work type across the logging interval, and the maximum values for different work types do not represent counts that have been produced at the same point in time.  Maximum in-use count fields are most useful for evaluation of individual work type behavior and are not well suited for producing combined work type totals.

The InuseMax field is better suited for that purpose (discussed below) as it represents the high water mark for combined AWT usage for that interval.    Below is an example of how maximum in-use counts can give you an inflated total, if added together:
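
As a rough illustration of why the per-work-type maximums should not be added together, here is a minimal Python sketch. The three sample points and all of the counts are invented for illustration; they are not taken from a real ResUsageSAWT row, and the real counters are tracked across the whole interval rather than at three moments.

```python
# Hypothetical in-use counts for one AMP at three moments inside a single
# logging interval. All numbers are invented for illustration only.
samples = [
    {"Work00": 30, "Work01": 10, "Work12": 2},
    {"Work00": 12, "Work01": 35, "Work12": 1},
    {"Work00": 20, "Work01": 20, "Work12": 8},
]

work_types = list(samples[0].keys())

# Each work type's maximum is selected independently across the interval.
per_type_max = {wt: max(s[wt] for s in samples) for wt in work_types}

# Adding those maximums mixes values taken from different moments.
sum_of_maxes = sum(per_type_max.values())

# The combined high-water mark looks at one moment at a time, which is
# the idea behind InuseMax.
inuse_max = max(sum(s.values()) for s in samples)

print(per_type_max)   # {'Work00': 30, 'Work01': 35, 'Work12': 8}
print(sum_of_maxes)   # 73
print(inuse_max)      # 48
```

In this made-up case the summed maximums suggest 73 AWTs in use, while no single moment ever had more than 48 in use.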

 

Message queue depth and flow control metrics

The second category of information captured by the SAWT table is message queue length and flow control detail.  The following fields from the SAWT table directly relate to this category (as of Teradata 14.0):

  • MailBoxDepth:  This terminology is another way of referring to the message queue length.  “Mailbox” and “message queue” are synonymous in this case.  This is the current depth of the AMP work message mailbox at the end of the logging period.
  • FlowControlled:  This metric tells you if the AMP is in flow control at this point in time. It is a snapshot taken at the end of the logging interval.  If it is zero, it indicates the AMP was not in flow control at that time.  If it is non-zero, it means that the AMP was in flow control at interval end. 
  • FlowCtlCnt:  This count tells you how many times during the collection interval the AMP went into the state of flow control.  It does not tell you how long the flow control state persisted, only how often it took place. 
  • FlowCtlTime:  This metric provides the total time in milliseconds that an AMP was in the state of flow control.   This is a good indication of the degree of flow control experienced by the AMP, and is available in Teradata 14.0. 

Flow control is an indication that the system is congested and that messages are temporarily being sent back to their source.  If flow control is very light and occasional and is not impacting response time expectations for any critical applications, then it may be appropriate to continue to monitor without making any changes.  However, if it is becoming more persistent over time, it should not be ignored and appropriate actions should be taken.

Available AMP worker tasks

Among the AMP worker tasks available to an AMP, some are held in reserve pools for certain work types.  The remaining are held in an unassigned pool for general use.

Two new columns have been added to the ResUsageSAWT table in Teradata 14.0 that will give you a look at the size of the unassigned pool of AWTs:  Available and AvailableMin.  

  • Available:  The number of AMP worker tasks remaining in the unreserved pool (not being used) at the end of the logging interval.
  • AvailableMin:  The lowest Available number for the entire logging interval.  If zero is reported, there were no AWTs available in the unassigned pool at some point.

Under normal, non-congested situations, if you sum up the in-use counts for WorkNew (Work00) and Work01 and the value in Available, it should consistently total 62, which is the maximum number of AMP worker tasks that can be used to support user-initiated work of the Work00 and Work01 type (assuming the default settings).

Examining these two metrics can give you a feel for how many AMP worker tasks you have left for work arriving on the AMP at different times of the day, and thus how close you are to AMP worker task exhaustion at those times.
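
The consistency check described above can be sketched in a few lines of Python. This is only a sketch under the default settings: the function and parameter names are hypothetical, and the values passed in would have to come from your own query against ResUsageSAWT.

```python
# Assumed default: 62 AWTs can serve Work00 (WorkNew) and Work01 work.
EXPECTED_TOTAL = 62

def awt_gap(work_new_inuse, work01_inuse, available, expected_total=EXPECTED_TOTAL):
    """Return how far one AMP's row is from the expected total (0 = consistent)."""
    return expected_total - (work_new_inuse + work01_inuse + available)

def pool_was_exhausted(available_min):
    """True if the unassigned pool hit zero at some point in the interval."""
    return available_min == 0

# Hypothetical end-of-interval values for one AMP.
print(awt_gap(work_new_inuse=20, work01_inuse=15, available=27))  # 0
print(pool_was_exhausted(available_min=0))                        # True
```

A non-zero gap is not necessarily a problem. It can simply mean that the settings are not the defaults, or that the AMP was congested or nearly idle at the time of the snapshot.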

InuseMax

InuseMax is the only ResUsageSAWT metric that is self-consistent, that truly reflects the total usage of AWTs at a particular point in time.  It can never exceed the number of AWTs defined on the system per AMP.  It is also a useful metric because it reflects the maximum number of AWTs in use at one point in time across the logging interval.  

If the logging interval is 600 seconds, InuseMax reflects the highest number of AWTs in use at any point in time during that 10 minute interval.  For an ongoing metric to monitor and track AWT usage over time, InuseMax is your best choice. 

When charting this field so that all AMPs on the platform are represented, consider using the maximum of each AMP’s InuseMax, rather than the average of each AMP’s InuseMax. The figure below shows just how large a difference there can be between the two approaches.  When monitoring AMP worker tasks, the worst case is always more informative than an average case.

The blue color represents the maximum of the InuseMax values, and the red the average of the InuseMax values across a 24-hour period.
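
To make the difference between the two approaches concrete, here is a small Python sketch that rolls per-AMP InuseMax values up to one maximum and one average per logging interval. The rows are invented, and the tuple layout (interval label, AMP number, InuseMax) is an assumption about how you might pull the data, not the table's actual column list.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (interval label, AMP number, InuseMax).
rows = [
    ("09:00", 0, 40), ("09:00", 1, 42), ("09:00", 2, 62),
    ("09:10", 0, 35), ("09:10", 1, 36), ("09:10", 2, 37),
]

# Group the per-AMP InuseMax values by logging interval.
by_interval = defaultdict(list)
for interval, _amp, inuse_max in rows:
    by_interval[interval].append(inuse_max)

# Worst-case AMP versus average AMP for each interval.
summary = {
    interval: {"max": max(values), "avg": mean(values)}
    for interval, values in by_interval.items()
}

for interval, stats in summary.items():
    print(interval, stats["max"], stats["avg"])
```

In the invented 09:00 interval the average is 48, which looks comfortable, while one AMP has already reached 62.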

Thresholds for the InuseMax metric

InuseMax is the key metric to evaluate AWT in-use counts over time.  Benefits of doing so include the following:

  • A single metric to track
  • No calculation or averaging required
  • Reflects actual usage at the maximum level for the logging interval
  • Incorporates all work types combined at the same point in time

On a system with the default of 80 AWTs and no reserve AMP worker tasks defined, an InuseMax of 62 is a good indicator of AMP worker task exhaustion. Consider the guidelines in this table.

It is better to manage the problem of AWT exhaustion when it is just beginning, rather than waiting until it turns into flow control.  The ResUsageSAWT table contains valuable information that can help you achieve this goal.

Discussion
Rahul 2 comments Joined 05/09
08 Feb 2013

Hi Carrie,

Thank you for this insightful blog.
Question:
If we SUM the FlowCtlTime over a given period of time, say one day, and find what percent of the total time of the day the system was in flow control, is that a good metric to know whether there is cause for concern? For example, if we are in flow control less than 5% of the total time, it is OK, vs. if greater than 5%, it is not. Is there any recommended threshold for the percent of the total time the system can be in flow control without causing concern?
Thank you again.

carrie 385 comments Joined 04/08
12 Feb 2013

Rahul, this is a good question.

My perspective on flow control is pretty strict: You want to avoid it. It is always a sign of congestion, and generally it should be a red flag for you to take action to manage down concurrency (or take other steps to change system settings) when it appears.

However, if it is very occasional, impacting only one or very few AMPs for a short period of time, there are conditions under which you may need to just put up with it, depending on what is running on the platform.

Flow control is a safety net, so there are times it makes sense to rely on it. But it always has undesirable side effects, and like drinking too much on New Year's Eve, it is not something you want to tolerate on an ongoing basis.

In the new orange book (if you have access to it) titled "AMP Worker Tasks and ResUsage Monitoring for Teradata 14.0" I have a section (Section 12.3) called "Thresholds for Acting on Flow Control" on page 48. It attempts to answer your question.

The orange book assumes you are on 14.0, where the flow control time metric is valid to use. I would not advise using that flow control time metric UNLESS you are on 14.0, as that metric had some growing pains in its first release out.

Basically, the orange book says if FlowCtlTime is zero on all AMPs, there is no flow control problem, but you should look at MailBoxDepth to see how close you are to getting into flow control, and continue to monitor.

If flow control time is 0 to 100 ms on one or a few AMPs, then start thinking about reducing concurrency and continue to monitor.

If flow control is more than 100 ms on most AMPs, it is becoming more persistent and you should take action to reduce concurrency or call the support center for advice on changing system settings.
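
For anyone who wants to automate those three guidelines, here is a minimal Python sketch. It is not from the orange book. The function name is hypothetical, "most AMPs" is assumed to mean more than half of them, and any case the guidelines do not spell out (for example, more than 100 ms on only a few AMPs) is assumed to fall into the middle category.

```python
def classify_flow_control(flow_ctl_time_ms_by_amp, most_fraction=0.5):
    """Classify one logging interval from per-AMP FlowCtlTime values (ms).

    most_fraction is an assumed cutoff for "most AMPs"; adjust to taste.
    """
    times = list(flow_ctl_time_ms_by_amp.values())
    if not times or all(t == 0 for t in times):
        return "none: keep monitoring MailBoxDepth"
    # Count AMPs above the 100 ms guideline.
    heavy = sum(1 for t in times if t > 100)
    if heavy > most_fraction * len(times):
        return "persistent: reduce concurrency or call the support center"
    return "light: think about reducing concurrency and keep monitoring"

# Hypothetical FlowCtlTime values keyed by AMP number.
print(classify_flow_control({0: 0, 1: 0, 2: 0, 3: 0}))
print(classify_flow_control({0: 0, 1: 50, 2: 0, 3: 0}))
print(classify_flow_control({0: 150, 1: 200, 2: 120, 3: 0}))
```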

As you can see, I set the bar pretty low on tolerance for flow control. But that is because I believe you will be better off managing the problem before it becomes too disruptive. Believe me, your platform is not running optimally while AMPs are in flow control, even 5% of the time.

Thanks, -Carrie

Duri83 7 comments Joined 01/12
19 Mar 2013

Hi Carrie,
I'd like to add my thoughts on the magical value 62. Please correct me if I am wrong, but I suppose WorkNew+Work01+Available would sum up to 62 only if you do not reserve any AWT for the tactical work and the System is under fair usage ( not too little, not too much :) ).
For the tactical reserves, we would need to decrease the number by ReservedAWTs*2+2, e.g. on our system, we reserve 1 AWT for the tactical and thus our WorkNew+Work01+Available has its peaks at 58.
But this peak is only reached when we use the system more or less evenly, otherwise (when the system is idle) we get up to 52, because of 3+3 reserved AWTs for WorkNew and Work01, which are not counted in available and would only come to our calculation as soon as they are actually used in WorkNew and Work01.
Thanks for the great blogs,
Yuri

carrie 385 comments Joined 04/08
19 Mar 2013

Yuri,
 
Your comments make sense to me. 
 
Thanks, -Carrie

27 Apr 2013

Hi carrie,
Is it recommended to change or increase the number of AWTs?
And if yes then from where I can change it?

Regards,
Muhammad Fahad.

carrie 385 comments Joined 04/08
01 May 2013

Muhammad,
 
It is not generally recommended to increase the number of AMP worker tasks per AMP.   Usually it only makes sense if there are platform resources that cannot be used and when there is adequate memory to support more active tasks.   The usual approach is to call the support center and talk it over with them, then follow their advice.
 
Thanks, -Carrie

geethareddy 93 comments Joined 10/11
26 Jul 2013

Hi Carrie,

I have a couple of questions; please share your thoughts.
I went through "AMP Worker Tasks and ResUsage Monitoring Teradata 14.0" and I am trying to determine the number of unreserved AWTs on the system. We have 90 AWTs per AMP (it was 80, but we recently modified the MaxAWT parameter to add 10 more, and increased Reserved AWTs to 4).
I am using the below formula as you mentioned,

(total AWTs MINUS total for standard reserve pools  MINUS total for expedited reserved pools)
90 - (3X8)-(4) = 62. So 62 is the number of unreserved AWTs. 

But in your example I can see you deducted the reserved AWTs twice and added 2. I am not sure why. Could you please go through the paragraph that I have extracted and pasted below, and help me understand why 10 AWTs were subtracted a second time and 2 were added?
I am also not proceeding with Step 2 for my system as described in the calculation below, because I didn't understand why you are adding back the WorkNew and WorkOne AWTs.

This step-by-step example assumes that there are 100 AWTs per AMP defined, and the reserve count is specified as 10.

1. Determine the number of unreserved AWTs.

100 – (3 x 8) – (10 + 10 + 2) = 54 AWTs

(total AWTs minus total for standard reserve pools minus total for expedited reserved pools)

2. Add back in the reserves for WorkNew and WorkOne.

54 + (2 * 3) = 60 AWTs

(result from Step 1 plus 6 for the two standard reserve pools of 3 each)

3. Add in reserves for WorkEight + WorkNine.

60 + (10 + 10) = 80 AWTs

(result from Step 2 plus 20 for the 2 WorkEight and WorkNine reserve pools of 10 each)

Thanks,
Geeta

carrie 385 comments Joined 04/08
30 Jul 2013

In your formula, the 4 should be a 10 (total number of AWTs in the expedited reserve pool).   There are 3  reserve pools set up for expedited work:  Work08, Work09, Work10. You have to add up the AWTs held as reserves in all 3 pools.   See this blog posting for more detail on this formula:
 
How to Calculate Your Max Number of Usable AMP Worker Tasks
 
If you want to understand the total number of AWTs available for user-initiated work that is not expedited, then you can optionally add in the reserves for Work00 (AKA WorkNew) and Work01.  That gives you the total number of AWTs available to support user work at any point in time, which is often more useful information, because that is what users rely on having available.  
 
There are 3 different perspectives on available AWTs provided in that orange book, you can use whichever is most useful for you:
 
This first formula tells you total unreserved AWTs.
 
1.  Determine the number of unreserved AWTs.
100 – (3 x 8) – (10 + 10 + 2) = 54 AWTs
(total AWTs minus total for standard reserve pools minus total for expedited reserved pools)
 
This second formula tells you total AWTs that can be used for user-initiated work that is not expedited.
 
2.  Add back in the reserves for WorkNew and WorkOne.
54 + (2 * 3)  = 60 AWTs
(result from Step 1 plus 6 for the two standard reserve pools of 3 each)
 
This third formula tells you total AWTs that can be used for user-initiated work, whether it is expedited or not.
 
3.  Add in reserves for WorkEight + WorkNine.
60 + (10 + 10) = 80 AWTs
(result from Step 2 plus 20 for the 2 WorkEight and WorkNine reserve pools of 10 each)
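 
The three steps can also be written as a short Python sketch, if that helps with plugging in your own numbers. The function name and parameters are hypothetical. The sketch assumes 8 standard reserve pools of 3 AWTs each, and it assumes the extra 2 in Step 1 applies only when a non-zero reserve count is specified; that second assumption is inferred from the default total of 62 rather than stated in the formulas.

```python
def awt_perspectives(total_awts, reserve_count, standard_pools=8, pool_size=3):
    """Return (unreserved, non-expedited user AWTs, all user AWTs) per AMP."""
    # Expedited reserves: reserve_count for each of two pools, plus 2.
    # Assumption: nothing is held back when no reserve count is specified.
    expedited = (2 * reserve_count + 2) if reserve_count > 0 else 0

    # Step 1: total AWTs minus standard reserves minus expedited reserves.
    unreserved = total_awts - standard_pools * pool_size - expedited

    # Step 2: add back the reserves for WorkNew and WorkOne.
    non_expedited_user = unreserved + 2 * pool_size

    # Step 3: add in the reserves for WorkEight and WorkNine.
    all_user = non_expedited_user + 2 * reserve_count

    return unreserved, non_expedited_user, all_user

print(awt_perspectives(100, 10))  # (54, 60, 80)
print(awt_perspectives(80, 0))    # (56, 62, 62)
```

The first call reproduces the 54, 60 and 80 from the steps above, and the second gives 62 for the default of 80 AWTs with no reserves.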
 
Thanks, -Carrie

vasudev 30 comments Joined 12/12
30 Oct 2013

Hi Carrie,
I am using TD 12. What is the significance of the FlowControlled field in the ResUsageSAWT table? I can see in the Resource Usage book that if it is non-zero, the AMP was flow controlled. Our DBA suggested that we are missing some SLAs because of the non-zero values, and asked us to use concurrency throttles. Could you please shed some light on this? Or are there other ways we can improve our SLAs?
Thanks in advance.

carrie 385 comments Joined 04/08
31 Oct 2013

Vasudev,
 
The flowcontrolled field is a snapshot of the state of the AMP (either it is in flow control, or not) at the end of the logging period.  It does not tell you if the AMP was in flow control once or many times within the logging interval.  Because the metric only reflects a single sampling, it can easily miss flow control that is happening within the logging interval, but just not at the end of the logging interval.  Many flow control episodes are short-lived, just a few milliseconds.  
 
Flow control is the state that the AMP enters when it has exhausted all of its AMP worker tasks, and the queue of messages that are waiting for an available AMP worker task has reached a maximum length.   When an AMP is in flow control all work messages for that AMP are sent back to their source and retried.  Flow control is an indication of congestion on the AMP.
 
So if you are seeing the flowcontrolled metric report positive for flow control, you can assume that the AMP was in the flow control state at least some of the time during that interval, maybe even all of the time.  It is an indication that the AMP is being overworked.   The flow control condition should be managed.  Throttles are a good way to do that because they limit the level of work that is entering the database and take the pressure off of AMP worker tasks.    Reducing concurrency often helps high priority queries that are missing their SLAs perform better.
 
For more information about flow control read my blog posting
 
http://developer.teradata.com/blog/carrie/2009/11/controlling-the-flow-of-work-in-teradata
 
Information in that posting explains why being in flow control is usually an indication of the AMP being congested, and why it can cause short queries to miss their SLA.
 
Thanks, - Carrie

vasudev 30 comments Joined 12/12
01 Nov 2013

Thank you very much, Carrie. That clarifies it.
