Using the ResUsageSAWT table to Monitor AMP Worker Tasks
As you approach full use of your AMP worker tasks, monitoring in-use counts becomes a more important exercise. The ResUsageSAWT (SAWT) table provides a wealth of information about AWT usage when you need to dive into details.
ResUsageSAWT provides you with several categories of information, including AMP worker task usage and high-water marks broken out by work type, as well as message queue depth and flow control frequency metrics.
The SAWT table is organized by AMP, with one row for every AMP in the system for the logging interval date and time. Only AMP vprocs are reported; other vprocs are ignored.
AMP worker task in-use counts
The SAWT table carries AMP worker task in-use counts, as well as a maximum in-use count across the logging interval, for each message work type on each AMP. The values for the in-use counts come from a snapshot taken at the end of the logging interval. The maximum in-use count for each work type represents the highest count reached at any point in the logging interval.
Work messages that arrive on the AMP are put into categories called work types. Work types suggest the importance of the work contained in the message. Work00 represents a new message from a user request coming from the dispatcher, Work01 represents spawned work from a user request, and so forth. Both of these work types are less important than the internal work types, such as Work12, which represents abort processing.
Each work type reports a maximum in-use count metric. Do not attempt to add these maximum in-use count fields across all work types on an AMP. If you do that, the sum will give an inflated number of AWTs for that AMP.
The inflated sum from adding in-use maximum counts occurs because maximum values are selected independently for each work type across the logging interval, and the maximum values for different work types do not represent counts that have been produced at the same point in time. Maximum in-use count fields are most useful for evaluation of individual work type behavior and are not well suited for producing combined work type totals.
The InuseMax field is better suited for that purpose (discussed below) as it represents the high water mark for combined AWT usage for that interval. Below is an example of how maximum in-use counts can give you an inflated total, if added together:
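As a minimal sketch of that effect, the Python snippet below uses invented in-use counts for three work types, sampled at four moments within one logging interval on a single AMP. The sample values and helper names are hypothetical. They only illustrate why independently chosen maxima add up to more than the true combined peak.

```python
# Hypothetical in-use counts at four moments within one logging interval
# on a single AMP. All numbers are invented for illustration.
samples = [
    {"Work00": 30, "Work01": 10, "Work12": 0},
    {"Work00": 12, "Work01": 35, "Work12": 0},
    {"Work00": 20, "Work01": 20, "Work12": 5},
    {"Work00": 8, "Work01": 15, "Work12": 2},
]


def per_type_max(samples):
    """Highest in-use count seen for each work type, chosen independently."""
    work_types = samples[0].keys()
    return {wt: max(s[wt] for s in samples) for wt in work_types}


def combined_high_water_mark(samples):
    """Highest total in-use count at any single moment (the InuseMax idea)."""
    return max(sum(s.values()) for s in samples)


maxima = per_type_max(samples)
summed_maxima = sum(maxima.values())
true_peak = combined_high_water_mark(samples)

print(summed_maxima)  # 70: 30 + 35 + 5, maxima taken from different moments
print(true_peak)      # 47: the busiest single moment (12 + 35 + 0)
```

In this made-up interval the summed maxima suggest 70 AWTs in use, while no more than 47 were ever in use at the same time.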
Message queue depth and flow control metrics
The second category of information captured by the SAWT table is message queue length and flow control detail. The following fields from the SAWT table directly relate to this category (as of Teradata 14.0):
- MailBoxDepth: This terminology is another way of referring to the message queue length. “Mailbox” and “message queue” are synonymous in this case. This is the current depth of the AMP work message mailbox at the end of the logging period.
- FlowControlled: This metric tells you if the AMP is in flow control at this point in time. It is a snapshot taken at the end of the logging interval. If it is zero, it indicates the AMP was not in flow control at that time. If it is non-zero, it means that the AMP was in flow control at interval end.
- FlowCtlCnt: This count tells you how many times during the collection interval the AMP went into the state of flow control. It does not tell you how long the flow control state persisted, only how often it took place.
- FlowCtlTime: This metric provides the total time in milliseconds that an AMP was in the state of flow control. This is a good indication of the degree of flow control experienced by the AMP, and is available in Teradata 14.0.
Flow control is an indication that the system is congested and that messages are temporarily being sent back to their source. If flow control is very light and occasional and is not impacting response time expectations for any critical applications, then it may be appropriate to continue to monitor without making any changes. However, if it is becoming more persistent over time, it should not be ignored and appropriate actions should be taken.
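One way to judge how persistent flow control is would be to express FlowCtlTime as a share of the logging interval and to relate it to FlowCtlCnt. The sketch below assumes a row has already been fetched into a Python dict keyed by the field names above, and that the logging interval is 600 seconds. The function name and the sample values are hypothetical.

```python
def flow_control_summary(row, interval_secs=600):
    """Summarize flow control for one AMP over one logging interval.

    `row` is assumed to be a dict holding FlowCtlCnt (occurrences) and
    FlowCtlTime (milliseconds), as described in the text.
    """
    flow_ms = row["FlowCtlTime"]
    count = row["FlowCtlCnt"]
    interval_ms = interval_secs * 1000
    return {
        # Share of the interval the AMP spent in flow control.
        "pct_of_interval": 100.0 * flow_ms / interval_ms,
        # Average length of one flow control episode, in milliseconds.
        "avg_episode_ms": flow_ms / count if count else 0.0,
    }


# Invented sample: 4 episodes totalling 30 seconds in a 10-minute interval.
summary = flow_control_summary({"FlowCtlCnt": 4, "FlowCtlTime": 30000})
print(summary)  # {'pct_of_interval': 5.0, 'avg_episode_ms': 7500.0}
```

Tracking the percentage over days or weeks shows whether flow control is staying occasional or becoming more persistent.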
Available AMP worker tasks
Among the AMP worker tasks available to an AMP, some are held in reserve pools for certain work types. The remaining are held in an unassigned pool for general use.
Two new columns have been added to the ResUsageSAWT table in Teradata 14.0 that will give you a look at the size of the unassigned pool of AWTs: Available and AvailableMin.
- Available: The number of AMP worker tasks remaining in the unreserved pool (not being used) at the end of the logging interval.
- AvailableMin: The lowest Available number for the entire logging interval. If zero is reported, there were no AWTs available in the unassigned pool at some point.
Under normal, non-congested situations, if you sum up the in-use counts for Work00 and Work01 and the value in Available, it should consistently total 62, which is the maximum number of AMP worker tasks that can be used to support user-initiated work of the Work00 and Work01 type (assuming the default settings).
Examining these two metrics can give you a feel for how many AMP worker tasks you have left for work arriving on the AMP at different times of the day, and thus how close you are to AMP worker task exhaustion at those times.
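The relationship between these fields can be sketched as a simple check. The dict keys `work00_inuse` and `work01_inuse` are hypothetical stand-ins for the Work00 and Work01 in-use count columns, and the limit of 62 assumes the default settings described above.

```python
USER_WORK_LIMIT = 62  # from the text; applies to default AWT settings only


def check_available(row, limit=USER_WORK_LIMIT):
    """Compare Work00 + Work01 in-use counts plus Available to the limit.

    `row` is assumed to be a dict for one AMP and one logging interval.
    """
    total = row["work00_inuse"] + row["work01_inuse"] + row["Available"]
    return {
        "total": total,
        "matches_limit": total == limit,
        # AvailableMin of zero means the unassigned pool ran dry at some
        # point in the interval, even if Available is non-zero at the end.
        "pool_emptied": row["AvailableMin"] == 0,
    }


# Invented sample row for illustration.
result = check_available(
    {"work00_inuse": 20, "work01_inuse": 12, "Available": 30, "AvailableMin": 0}
)
print(result)  # {'total': 62, 'matches_limit': True, 'pool_emptied': True}
```

Here the end-of-interval snapshot shows 30 AWTs available, yet AvailableMin reveals that the pool was empty at some earlier moment in the same interval.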
InuseMax is the only ResUsageSAWT metric that is self-consistent, in that it truly reflects the total usage of AWTs at a particular point in time. It can never exceed the number of AWTs defined on the system per AMP. It is also a useful metric because it reflects the maximum number of AWTs in use at one point in time across the logging interval.
If the logging interval is 600 seconds, InuseMax reflects the highest number of AWTs in use at any point in time during that 10-minute interval. For an ongoing metric to monitor and track AWT usage over time, InuseMax is your best choice.
When charting this field so that all AMPs on the platform are represented, consider using the maximum of each AMP’s InuseMax, rather than the average of each AMP’s InuseMax. The figure below shows just how large a difference there can be between the two approaches. When monitoring AMP worker tasks, the worst case is always more informative than an average case.
The blue color represents the maximum of the InuseMax values, and the red the average of the InuseMax values across a 24-hour period.
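The two approaches can be compared with a short aggregation. This sketch assumes the InuseMax values have already been extracted as (interval, AMP number, InuseMax) tuples. The data and the function name are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (interval, AMP number, InuseMax) tuples for a 4-AMP system.
rows = [
    ("10:00", 0, 40), ("10:00", 1, 42), ("10:00", 2, 62), ("10:00", 3, 44),
    ("10:10", 0, 35), ("10:10", 1, 36), ("10:10", 2, 38), ("10:10", 3, 35),
]


def summarize_inuse_max(rows):
    """Return the maximum and average InuseMax across AMPs per interval."""
    by_interval = defaultdict(list)
    for interval, _amp, inuse_max in rows:
        by_interval[interval].append(inuse_max)
    return {
        interval: {"max": max(values), "avg": mean(values)}
        for interval, values in by_interval.items()
    }


summary = summarize_inuse_max(rows)
print(summary["10:00"]["max"], summary["10:00"]["avg"])
print(summary["10:10"]["max"], summary["10:10"]["avg"])
```

In the invented 10:00 interval one AMP reaches 62 while the average across AMPs is only 47, so charting the average alone would hide the AMP that is closest to exhaustion.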
Thresholds for the InuseMax metric
InuseMax is the key metric to evaluate AWT in-use counts over time. Benefits of doing so include the following:
- A single metric to track
- No calculation or averaging required
- Reflects actual usage at the maximum level for the logging interval
- Incorporates all work types combined at the same point in time
On a system with the default of 80 AWTs and no reserve AMP worker tasks defined, an InuseMax of 62 is a good indicator of AMP worker task exhaustion. Consider the guidelines in this table.
It is better to manage the problem of AWT exhaustion when it is just beginning, rather than waiting until it turns into flow control. The ResUsageSAWT table contains valuable information that can help you achieve this goal.