Automated Demotions on the Appliance: What are Query Milestones?
If you are using Teradata on one of the 1xxx or 2xxx platforms, be aware that query demotions are automatically built into your workload management scheme. Because I’ve had a few questions about what these so-called "query milestones" are and how they actually work, I’d like to explain this functionality and discuss its trade-offs.
The appliance platforms are intended to be simple, easy to use, with few knobs to turn or decisions to make. When it comes to workload management, you have available 4 different priorities: Rush, High, Medium and Low, also known as R, H, M, and L. These groupings represent the four default priority scheduler performance groups.
All user work will fall into the Medium priority by default. Users and all the work they submit can be assigned to one of the other three performance groups by adding an abbreviation of the performance group name (preceded by the dollar sign) into the first position in the user’s account string (for example, $R, $H, $M, or $L).
Modifying a user’s account string to begin with the characters “$L” will cause that user’s queries to all execute in the Low performance group. Similarly, starting the account string with the characters “$R” will result in that user’s submitted work to run at the highest priority, Rush.
.jpg)
Three of the performance groups, Rush, High, and Medium, come with built-in query milestones, as illustrated in the graphic above. Query milestones keep track of CPU usage by query and potentially can move a query that starts in one priority to a lower priority automatically, based on consumption of CPU per node. The purpose behind a query milestone is to move work that is running at a high priority out of the high priority if the query proves to be longer running than is suitable for that priority. By keeping only short work in the higher priorities, you can better protect that high priority work so that it can run more consistently.
Query milestones are particularly useful if a single user submits a mix of short and long and very long queries. A query milestone will give all queries a chance to run at the prescribed priority for that user, and then after a certain amount of CPU usage will demote the queries that require a greater amount of CPU.
I want to make a quick diversion at this point and mention that in addition to having 4 priority differences on the Appliance platforms, system throttles and filters can be used to control concurrency and decide if requests qualify to be rejected. If you are on Teradata 13.10, the “common classification” enhancement allows you to associate specific throttles or filters to queries based on their individual query characteristics. For example, in Teradata 13.10 you can set up a throttle that will manage the concurrency of all queries that have an estimated processing time of greater than 10 seconds (or whatever time you wish), or that access a specific table or database. Teradata documentation will outline all the choices for you when it comes to scoping a filter or a throttle rule to suit your purposes.
Throttles in particular can provide a big advantage on the Appliance platform. By limiting concurrency on lower priority work, you can increase the level of resources available to the higher priority work, boosting its consistency and performance.
The same priority setup will be used night and day on an Appliance platform. The same is true of throttles and filters. Once you activate a throttle rule, it will remain active through all of your processing windows. You can, however, set up multiple different throttles to control different situations.
Now let’s get back on track with our main topic and take a closer look at how the query milestones work.
Priority scheduler lives within the database’s PDE level, closely linked to the operating system. Each node in an MPP system has its own instance of the operating system, and also has its own copy of the priority scheduler. Each priority scheduler copy is monitoring and controlling priorities only for the tasks that are running on its node.
When a query milestone is defined, it is given a threshold of CPU usage. This is done for you on the Appliance. When a query executes, its CPU consumption is checked against this threshold as it is running. When its CPU usage reaches that threshold on a node, all the tasks contributing to that request will be moved under the control of a lower priority group (on that node).
For example, when a request that begins in the Rush priority consumes 10 seconds of CPU on a node, it will be moved under the control of the Medium priority on that node. It could be that tasks on behalf of this request that are active on a different node have already been demoted, or it could be they won’t be demoted for some time, as might be the case if the processing is skewed. It is perfectly normal for a request on one node to reach a query milestone before the request running on another node. Each node is taking care of itself.
If you have experience with TASM, the query milestone functionality I am describing is similar to the exception type called “Tactical CPU Time per node”, which is only supported for workloads with a tactical enforcement priority. The TASM exception action for such an exception must be the change the workload (a demotion) and the threshold value must be under 5 seconds.
In TASM you have a second option for demoting queries known as “CPU Time Sum over all Nodes” which does not perform the demotion until all nodes combined reach a specified sum of CPU seconds.
Here's another way to look at the query milestones that come with the Appliance. Allocation groups are priority scheduler components that underlie each performance group and that carry the actual priority information, such as weight. The higher the allocation group weight, the higher its priority.
.jpg)
When the session submits a new request, the request-level CPU accumulation is zeroed out for the previous query and the new request will begin to accumulate CPU on each node independently of the CPU accumulated by the previous query in the session. A demotion lasts only for the life of the current request.
A given request can be demoted more than one time. If a request starts in Rush, it can be demoted to Medium after 10 CPU seconds have been consumed, and then it can be demoted a second time to Low after 200 CPU seconds on the same node. The accumulation of CPU in both cases is from when the query begin to execute.
The number of nodes in the configuration will influence the speed at which demotions take place in the life of a query. The larger the number of nodes in your configuration, the more CPU seconds are allowed to be consumed overall before a given request will begin to experience demotion.
As an example, let’s consider a small configuration and then a large configuration. If you have a small two-node Appliance, then a request would get to consume about 20 CPU seconds overall (10 CPU seconds per node * 2 nodes) before a demotion on one or both nodes will take place. On the other hand, if you have a 10-node configuration, the same request would get to consume about 100 CPU seconds overall (10 CPU seconds per node * 10 nodes) before the same request would begin to see demotions happening across nodes.
Changing the threshold of query milestones is not supported on the Appliance platforms. If you find the thresholds are not suitable for the type of work you are performing, or if reductions in priority are happening too quickly for you and you would prefer to disable the query milestone functionality, there is an operating system command that can be issued to turn off demotions. This is an all-or-nothing setting that will disable all defined query milestones, and it will require a restart.
To remove query milestones: schmon -G DEFAULTSETTINGS
To revert back to query milestones: schmon -G CLEAR
Remember that a restart will be required for this change to take effect. And you must have operating system root privileges to execute either of those commands. The commands must be submitted from the operating system prompt.
When considering whether you want to keep the milestones on or turn them off, ask yourself these questions:
- Do most of the queries that you wish to execute at the Rush (highest) and High (next highest) priorities complete using less than 10 seconds of CPU per node? If so, the query milestones will be helpful to you because they will get the other, longer, work out of the high priority grouping before they slow the short work down very much.
- Do most of the queries you expect to run in the Rush (highest) and High (next highest) priorities complete using more than 10 seconds of CPU per node? If so, they query milestone is providing less value, since if most work is demoted, the benefit of having different priorities is reduced.
- Do you expect most queries that start in Rush, High or Medium to require more than 200 CPU seconds per node? If so, then query milestones are serving no value, since most of the work will end up running in Low and you will have lost all priority differentiation.
Very nice article about it. Wish it had more information like what other levels of workload management is possible using the TD Appliance. I'm more interested in 2xxx platforms. First question to start with can operating periods be implemented? If so xschmon? how do I get it never used it just read about it in the utilities. Can you please help/share any guiding document for WD management and best practices. Last question what kind of questions should I be asking client/customer to properly configure the workload management in Appliance. In advance thanks much Carrie.
Thanks,
Vasu
Vasudev,
The appliance platforms are intended to be simple, easy to use, with few knobs to turn or decisions to make. For that reason, you cannot change priority settings using any of the varieties of the schmon commands. (You can, however, turn query milestones on and off as explained in the posting above. ) Operating periods cannot be implemented on the appliance platforms.
You can find a manual called Appliance Implementation Guide by clicking on “Hardware Platforms” and then “2650” from the Teradata Information Products web site. There is coverage in that document that describes what workload management is available, in Chapter 13, Task 4 and Task 5.
The Information Productions web site is
http://www.info.teradata.com/
Thanks, -Carrie
Its really cool, got better understanding now but has a question about removing milestones.
1. If query milestoens are removed does the queries get priority by first come basis?
2. Do we really need to remove them or just make all users to default account "$M".
we have 12 nodes and most of our queries really run longer (more than 1 hr) as they are analysis based, does that mean we have remove priority schduler?
If the query milestone is removed, then that means that no query will get it's priority diminished once it has started to run. All queries will run at their original priority until they complete. All queries will receive CPU based on the relative weight of their allocation group. Queries are assigned priorities based on their user. The query milestone does not change the assigned priority of a query as it begins to run, only after it has reached the threshold. Up until that point the query will run at the priority assigned to the user.
You do not need to turn off the query milestones if you are comfortable with what the demotion of queries that reach the milestones. The blog posting above describes how they function, but if you would prefer that your queries always run at the priority they were originally assigned, no matter how much CPU they consume, you can turn the milestones off.
Turning the milestones off is not the same thing as running all the queries in medium. If you ran all the queries in medium, then you would have no priority differences among them, even when they begin to execute. But with query milestones, the priority differences are preserved up until the point when the threshold of usage is reached, and that is determined query by query. At that point, the priority for that one query becomes smaller, but the priority of the other active queries does not.
Thanks, -Carrie
Thank you much Carrie. Can TDWM be used for CPU RP and other allocation groups or only PSA and command level schmon commands to play with CPU RP or AGs?
Thanks,
Vasu
Vasu,
You are not able to set or change CPU limits on the appliance platforms.
If you are on a platform with full TASM and were on 13.10, you would use Viewpoint Workload Designer to set CPU limits. If you have full TASM and are on a release earlier than 13.10 you could use TDWM. I you are an EDW platform and you don't have TAS you could use PSA or schmon commands.
Teradata documentation will tell you how to make CPU settings using any of those approaches.
Thanks, -Carrie
Hi Carrie,
Is changing a query Priority an available feature in an appliance? I've tried updating a $H query to $M in Viewpoint but it's not taking effect.
Thanks
You can only change the priority (by changing the account string) of a running query if you are NOT using TASM. On the appliance platform you are not using TASM, so it would seem to me that you should be able to do what you are trying to do.
Viewpoint supports making such a change in portlets like Query Monitor. What you do in Viewpoint will cause a PM/API SET SESSSION ACCOUNT API to be executed, and that also supports the ability to change priorities of a running query (if you are not using TASM).
If the change is not working for you, you can either manually review the viewpoint.log file and the DBC.sw_event_log table for more information, or engage support for assistance.
Thanks, -Carrie
Perfect, the best explanation I could ever come across:) Thanks Carrie..
You are very welcome! Thank you for the kind feedback. -Carrie
Hello Everyone...I just need to clear a few things about Workload. Recently, I read about Resource Partition, Performance Group, Performance Period and Allocation Group. From what I have understood, the Performance Group is decided by the User's Account String. And the Allocation group is decided by Performance Period (Milestones), with different Performance Periods alligning to different Allocation Group.
The hierarchy is: Resource Partition -> Performance Group -> Performance Period -> Allocation Group
My question is:
(a) How a query is allocated to a Resource Partition ? (My guess is since the Performance Group is implicitly decided by Account String, a set of Performance Periods constitutes a Resource Partition. Based upon Performance Period, the Resourse Partition is decided).
(b) Once within a Performance Group, based upon milestones, a query is moved between different Performance Period and hence, Allocation Group. My Question is which is the First Performance Period within a Performance Group that is allocated ? Is it something default ?
(c) In all these above scenarios, where does Workload comes into play. Is Workload equivalent to Performance Period ? Where exactly Workload fits in the above hierarchical diagram.
Thanks in advance.
A query is not allocated to a resource partition directly. It is mapped to a performance group. A query is under the control of an allocation group, based on which allocation group its performance group (PG) points to. The resource partition is a collection of performance groups. So whatever PG a query belongs to will dictate the resource partition it is associated with.
A performance period is simply an attribute of a performance group. It allows a query to map to a single performance group, but based on time or CPU usage to be moved under the control of a different allocation group than the default allocation group. The performance period has no independent existence outside a performance group. It is simply a bridge between a performance group and an allocation group. It is a carry-over from previous releases where it had a more important role. Today, you can basically ignore performance periods.
Most performance groups only have a single (default) performance period. For appliance platforms, the performance groups (most of them) have built-in performance periods in order to control the automatic demotions. The first PP is the default one where the query starts; the second PP is for the allocation group after the demotion. However, you don't have to be concerned about performance periods at all or how to define them. In fact, on the appliance, you will not be allowed to add or delete performance periods. If you are using full TASM, you don't need to worry about them either, as you have things like workload exceptions to provide similar functionality.
The TASM workload is a performance group under the covers (at least in the SLES 10 versions). TASM workloads do not know about or get involved in performance periods. So if you are a TASM user, you don't have to be concerned or ever think about performance periods. Just set up workload exceptions if you need them, using whatever exception criteria you like.
Thanks, -Carrie
Hi Carrie,
Am I understanding it correctly, that it is not possible to turn off milestones at the AG level? That there is no way to reprioritize an excuting query (reclassify in TASM terms)?
My use case is as follows:
a batch script is running late, doing a trunc and load (insert/select) to a critical table. The nature of the query is that is exceeds the milestone and steps down. Because there is other work on the system, I want this 1 query to be moved to R1 and stay there for just this one day.
Short of throttling all other work into the delay queue, is there nothing that can be done to increase this query's priority?
ruth
Ruth Fenwick
Hi Ruth,
I am assuming you are talking about an Appliance platform. On the Appliance you can issue a SET SESSION ACCOUNT to change the priority of a running query. It is one of the monitor commands, which are documented in Teradata documentation.
SET SESSION ACCOUNT allows you to change the performance group of the query, which means the query will begin CPU accumulations all over. However, the query is still at risk for being demoted when it reaches the new performance group’s query milestone threshold.
If query milestones are on, there is no way to keep a query running at a high priority while exempting it from ever being stepped down.
You also have the option of turning off the automatic query milestones if they are not working for you, but then they will be off for all priorities and for all queries.
Thanks, -Carrie
Carrie,
Great detail about the workload management capabilities on an Appliance. I figure it is not available but wanted to check with you to see if we could add a CPU cap on a AG on an appliance? We get some high PJI queries that take over the system and we end up having to abort them. I would like to be able to manually move them into a penalty box with CPU cap and let them complete if the process is business critical and we have not had time to tune the process.
Thanks
Phil
Hi Phil,
Sorry, CPU limits on allocation groups are not available on the appliance platform. The closest thing to that is the query milestones described in this posting.
Thanks, -Carrie