Automated Demotions on the Appliance: What are Query Milestones?
If you are using Teradata on one of the 1xxx or 2xxx platforms, be aware that query demotions are automatically built into your workload management scheme. Because I’ve had a few questions about what these so-called "query milestones" are and how they actually work, I’d like to explain this functionality and discuss its trade-offs.
The appliance platforms are intended to be simple and easy to use, with few knobs to turn or decisions to make. When it comes to workload management, you have four different priorities available: Rush, High, Medium, and Low, also known as R, H, M, and L. These groupings represent the four default priority scheduler performance groups.
All user work falls into the Medium priority by default. A user, and all the work that user submits, can be assigned to one of the other three performance groups by placing an abbreviation of the performance group name, preceded by a dollar sign, in the first position of the user’s account string (for example, $R, $H, $M, or $L).
Modifying a user’s account string to begin with the characters “$L” will cause all of that user’s queries to execute in the Low performance group. Similarly, starting the account string with the characters “$R” will result in that user’s submitted work running at the highest priority, Rush.
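The account-string convention above can be summarized in a small illustrative sketch. This is not Teradata code; the function name and the sample account strings are hypothetical, and only the $R/$H/$M/$L prefix rule comes from the article:

```python
# Illustrative only: map the performance-group code at the front of an
# account string ($R, $H, $M, $L) to its performance group.
CODES = {"$R": "Rush", "$H": "High", "$M": "Medium", "$L": "Low"}

def performance_group(account_string):
    """Return the performance group implied by the account string prefix."""
    prefix = account_string[:2].upper()
    # Work with no recognized prefix falls into Medium by default.
    return CODES.get(prefix, "Medium")

print(performance_group("$Lbatch_user"))  # Low
print(performance_group("acct123"))       # Medium (the default)
```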
Three of the performance groups, Rush, High, and Medium, come with built-in query milestones, as illustrated in the graphic above. Query milestones track CPU usage by query and can automatically move a query that starts at one priority to a lower priority, based on CPU consumed per node. The purpose of a query milestone is to move work out of a high priority if the query proves to be longer-running than is suitable for that priority. By keeping only short work in the higher priorities, you better protect that high-priority work so it can run more consistently.
Query milestones are particularly useful when a single user submits a mix of short, long, and very long queries. A query milestone gives all queries a chance to run at the priority prescribed for that user, and then, after a certain amount of CPU usage, demotes the queries that require more CPU.
A quick digression at this point: in addition to the four priority levels on the Appliance platforms, system throttles and filters can be used to control concurrency and to reject requests that qualify for rejection. If you are on Teradata 13.10, the “common classification” enhancement allows you to associate specific throttles or filters with queries based on their individual characteristics. For example, in Teradata 13.10 you can set up a throttle that manages the concurrency of all queries with an estimated processing time greater than 10 seconds (or whatever time you wish), or that access a specific table or database. Teradata documentation outlines all the choices available when it comes to scoping a filter or throttle rule to suit your purposes.
Throttles in particular can provide a big advantage on the Appliance platform. By limiting concurrency on lower priority work, you can increase the level of resources available to the higher priority work, boosting its consistency and performance.
The same priority setup will be used night and day on an Appliance platform. The same is true of throttles and filters. Once you activate a throttle rule, it will remain active through all of your processing windows. You can, however, set up multiple different throttles to control different situations.
Now let’s get back on track with our main topic and take a closer look at how the query milestones work.
Priority scheduler lives within the database’s PDE level, closely linked to the operating system. Each node in an MPP system has its own instance of the operating system, and also has its own copy of the priority scheduler. Each priority scheduler copy is monitoring and controlling priorities only for the tasks that are running on its node.
When a query milestone is defined, it is given a threshold of CPU usage. This is done for you on the Appliance. When a query executes, its CPU consumption is checked against this threshold as it is running. When its CPU usage reaches that threshold on a node, all the tasks contributing to that request will be moved under the control of a lower priority group (on that node).
For example, when a request that begins in the Rush priority consumes 10 seconds of CPU on a node, it will be moved under the control of the Medium priority on that node. Tasks on behalf of this request that are active on a different node may already have been demoted, or may not be demoted for some time, as can happen when the processing is skewed. It is perfectly normal for a request on one node to reach a query milestone before the same request does on another node. Each node is taking care of itself.
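The per-node bookkeeping described above can be sketched as a toy model. This is illustrative only, not Teradata code; the class and names are hypothetical, and the 10-CPU-second Rush threshold is the one cited in this article:

```python
# Toy model: each node runs its own priority scheduler and demotes a
# request independently, based only on CPU consumed on that node.
RUSH_TO_MEDIUM = 10  # CPU seconds per node, per this article's example

class NodeScheduler:
    """One priority-scheduler instance; every node has its own copy."""
    def __init__(self):
        self.cpu_used = 0.0      # CPU this request has consumed on this node
        self.priority = "Rush"   # the request's starting performance group

    def account_cpu(self, seconds):
        self.cpu_used += seconds
        if self.priority == "Rush" and self.cpu_used >= RUSH_TO_MEDIUM:
            self.priority = "Medium"  # demotion happens on this node only

# A skewed two-node request: node 0 does most of the work.
nodes = [NodeScheduler(), NodeScheduler()]
nodes[0].account_cpu(12)  # past the milestone on this node -> demoted
nodes[1].account_cpu(3)   # still under the milestone -> remains Rush
print([n.priority for n in nodes])  # ['Medium', 'Rush']
```

The point of the sketch is that the two nodes disagree about the request’s priority, and that is normal behavior under skew.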
If you have experience with TASM, the query milestone functionality I am describing is similar to the exception type called “Tactical CPU Time per Node”, which is only supported for workloads with a tactical enforcement priority. The TASM exception action for such an exception must be to change the workload (a demotion), and the threshold value must be under 5 seconds.
In TASM you have a second option for demoting queries known as “CPU Time Sum over all Nodes” which does not perform the demotion until all nodes combined reach a specified sum of CPU seconds.
Here's another way to look at the query milestones that come with the Appliance. Allocation groups are the priority scheduler components that underlie each performance group and that carry the actual priority information, such as weight. The higher the allocation group weight, the higher its priority. A query milestone demotion simply moves the request’s tasks under the control of an allocation group with a lower weight.
When a session submits a new request, the request-level CPU accumulation from the previous query is zeroed out, and the new request begins to accumulate CPU on each node independently of the CPU accumulated by the previous query in the session. A demotion lasts only for the life of the current request.
A given request can be demoted more than once. If a request starts in Rush, it can be demoted to Medium after 10 CPU seconds have been consumed on a node, and then demoted a second time to Low after 200 CPU seconds on the same node. In both cases, the CPU accumulation is measured from when the query began to execute.
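The two-stage demotion just described amounts to a simple mapping from cumulative per-node CPU to performance group. The function below is an illustrative sketch (the name is hypothetical; the 10- and 200-second thresholds are the ones this article describes for a request starting in Rush):

```python
# Illustrative mapping from cumulative CPU seconds consumed on a node
# to the performance group of a request that started in Rush.
# Thresholds per this article: demote to Medium at 10, to Low at 200,
# both measured from the start of the request on that node.
def priority_for(cpu_seconds):
    if cpu_seconds < 10:
        return "Rush"
    elif cpu_seconds < 200:
        return "Medium"
    else:
        return "Low"

print(priority_for(5))    # Rush
print(priority_for(50))   # Medium
print(priority_for(250))  # Low
```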
The number of nodes in the configuration will influence the speed at which demotions take place in the life of a query. The larger the number of nodes in your configuration, the more CPU seconds are allowed to be consumed overall before a given request will begin to experience demotion.
As an example, let’s consider a small configuration and then a large configuration. If you have a small two-node Appliance, then a request would get to consume about 20 CPU seconds overall (10 CPU seconds per node * 2 nodes) before a demotion on one or both nodes will take place. On the other hand, if you have a 10-node configuration, the same request would get to consume about 100 CPU seconds overall (10 CPU seconds per node * 10 nodes) before the same request would begin to see demotions happening across nodes.
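The arithmetic in the example above can be made explicit. Assuming work is spread evenly across nodes (skew changes this), the system-wide CPU a request can consume before demotions begin is just the per-node threshold times the node count; the helper name here is hypothetical:

```python
# Rough system-wide CPU a request can consume before demotions begin,
# assuming evenly distributed (unskewed) work across all nodes.
def cpu_before_demotion(per_node_threshold, node_count):
    return per_node_threshold * node_count

print(cpu_before_demotion(10, 2))   # 20 CPU seconds on a 2-node appliance
print(cpu_before_demotion(10, 10))  # 100 CPU seconds on a 10-node system
```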
Changing the threshold of query milestones is not supported on the Appliance platforms. If you find the thresholds are not suitable for the type of work you are performing, or if reductions in priority are happening too quickly for you and you would prefer to disable the query milestone functionality, there is an operating system command that can be issued to turn off demotions. This is an all-or-nothing setting that will disable all defined query milestones, and it will require a restart.
To remove query milestones: schmon -G DEFAULTSETTINGS
To restore query milestones: schmon -G CLEAR
Remember that a restart will be required for this change to take effect. And you must have operating system root privileges to execute either of those commands. The commands must be submitted from the operating system prompt.
When considering whether you want to keep the milestones on or turn them off, ask yourself these questions:
- Do most of the queries that you wish to execute at the Rush (highest) and High (next highest) priorities complete using less than 10 seconds of CPU per node? If so, the query milestones will be helpful to you, because they will move the other, longer-running work out of the high-priority grouping before it slows the short work down very much.
- Do most of the queries you expect to run in the Rush (highest) and High (next highest) priorities complete using more than 10 seconds of CPU per node? If so, the query milestones provide less value, since if most work is demoted, the benefit of having different priorities is reduced.
- Do you expect most queries that start in Rush, High, or Medium to require more than 200 CPU seconds per node? If so, query milestones provide no value, since most of the work will end up running in Low and you will have lost all priority differentiation.