Node Resources Portlet - Take 2
As part of the Viewpoint 15.00 release, the Viewpoint team built a brand-new version of the Node Resources portlet. The portlet’s primary purpose is still to identify skew on a Teradata Database system, but the original version required a fair amount of manual intervention to do so. The new version has a simpler user interface and a new algorithm that identifies skewed resources (or “outliers”) automatically.
Because the Teradata Database has a massively parallel architecture, it’s important that all of its units of parallelism perform approximately the same amount of work. When some nodes or VPROCs perform significantly more or less work than the system-wide average, the work is said to be skewed. When the work for a specific query is skewed, the query doesn’t take full advantage of the system’s power and therefore doesn’t complete as quickly as it could. When work across nodes or VPROCs is skewed, it can degrade system performance and reduce the system’s effective capacity.
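To make the definition of skew concrete, the following Python sketch computes how far each resource’s value for one metric deviates from the system-wide average. The `deviation_from_average` helper and the sample CPU values are hypothetical and only illustrate the idea; this is not how Viewpoint itself computes skew.

```python
from statistics import mean


def deviation_from_average(values: dict[str, float]) -> dict[str, float]:
    """Return each resource's percent deviation from the system-wide average.

    Hypothetical helper: positive numbers mean more work than average,
    negative numbers mean less.
    """
    avg = mean(values.values())
    if avg == 0:
        # No work anywhere: treat every resource as perfectly balanced.
        return {name: 0.0 for name in values}
    return {name: 100.0 * (v - avg) / avg for name, v in values.items()}


# Hypothetical CPU-busy percentages for four nodes.
cpu_busy = {"node1": 50.0, "node2": 52.0, "node3": 48.0, "node4": 90.0}
print(deviation_from_average(cpu_busy))
```

Here the average is 60, so node4 comes out 50% above it while the other three nodes sit below it. A single heavily loaded node also pulls the average up, which is one reason a distribution-based rule is more robust than comparing against the mean alone.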
There are three primary enhancements to the Node Resources portlet. The first is the use of a histogram to visually display the data distribution for a particular metric. The automatic calculation of “outliers” based upon the data distribution is the second improvement. The final significant change is the ability to analyze the data over a time range instead of just the last sample of data.
The previous version of this portlet depicted a square for each node or VPROC on the system. On larger systems it was hard to fit all the squares on a single screen, and the representation added little insight into the actual values of a particular metric. The new version instead plots the data for the selected metric as a histogram. The histogram has 20 buckets of equal width, and the height of each bar represents the number of nodes or VPROCs whose values fall into that bucket’s range.
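A minimal sketch of the bucketing, assuming the 20 equal-width buckets span the range from the smallest to the largest observed value (the portlet’s exact bucket boundaries are an assumption here). The `histogram` function is a hypothetical helper, not part of Viewpoint.

```python
def histogram(values: list[float], buckets: int = 20) -> list[int]:
    """Count how many values fall into each of `buckets` equal-width ranges.

    Assumption: the buckets span [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    counts = [0] * buckets
    if hi == lo:
        # Every resource reports the same value: one bucket holds them all.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / buckets
    for v in values:
        # The maximum value maps to index `buckets`; clamp it into the last bucket.
        idx = min(int((v - lo) / width), buckets - 1)
        counts[idx] += 1
    return counts


# Hypothetical per-node values for one metric.
print(histogram([10.0, 12.0, 11.0, 13.0, 12.0, 40.0]))
```

With these values each bucket is 1.5 units wide; five nodes land in the first three buckets and the node reporting 40 lands alone in the last one, which is the kind of isolated bar a skewed resource produces.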
The red bars in the histogram represent buckets that contain “outliers”, which are nodes or VPROCs that are significantly skewed. A resource is an outlier if its value lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. This is a standard statistical technique for finding outliers in a data set, and it lets the portlet automatically identify any nodes or VPROCs that are significantly skewed for the selected metric. On a system that is working in a reasonably parallel fashion, you may well see no outliers in the histogram. If the histogram does show outliers, you might want to investigate further to discover the cause of the skew on your system.
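The 1.5 × IQR rule (often called Tukey’s fences) can be sketched with Python’s standard `statistics.quantiles`. This illustrates the rule, not the portlet’s implementation: `quantiles` defaults to the “exclusive” interpolation method, and Viewpoint may compute quartiles differently, which can shift the fences slightly on small data sets. `find_outliers` and the sample values are hypothetical.

```python
from statistics import quantiles


def find_outliers(values: dict[str, float], k: float = 1.5) -> dict[str, float]:
    """Return resources whose value lies outside the Tukey fences.

    A value is an outlier if it is below Q1 - k*IQR or above Q3 + k*IQR,
    where IQR = Q3 - Q1. Requires at least two values.
    """
    q1, _median, q3 = quantiles(values.values(), n=4)  # 'exclusive' method by default
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {name: v for name, v in values.items() if v < lower or v > upper}


# Hypothetical CPU-busy percentages for eight nodes; node8 is skewed high.
cpu_busy = {
    "node1": 50.0, "node2": 52.0, "node3": 48.0, "node4": 51.0,
    "node5": 49.0, "node6": 50.0, "node7": 53.0, "node8": 90.0,
}
print(find_outliers(cpu_busy))  # {'node8': 90.0}
```

For these values Q1 = 49.25 and Q3 = 52.75, so the fences are 44 and 58 and only node8 is flagged. Because quartiles are insensitive to a few extreme values, a single skewed node does not drag the fences toward itself the way it drags the mean.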
The third significant change is the ability to analyze up to an hour’s worth of data. In Viewpoint 14.10 and earlier, the Node Resources portlet reported data only for the last sample period. That data typically covered a minute or less of elapsed time, which is too short to reliably reveal significant skew on a system. The new version still lets you view the last collection sample, but you can also choose an aggregation of 5, 15, 30, or 60 minutes of data.
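One plausible way to aggregate a time window is to average each resource’s samples within it, as in this sketch. Whether the portlet averages, sums, or otherwise combines samples is an assumption here, as are the `(timestamp, resource, value)` sample layout and the hypothetical `aggregate_window` helper; only the 5, 15, 30, and 60 minute window lengths come from the portlet itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Window lengths offered by the portlet (in minutes).
ALLOWED_WINDOWS = (5, 15, 30, 60)


def aggregate_window(
    samples: list[tuple[datetime, str, float]], end: datetime, minutes: int
) -> dict[str, float]:
    """Average each resource's samples whose timestamp is in (end - minutes, end].

    Assumptions: samples are (timestamp, resource, value) tuples, and
    aggregation means a simple per-resource mean.
    """
    if minutes not in ALLOWED_WINDOWS:
        raise ValueError(f"window must be one of {ALLOWED_WINDOWS}")
    start = end - timedelta(minutes=minutes)
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for ts, resource, value in samples:
        if start < ts <= end:
            sums[resource] += value
            counts[resource] += 1
    return {r: sums[r] / counts[r] for r in sums}


# Hypothetical samples: node1 was briefly very busy ten minutes ago.
end = datetime(2015, 6, 1, 12, 0)
samples = [
    (end, "node1", 40.0),
    (end - timedelta(minutes=1), "node1", 60.0),
    (end - timedelta(minutes=10), "node1", 100.0),
    (end, "node2", 30.0),
]
print(aggregate_window(samples, end, 5))  # {'node1': 50.0, 'node2': 30.0}
```

The per-resource values this returns could serve as the input to the histogram and outlier calculation in place of a single sample. Averaging over a longer window smooths out momentary spikes, so resources that stand out in a 30- or 60-minute view are more likely to reflect persistent skew.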
From the portlet’s main screen, you can click any bar in the histogram to drill down and view the data for just the nodes or VPROCs in that bucket. You can also click the “Down” or “Outliers” bubbles to filter the data grid so that only those resources are displayed. Clicking any row in either data grid drills down to a detail screen that displays all of the metrics for that node or VPROC. The detail screen differs for nodes, AMPs, PEs, and other VPROC types so that only the metrics applicable to that resource are displayed.
This new version of Node Resources should make it much simpler to monitor and identify potential skewing issues across the nodes and VPROCs of your Teradata Database system.
Note that the Node Resources portlet applies only to Teradata Database systems, whereas the Node Monitor portlet provides monitoring for Aster or Hadoop system nodes.