Subscribe to Blog content and comments for dnoeth Latest blog posts

The accuracy of sample statistics has been greatly improved over the last Teradata releases. So I usually try to use sample stats on most of the big tables and I found them to be reliable on many columns, not only the officially recommended unique or nearly-unique columns, e.g. on DATEs.

But there is a specific scenario when sample stats result in worst case optimizer plans:

Beginning with TD14 statistics are no longer stored in dbc.TVFields and dbc.Indexes, they have been moved into dbc.StatsTbl to facilitate several enhancements. A new view dbc.StatsV returns much of the information previously extracted in my StatsInfo query.

But of course this is still not enough information, at least not for me ;-)

To add or drop a column or modify the list of compressed values of an existing table is a quite expensive operation. For a large table it might result in a huge amount of CPU and IO usage and a loooooooong runtime. This blog discusses the pros and cons of the different ways to do it.

In a recent topic in the forums there was a question on "how to spell out numeric values in english" and an excerpt from the manuals indicating there is a format for the new TO_CHAR function in TD14: "Any numeric element followed by SP is spelled in English words."

When you start designing a statistics collection process topics include:

  • stats on some tables are more important than others.
  • some stats on a single table are more important than others.

Both problems involve some kind of prioritization process, which finally leads to a sorted list of COLLECT STATS statement (See Marcio Moura's blog When is the right time to refresh statistics?)

Processing this list (using a BTEQ script or a cursor in a Stored Procedure) could lead to another problem:

Collected statistics are a valuable source of information in Teradata, not only for the optimizer, but also for developers or DBAs. Spotting outdated statistics (which might lead to bad optimizer plans) and implementing/monitoring a stats recollection process are common tasks, which need the current statistics as input.

In my previous posts on missing functions i covered SQL:2008 DENSE_RANK and CUME_DIST & NTILE.
This time i'll talk about additional functions based on sorted sets of values, SQL:2008's PERCENTILE_DISC and PERCENTILE_CONT plus a MEDIAN calculation.

In my previous post i covered some ranking functions including SQL:2008 RANK, ROW_NUMBER & DENSE_RANK.

But Standard SQL defines three additional functions based on ranking:
PERCENT_RANK and CUME_DIST as relative rankings and a percentile calculation named NTILE.

Some of the SQL:2008 analytical functions are not implemented in Teradata, but most of them can be re-written using existing features.
In this post I will cover the ranking algorithms including DENSE_RANK.

The commonly known technique to pass parameters to SQL statements at the system or session level utilizes tables (permanent or global temporary tables).
In this blog post, I'll show better performing techniques for system and user level variables using views and QueryBanding.
These solutions are based on TD13, but can also be used pre-TD13, although performance might be worse.