Teradata TPump dynamic buffer filling
TPump has been enhanced to dynamically determine the PACK factor and fill its data buffer when the input contains variable-length data. This feature is available in Teradata TPump 13.00.00.009, 13.10.00.007, 14.00.00.000, and higher releases.
TPump’s behavior prior to this feature
TPump used the "defined" row size, rather than the "actual" row size, to determine how many rows would fit into its data buffer. For example, a VARCHAR column of 35000 bytes is assigned a PACK factor of 27, which is determined by an up-front test, even if 99.9% of the incoming rows carry only 50 bytes of data in this VARCHAR(35000) column! Such up-front testing makes sense only if there are no variable-length fields in the input data; with variable-length fields it is highly inefficient.
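To make the inefficiency concrete, the following Python sketch compares how many rows fit in one request when sized by defined length versus actual length. The 1 MB limit and the PACK ceiling of 2430 come from this article; the function name `rows_that_fit` is hypothetical, and the sketch deliberately ignores all per-row and per-request overhead, so it does not reproduce TPump's tested PACK factor of 27.

```python
BUFFER_LIMIT = 1024 * 1024  # 1 MB restriction on total bytes per request
MAX_PACK = 2430             # highest PACK value mentioned in this article


def rows_that_fit(row_bytes, limit=BUFFER_LIMIT, max_pack=MAX_PACK):
    """Rows per request under a simplified model with no overhead (assumption)."""
    return min(limit // row_bytes, max_pack)


by_defined_size = rows_that_fit(35000)  # sized for the worst case: VARCHAR(35000)
by_actual_size = rows_that_fit(50)      # sized for what most rows really carry

print("rows per request, defined size:", by_defined_size)
print("rows per request, actual size: ", by_actual_size)
```

Even in this rough model, sizing by the defined length leaves almost the entire buffer empty when most rows are short.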
TPump’s current behavior with this feature implemented
TPump now dynamically determines the optimal PACK factor for input data with variable-length fields when Array Support is in use. The user sets the PACKMAXIMUM option or explicitly specifies PACK 2430, and TPump then fills each request up to that limit or until the buffer is full, on a request-by-request basis. Doing so does not cause problems in the statement cache; it is the PA (Parameter Array) that receives the most performance benefit from the higher PACK factor. Similarly, loads into NOPI (no primary index) tables benefit from the higher PACK factor. The optimal PACK factor is determined by the following factors, with the restriction that the total number of bytes must not exceed 1 MB:
- Actual size of data rows
- Size of the multi-statement request (doubled if the client session character set is UTF16)
- Extra Teradata CLIv2 overhead for jobs that use TPump Array Support
The PACK factor can therefore float from request to request; TPump informs the user of the "floating" PACK factor via the following new TPump message:
**** 12:30:44 UTY6679 WARNING: PACK factor has changed. The minimum PACK factor is <n> data records per request. The maximum PACK factor is <m> data records per request.
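The request-by-request filling described above can be pictured with a small Python sketch. This is a simplified model, not TPump's implementation: the function `pack_requests` and its `overhead` parameter are hypothetical, and the real calculation also accounts for the size of the multi-statement request and the Teradata CLIv2 overhead listed above.

```python
BUFFER_LIMIT = 1024 * 1024  # 1 MB restriction on total bytes per request
PACK_LIMIT = 2430           # user-requested ceiling (PACK 2430)


def pack_requests(row_sizes, pack_limit=PACK_LIMIT,
                  buffer_limit=BUFFER_LIMIT, overhead=0):
    """Greedily group rows into requests; return the row count of each request.

    A request is closed when it already holds pack_limit rows or when the
    next row would push it past buffer_limit bytes. 'overhead' is a
    placeholder for fixed per-request bytes (assumption).
    """
    counts = []
    count, used = 0, overhead
    for size in row_sizes:
        if count and (count == pack_limit or used + size > buffer_limit):
            counts.append(count)
            count, used = 0, overhead
        count += 1
        used += size
    if count:
        counts.append(count)  # final, possibly partial, request
    return counts


# Mostly short rows followed by a run of long rows (made-up sizes).
rows = [50] * 5000 + [4000] * 600
counts = pack_requests(rows)
print("rows per request:", counts)
print("minimum PACK factor:", min(counts), "maximum PACK factor:", max(counts))
```

Short rows hit the PACK ceiling before the buffer fills, while long rows fill the buffer first, which is why the effective PACK factor varies between a minimum and a maximum.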
For example, take a target table defined as:
CREATE MULTISET TABLE testtbl, FALLBACK
(
    c1 integer,
    c2 varchar(4),
    c3 decimal(10,2),
    c4 integer,
    c5 varchar(500),
    c6 varchar(4000)
)
NO PRIMARY INDEX;
For such a row, effectively an overall varchar(4525) column, the PACK factor is 230 if the defined row size is used.
TPump triggers the dynamic buffer filling feature when PACKMAXIMUM is set, or PACK 2430 is explicitly specified, with Array Support turned on. A sample “BEGIN LOAD” command is shown below:
.BEGIN LOAD
    SESSIONS 4 1
    ERRORTABLE <my_error_table>
    PACKMAXIMUM /* or PACK 2430 */
    ARRAYSUPPORT ON ;
Here is the layout used to load data into the target table:
.LAYOUT LAY1A ;
.FIELD c1 * varchar(4) ;
.FIELD c2 * varchar(4) ;
.FIELD c3 * varchar(13) ;
.FIELD c4 * varchar(4) ;
.FIELD c5 * varchar(500) ;
.FIELD c6 * varchar(4000) ;
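As a quick check on the varchar(4525) figure quoted earlier, this Python sketch adds up the declared lengths of the layout fields. The helper name `defined_bytes` is hypothetical, and the sketch counts only the declared VARCHAR lengths, not any length prefixes or other per-field overhead.

```python
import re

# The layout from this article, copied as plain text.
LAYOUT = """
.LAYOUT LAY1A ;
.FIELD c1 * varchar(4) ;
.FIELD c2 * varchar(4) ;
.FIELD c3 * varchar(13) ;
.FIELD c4 * varchar(4) ;
.FIELD c5 * varchar(500) ;
.FIELD c6 * varchar(4000) ;
"""


def defined_bytes(layout_text):
    """Sum the declared varchar(n) lengths found in a layout definition."""
    lengths = re.findall(r"varchar\((\d+)\)", layout_text, flags=re.IGNORECASE)
    return sum(int(n) for n in lengths)


print("defined variable-length bytes per record:", defined_bytes(LAYOUT))
```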
In TPump output, a UTY6679 message will be displayed telling the user the “floating” PACK factor:
**** 15:58:36 UTY6679 WARNING: PACK factor has changed. The minimum PACK factor is 471 data records per request. The maximum PACK factor is 2430 data records per request.
The following TPump performance figures were measured while loading 73072 data rows:
- PACK factor = 230 (TPump PACK factor is fixed based on defined data length)
  Elapsed time: 00:00:00:23 (dd:hh:mm:ss)
  CPU time: 12.6875 seconds
- PACK factor = 20 (TPump default PACK factor)
  Elapsed time: 00:00:02:16 (dd:hh:mm:ss)
  CPU time: 8.5625 seconds
- Floating PACK factor (dynamic buffer filling)
  Elapsed time: 00:00:00:13 (dd:hh:mm:ss)
  CPU time: 12.0781 seconds
TPump does a better job of determining the optimal PACK factor, and it runs faster, with the dynamic buffer filling feature. In terms of elapsed time, TPump with dynamic buffer filling is about 1.8 times faster than TPump with a PACK factor fixed by the defined data length (13 seconds versus 23 seconds), and more than 10 times faster than TPump with the default PACK factor of 20 (13 seconds versus 2 minutes 16 seconds). The total data TPump sends per elapsed second (MB/sec) is much improved.
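The elapsed-time comparison can be reproduced from the figures reported above with a few lines of Python. The helper name `to_seconds` and the run labels are hypothetical; the timings and row count are the ones listed in this article.

```python
def to_seconds(stamp):
    """Convert a dd:hh:mm:ss elapsed-time string to seconds."""
    days, hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


ROWS = 73072  # data rows loaded in each run

runs = {
    "fixed PACK 230": "00:00:00:23",
    "default PACK 20": "00:00:02:16",
    "floating PACK": "00:00:00:13",
}

elapsed = {name: to_seconds(stamp) for name, stamp in runs.items()}
baseline = elapsed["floating PACK"]

for name, secs in elapsed.items():
    # Ratio > 1 means the run took that many times longer than the floating run.
    print(f"{name}: {secs} s, {ROWS / secs:.0f} rows/s, "
          f"{secs / baseline:.2f}x the floating-PACK elapsed time")
```

Note that the comparison covers elapsed time only; the CPU times reported above do not follow the same ordering.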