Teradata Mainframe Client Utilities update
Teradata has completed a major compiler conversion effort. It should be completely transparent to customers, who are nonetheless its main beneficiaries. This article:
- Provides some historical background and context,
- Discusses the reasons we switched compilers,
- Identifies certain behavioral changes that were unavoidable,
- And, finally, answers a few technical questions related to the overall process.
The venerable IBM mainframe was the original client platform developed for the Teradata RDBMS via channel-attached connections way back in the 1980s. Although we now support a wide range of client platforms using network-attached connections, a significant segment of our customer base continues to use mainframe clients.
Over the years we have endeavored to improve and enhance the reliability, performance, and security of our mainframe-based products. Aside from having a close working relationship with IBM (we are a member of IBM’s PartnerWorld), we are fortunate to count among our staff a number of veteran development and support engineers with many years of experience in working with z/OS, IBM’s flagship operating system for its zSeries family of hardware, and its predecessors.
We recently completed a major conversion that will afford customers greater flexibility, potentially superior performance, and substantially better support. What follows is a brief overview of what we did, why we did it, and how it will benefit our customers today and in the future.
In the beginning…
Most of the Teradata client and server components were written using the Pascal programming language. Pascal was a very elegant and powerful language that allowed complex algorithms to be expressed in a tight, compact, and robust fashion. Yet, Pascal never really advanced beyond the academic setting in which it was conceived. It was a great learning tool and demonstration language but it never managed to become widely adopted. In effect, it was an evolutionary dead-end.
Realizing that Pascal could well limit Teradata to certain platforms and might not remain current with respect to emerging technologies, we began to look at alternatives. The most attractive candidate was an increasingly popular language that could run on almost any platform and that was gaining phenomenal acceptance in fields as diverse as academia, engineering, mathematics, statistical modeling, and – perhaps most importantly – business applications. That language was C.
C on the mainframe
There were just a few choices of C compilers for the mainframe. IBM offered one. So did SAS Institute. The SAS product was called SAS/C and was substantially derived from the multi-platform Lattice C compiler, which was also used at one time by Microsoft (full disclosure: IBM, SAS, and Microsoft are all Teradata Partners). There was one major difference between the IBM and SAS products, however, and that difference centered on their respective run-time libraries (RTL).
Historically, the guts of most high-level languages (HLL) were concentrated in the RTL. Here could be found functions and subroutines that handled multiple kinds of I/O, inter-process communication, sundry other system interfaces, etc. The RTL for SAS/C was freely redistributable. This meant that vendors like Teradata could create programs using SAS/C and ship them to customers along with the RTL. In other words, customers did not require a license ($$$$) for SAS/C at all. Conversely, programs built using the IBM compiler were completely unusable unless the customer had also licensed the IBM compiler.
This constituted a major drawback for the IBM compiler from our perspective. Customers would have to purchase a license for the compiler to gain access to the RTL even if they had no plans to ever use the compiler for development purposes. Ultimately, it was a deal killer. (IBM rectified this situation with the subsequent introduction of Language Environment (LE), a common set of libraries used by most IBM HLLs and now an integral part of z/OS itself.)
Teradata and SAS/C
And so began a long and fruitful association between Teradata and SAS/C. Over a period of time, we converted most of our Pascal-based client products to use SAS/C. This transition had the added benefit of allowing us to leverage our code base for the variety of platforms into which we had started to branch out. C was the most universal of available programming languages at the time and we took maximum advantage of those capabilities as well as certain SAS/C extensions such as augmented standard I/O, low-level I/O, multi-tasking and other low-level system interfaces, dynamic loading, and the inline machine code interface.
SAS decides to exit the compiler business
Several years ago, SAS Institute made a strategic decision to exit the compiler business in order to concentrate on its core product offerings in the realms of data mining, business intelligence, and analytics. The last release of SAS/C, 7.50, occurred in 2004. Not long after that, SAS classified the compiler as a legacy product. This meant no new functionality would be implemented. While SAS would continue to offer maintenance and support, it would be of a significantly limited nature.
Given the above constraints, we realized that it was imperative that we investigate other options and acquire a suitable replacement product that would allow us to continue to provide our customers with access to the latest technology and best-in-class support.
Teradata Parallel Transporter and z/OS XL C/C++
The most obvious candidate was IBM’s XL C/C++. IBM had made great strides in POSIX compliance, compile- and run-time performance, and overall usability since we had last looked at it. Indeed, we liked what we saw and selected the IBM product to build the new Teradata Parallel Transporter (TPT) on z/OS. Written in both C and C++, TPT provides an object-oriented flexible infrastructure that combines operators, filters, and access modules, written by Teradata, our partners, and/or users. TPT demonstrated that XL C/C++ was fully capable of meeting virtually any challenge.
Buoyed by the success of TPT, we immediately started to plan for the conversion of our remaining products from SAS/C to XL C/C++.
Identifying the challenges
We quickly figured out that selecting a compiler for a new product was a very different game from converting existing products from one compiler to another. In theory, the changeover should have been seamless and largely transparent. After all, C and C++ are mostly governed by well-established standards that all compiler vendors adhere to regardless of platform or provenance.
As mentioned earlier, we had long relied on certain SAS/C extensions. These extensions made life much easier for our developers and allowed us to harness the power and features of z/OS without having to rely on internally developed and supported subroutines. While the IBM compiler offered some analogous capabilities, it was clear that there would not be a one-to-one correspondence. This meant we would have to supply the missing functionality ourselves using assembler-level calls. Considerable time and resources were devoted to this effort.
Two issues illustrate the challenges involved in the conversion:
- SAS/C and XL C/C++ differ in the padding character they use to complete fixed-length records that are smaller than the logical record length (LRECL) of the target file (yes, we’ve seen it done). SAS/C uses blanks (EBCDIC x’40’) while XL C/C++ uses nulls (EBCDIC x’00’). To maintain compatibility, our Data Connector was modified to recognize this scenario and replace the nulls with blanks.
- SAS/C allows empty (i.e., zero-length) variable-length records to be written as well as read. XL C/C++ only allows such records to be read – but never written. Rather than incur the expense and overhead of writing our own low-level I/O subroutine just to handle this rather esoteric limitation imposed by the RTL, we decided to write single-byte records containing a blank value instead.
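The two workarounds above can be illustrated with a minimal C sketch. It is not the Data Connector code: the function names are hypothetical, and the sketch assumes that every trailing null in a fixed-length record is padding (the article does not say how Data Connector recognizes the scenario). The byte values are the EBCDIC code points quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* EBCDIC code points named in the text. */
#define EBCDIC_NULL  0x00
#define EBCDIC_BLANK 0x40

/*
 * Hypothetical helper: replace the run of trailing nulls in a
 * fixed-length record with blanks, mimicking SAS/C padding.
 * Assumption: all trailing nulls are padding, not data.
 * Returns the number of bytes that were changed.
 */
size_t repad_trailing_nulls(unsigned char *rec, size_t lrecl)
{
    size_t changed = 0;

    assert(rec != NULL);
    while (lrecl > 0 && rec[lrecl - 1] == EBCDIC_NULL) {
        rec[--lrecl] = EBCDIC_BLANK;
        changed++;
    }
    return changed;
}

/*
 * Hypothetical helper: prepare a variable-length record for output.
 * A zero-length record is replaced by a single blank byte; any other
 * record is copied unchanged. Returns the length to write, or 0 if
 * the output buffer is too small or the input is invalid.
 */
size_t substitute_empty_record(const unsigned char *data, size_t len,
                               unsigned char *out, size_t out_cap)
{
    assert(out != NULL);
    if (len == 0) {
        if (out_cap < 1)
            return 0;
        out[0] = EBCDIC_BLANK;
        return 1;
    }
    if (data == NULL || len > out_cap)
        return 0;
    memcpy(out, data, len);
    return len;
}
```

A caller would run repad_trailing_nulls over each fixed-length record after it is read, and pass each outgoing variable-length record through substitute_empty_record before handing it to the write routine.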
A phased deployment
Because of the intricacies involved in some of the product conversions, scheduling considerations prevented us from introducing them in a single release. Rather, some would debut in TTU 13.10 while others would wait until TTU 14.00.
A few legacy products such as TS/API will never be converted. TDP and CLIv2 are written in assembler language and there are no plans to change that. And we have always built ICU using XL C/C++.
- Data Connector (internal)
- MQSeries Access Module
- Named Pipes Access Module
Part of the beauty of the conversion effort was that customers would be largely oblivious to it. Ideally, there would be no need for any JCL or script changes. Inmods, outmods, access modules (AXSMODs), and notify exits would continue to work exactly as before. In general, this proved to be the case. However, it is always difficult to anticipate the “creative” ways some of our customers were using our products in the field and some issues did emerge over time.
Some questions and comments from customers (along with some excruciatingly detailed responses)
- I have a COBOL program that makes a one-time call to FastExport. It no longer works and fails with abend U4093 RC=000000AC. What happened?
Both Enterprise COBOL and XL C/C++ are LE products. LE uses a concept called enclaves to manage resources among multiple programs. LE prohibits POSIX(ON) programs from being called in a nested enclave situation (all of our programs run with POSIX(ON) because of our need to utilize the sleep and alarm RTL functions, among others). The workaround is to invoke FastExport from a subtask that is effectively outside the enclave. This can be accomplished using a short assembler stub program that our GSC will supply upon request.
- ARCMAIN has always needed to reside in an APF-authorized library. Is that still the case?
Yes, that requirement has not changed. While ARCMAIN itself does not make use of any authorized z/OS system services, the proprietary high-efficiency protocol it relies on to communicate with the Teradata RDBMS is not intended to be accessible to vanilla applications. APF authorization provides an appropriate level of protection. Were we to do this today, a RACF resource rule would probably be a superior approach (in fact, that’s actually a pretty good enhancement suggestion).
- Will you continue to ship the SAS/C RTL?
Yes, as long as we have products that are built using SAS/C (some of our products will not be converted). We have no plans to terminate our SAS/C license.
- Will inmods, outmods, AXSMODs, and notify exits written in SAS/C continue to be supported?
Yes. Customer-written user exits or interfaces written in SAS/C will continue to operate as before. However, we would encourage customers to consider converting those programs for much the same reason we converted ours.
- I encountered the following error from FastExport after upgrading to TTU 13.10: “EDC5057I The open mode string was invalid.” The affected DD statement looked like this:
//OUTFILE DD DSN=PS1139.TERADATA.CARDS.DATE.MONTH26.K,
//           DISP=(NEW,CATLG,DELETE),
//           DCB=(RECFM=FB,LRECL=20),SPACE=(20,(1,1),RLSE),
//           AVGREC=K
//        DD DUMMY
It is not entirely clear what this customer is doing, but concatenated output makes no sense. SAS/C may not have complained (or just ignored the DUMMY concatenated file), but XL C/C++ correctly failed the open attempt. This is probably one of those rare cases where you are just going to have to change the JCL.
- Can I expect to see performance improvements as a result of the conversion?
Possibly. Prior to the conversion, we used a couple of different mechanisms under the rubric of our internal Data Connector component to squeeze out every last measure of performance, particularly throughput performance. For import operations (e.g., FastLoad, MultiLoad, etc.), we relied on low-level assembler language subroutines and for export operations (e.g., FastExport), we adapted the SAS/C Direct BSAM Interface. For general purpose access, we used SAS/C augmented standard I/O. The case for using low-level I/O is not nearly as strong as it was many years ago because of the presence of Media Manager, a little known z/OS component through which most I/O is now funneled for optimization purposes.
As part of the conversion process, we revamped Data Connector to use the native XL C/C++ RTL record-level I/O functions. Not only did this considerably simplify I/O management for us internally, it allowed us to fully support a variety of files that we previously had not been able to handle (in whole or in part):
- VSAM data sets
- Striped data sets
- Large format sequential data sets
- Extended Addressing Volume (EAV) data sets
(Note that ARC continues to use its own low-level assembler BSAM interface called ARCIO.)
Some customers have reported shorter elapsed run times after upgrading. Again, while there are many variables that contribute to the overall performance characteristics of a given unit of work, it is possible that the conversion auspiciously tweaked some of them. Additionally, the XL C/C++ compiler uses a variety of tricks and techniques to shorten code paths. It also arranges the generated code to favor the processor's out-of-order execution and branch prediction, which reduces waits, helps keep the parallel instruction pipelines busy, and minimizes false look-ahead occurrences.
- Can I continue to use BUFNO and/or NCP specifications in DCB parameters?
Yes. However, you should be aware that they may be interpreted differently and their efficacy is highly dependent on the individual application and the characteristics of the data itself. XL C/C++ provides support for overlapped I/O (aka multiple buffering) using these parameters. But overlapped I/O is not recommended in all cases. For example, it is specifically contraindicated in situations where a large number of file repositions (or seeks) are likely to occur because the RTL may be forced to discard and/or restock buffers under such circumstances and this can cause additional I/O operations to be unnecessarily performed. This scenario can be clearly observed in the DBCLOG file of ARC which uses repositioning extensively; specifying a BUFNO or NCP greater than 1 will result in higher EXCP counts and a concomitant increase in overhead.
- I noticed that FASTLOAD (or any other Teradata client product) shows up as a z/OS UNIX process. Why? Has this always been the case?
In z/OS, the first time a unit of work invokes a POSIX function, it is “dubbed” as a z/OS UNIX process – regardless of whether it is executing as a batch job step, TSO command, or z/OS UNIX command (the latter of which we do not explicitly support in any case). Since all of our products are built using the POSIX(ON) runtime option they are automatically dubbed as z/OS UNIX processes at initialization.
In RACF terms, this requires that an OMVS segment be attached to the user profile. Without such a segment the dub attempt will fail. Among other items, the segment contains the UID to be associated with a process. For integrity purposes, IBM strongly recommends that each userid be paired with a unique UID; however, if that is not practical, a default OMVS segment can be used instead.
Even prior to the conversion, some client products showed up as z/OS UNIX processes. Here is what was happening in the case of BTEQ:
BTEQ calls the isatty SAS/C library function. As part of its initialization process, BTEQ uses this function to test whether a file associated with a given file number is an interactive terminal.
A statically linked SAS/C component invokes the querydub callable service (BPX1QDB). The querydub callable service obtains the dub status information for the current task. The status information indicates whether the current task has already been dubbed, is ready to be dubbed, or cannot be dubbed as a process (or thread).
If the querydub indicates that the current task has already been dubbed or is ready to be dubbed, a statically linked SAS/C component then invokes the sysconf callable service (BPX1SYC). The sysconf callable service gets the value of a configurable system variable (OPEN_MAX, in this particular case, to determine the maximum number of files that a single process can have open at one time).
Calling the sysconf callable service results in the current task being dubbed.
And that is why BTEQ jobs show up as z/OS UNIX processes.
This phenomenon occurs in BTEQ 08.00.00.00 and later. It is a function of the level of the SAS/C compiler used to build BTEQ (SAS/C 7.00C and above, specifically). It is not a bug in that it has no effect on the outcome of a given BTEQ job; it is merely a quirk or curiosity.
It should be noted that this will be observed if and only if the task representing BTEQ is capable of being dubbed. This generally requires that an appropriate security environment be in place. Either:
- The userid under which BTEQ was invoked must possess a valid UID (implies the presence of a valid OMVS segment for the associated userid).
- Or, a default OMVS segment exists for a default user and group, and that user or user/group combination is defined to the BPX.DEFAULT.USER facility class.
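The BTEQ call sequence described above can be approximated with standard POSIX calls. The sketch below is illustrative only: the function name is hypothetical, it is not SAS/C library code, and it omits the querydub step because BPX1QDB is a z/OS callable service with no portable C equivalent. It assumes the OPEN_MAX query corresponds to sysconf(_SC_OPEN_MAX), the standard POSIX spelling. On z/OS, a call of this kind is what would cause the task to be dubbed; on other POSIX systems it simply returns the values.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the sequence described for BTEQ:
 *   1. isatty() tests whether a file descriptor is a terminal.
 *   2. sysconf(_SC_OPEN_MAX) asks for the per-process limit on
 *      open files.
 * Stores 1 or 0 in *is_terminal and returns the OPEN_MAX value,
 * or -1 if the limit is indeterminate or the query failed.
 */
long probe_terminal_and_open_max(int fd, int *is_terminal)
{
    assert(is_terminal != NULL);

    *is_terminal = isatty(fd);      /* 1 if fd is a terminal, else 0 */
    return sysconf(_SC_OPEN_MAX);   /* the POSIX query named in the text */
}
```

Calling probe_terminal_and_open_max(0, &flag) checks standard input. In a batch job step the flag would normally be 0, since the input is a data set rather than an interactive terminal.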
Teradata is committed to providing the very best client tools and utilities to its customers, whether developed directly by Teradata or via our valued partners. We also remain dedicated to supporting the various platforms our customers use to interact with the Teradata RDBMS as well as embracing the latest technology in order to realize that goal.
Long live the mainframe.