In-database analytics with TeradataR
R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. teradataR is a free package that lets R users interact with a Teradata database. Users can run many statistical functions directly against the Teradata system without having to extract the data into memory.
You can download the latest (version 1.0.1) teradataR package here.
What is R?
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes
- an effective data handling and storage facility,
- a suite of operators for calculations on arrays, in particular matrices,
- a large, coherent, integrated collection of intermediate tools for data analysis,
- graphical facilities for data analysis and display either on-screen or on hardcopy, and
- a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
R is a flexible programmable language that allows users to add functionality by defining new functions. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.
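As a small illustration of the language features described above, the following sketch defines a recursive function and uses it in a loop. The function name `fib` and the variable `first_ten` are made up for this example.

```r
# A user-defined recursive function: the n-th Fibonacci number,
# counting from 1 (1, 1, 2, 3, 5, ...).
fib <- function(n) {
  if (n <= 2) {
    return(1)                      # base cases: fib(1) and fib(2)
  }
  fib(n - 1) + fib(n - 2)          # recursive case
}

# A loop that prints the index and value of the first six terms.
for (i in 1:6) {
  cat(i, fib(i), "\n")
}

# The same function applied over a vector of inputs.
first_ten <- sapply(1:10, fib)
```

Once defined, `fib` behaves like any built-in function, which is how R users add their own functionality.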
R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and over 1,200 available through CRAN (Comprehensive R Archive Network) mirror sites.
R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
The Teradata add-on package for R
teradataR is a package (library) that lets R users connect to Teradata, establish data frames (R data formats) on Teradata tables, and call in-database analytic functions within Teradata. R users can therefore work within their R console environment while leveraging the in-database functions developed with Teradata Warehouse Miner. The package provides 44 analytical functions and an additional 20 data connection and R infrastructure functions. In addition, we've added a function that lists the stored procedures within Teradata, along with the capability to call them from R.
- 20 functions that enable the R infrastructure to operate with Teradata, including:
  - tdConnect - Connect to Teradata via ODBC
  - td.data.frame - Establish data frame connections to a Teradata table
- 44 in-database analytical functions callable from R. A sample of the functions:
  - Descriptive statistics: overlap, histogram, frequency, statistics, matrix functions, and values analysis
  - Reorganization functions: join, merge and samples
  - Transformations: bincode, recode, rescale, sigmoid, zscore and null replacement
  - K-Means clustering and Score K-Means
  - Statistical tests: ks, dagostino.pearson, shapiro.wilk, binomial, and wilcoxon
  - R language features: nrow, ncol, min, max, summary, as.data.frame, and dim
- Tools and R functions that allow users to create their own custom analytic functions that are callable from R:
  - Teradata Warehouse Miner can capture any analytic stream, including UDFs, and create a stored procedure.
  - An analytic process that creates new derived predictive variables can be captured as a stored procedure.
  - An entire process that creates or updates an analytical data set can be captured as a stored procedure.
  - An R function can list all the stored procedures within Teradata.
  - An R function can call a stored procedure that runs in-database.
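To make the transformation names in the list above more concrete, here is a minimal local sketch in base R of what zscore, rescale, sigmoid and null replacement conventionally compute. These are textbook definitions written for illustration only. They are not the teradataR functions, they run in R memory rather than in-database, and the in-database versions may differ in details (for example, population versus sample standard deviation). All function and variable names below are hypothetical.

```r
# Textbook versions of four transformations, for illustration only.

# zscore: center on the mean and scale by the (sample) standard deviation.
zscore <- function(x) (x - mean(x)) / sd(x)

# rescale: map values linearly onto the interval [lower, upper].
rescale <- function(x, lower = 0, upper = 1) {
  rng <- range(x)
  lower + (x - rng[1]) / (rng[2] - rng[1]) * (upper - lower)
}

# sigmoid: squash any real value into the interval (0, 1).
sigmoid <- function(x) 1 / (1 + exp(-x))

# null replacement: substitute a chosen value for missing entries.
null_replace <- function(x, value) {
  x[is.na(x)] <- value
  x
}

income <- c(20, 35, NA, 50, 95)                       # made-up sample data
filled <- null_replace(income, mean(income, na.rm = TRUE))
rescaled <- rescale(filled)                           # values now in [0, 1]
z <- zscore(filled)                                   # mean 0, sd 1
s <- sigmoid(z)                                       # values in (0, 1)
```

The point of the in-database functions is that the equivalent computation is carried out by Teradata, so a column like `income` never has to be pulled into R.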
teradataR allows R users to leverage all the benefits of in-database processing with Teradata:
- Eliminate data movement from Teradata to the R framework for key data intensive tasks.
- Leverage the speed of Teradata database’s parallel processing to run analytics against big data.
- Ability to operate within the R console environment.
- Embed your frequently performed tasks to run in-database.
- R and teradataR are free downloads.
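The workflow behind these benefits might look roughly like the sketch below. It uses only names mentioned on this page (tdConnect, td.data.frame, nrow, ncol, summary), but the argument names `uid` and `pwd`, the `tdClose` call, and the wrapper `summarize_in_database` are assumptions made for illustration and should be checked against the User Guide. The function is only defined here, not called, because calling it requires the teradataR package and a configured ODBC data source.

```r
# Hypothetical wrapper: summarize a Teradata table without pulling its rows into R.
# Argument names for tdConnect and the tdClose call are assumptions.
summarize_in_database <- function(dsn, uid, pwd, table_name) {
  library(teradataR)                     # installed from the Teradata download

  tdConnect(dsn, uid = uid, pwd = pwd)   # connect to Teradata via ODBC
  on.exit(tdClose(), add = TRUE)         # assumed cleanup call; runs on exit

  tdf <- td.data.frame(table_name)       # a reference to the table, not a copy

  # These calls are evaluated against the database table.
  list(
    rows    = nrow(tdf),
    columns = ncol(tdf),
    summary = summary(tdf)
  )
}
```

A call would then take the form `summarize_in_database("my_dsn", "user", "password", "my_table")`, where all four values are placeholders.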
Getting Started with R and the teradataR package
Please refer to the teradataR 1.0.1 User Guide, included with the latest download, for more information about Getting Started with R and the teradataR Package.
Note that teradataR is a free package. For community support, please visit the Analytics forum.