Please note: teradataR is no longer supported. Teradata decided to focus on its partnership with Revolution for R integration with Teradata.

teradataR is a free package that lets users of R, the open source language for statistical computing and graphics, interact with a Teradata database. Users can run many statistical functions directly against the Teradata system without having to extract the data into memory.

You can download the latest (version 1.0.1) teradataR package here.

Update: The source for teradataR has been approved for public distribution. The source, along with an updated teradataR package that works with R 3.0, is available from https://github.com/Teradata/teradataR.

What is R?

R is an open source language for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering) and graphical techniques, and is highly extensible. 

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

R is a flexible programmable language that allows users to add functionality by defining new functions. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

R can be extended (easily) via packages. There are about eight packages supplied with the R distribution and over 1,200 available through CRAN (Comprehensive R Archive Network) mirror sites.

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

The Teradata add-on package for R

teradataR is a package (library) that allows R users to connect easily to Teradata, establish data frames (R data formats) on Teradata tables, and call in-database analytic functions within Teradata.  This allows R users to work within their R console environment while leveraging the in-database functions developed with Teradata Warehouse Miner. This package provides 44 different analytical functions and an additional 20 data connection and R infrastructure functions.  In addition, we’ve added a function that lists the stored procedures within Teradata and a function that calls them from R.

  • 20 functions that enable the R infrastructure to operate with Teradata, for example:
    • tdConnect - Connect to Teradata via ODBC
    • td.data.frame - Establish data frame connections to a Teradata table
  • 44 in-database analytical functions callable from R. A sample of the functions:
    • Descriptive statistics: overlap, histogram, frequency, statistics, matrix functions, and values analysis
    • Reorganization functions: join, merge and samples
    • Transformations: bincode, recode, rescale, sigmoid, zscore and null replacement
    • K-Means clustering and Score K-Means
    • Statistical tests: ks, dagostino.pearson, shapiro.wilk, binomial, and wilcoxon
    • R language features: nrow, ncol, min, max, summary, as.data.frame, and dim
  • Tool and R functions that allow users to create their own custom analytic functions that are callable from R.
  • Teradata Warehouse Miner can capture any analytic stream, including UDFs, and create a stored procedure:
    • An analytic process that creates new derived predictive variables can be captured as a stored procedure.
    • An entire process that creates or updates an analytical data set can be captured as a stored procedure.
    • An R function can list all the stored procedures within Teradata.
    • An R function can call a stored procedure that runs in-database.
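
As a rough local illustration of what three of the transformations named above compute, here is a base R sketch that runs on an in-memory vector. It does not use teradataR or touch a database. The helper names (zscore_local, rescale_local, sigmoid_local) are hypothetical, and the sketch assumes the in-database versions follow the textbook formulas, which this article does not confirm.

```r
# Local, base-R sketch of three transformations named in the list above.
# The helper names are hypothetical and are NOT the teradataR functions;
# they only apply the textbook formulas to an in-memory numeric vector.

zscore_local <- function(x) {
  # Center on the mean and scale by the sample standard deviation.
  # (Assumption: the in-database version may use a different divisor.)
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

rescale_local <- function(x, lower = 0, upper = 1) {
  # Min-max rescaling onto the interval [lower, upper].
  rng <- range(x, na.rm = TRUE)
  lower + (x - rng[1]) * (upper - lower) / (rng[2] - rng[1])
}

sigmoid_local <- function(x) {
  # Logistic function: maps any real value into (0, 1).
  1 / (1 + exp(-x))
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

z <- zscore_local(x)             # mean 0, standard deviation 1
r <- rescale_local(x)            # smallest value -> 0, largest -> 1
s <- sigmoid_local(x - mean(x))  # centered first so values spread around 0.5

print(round(z, 3))
print(r)
print(round(s, 3))
```

With teradataR, the equivalent work would be pushed to Teradata instead of being done in R's memory, which is the point of the package.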

teradataR allows R users to leverage all the benefits of in-database processing with Teradata:

  • Eliminate data movement from Teradata to the R framework for key data-intensive tasks.
  • Leverage the speed of the Teradata database’s parallel processing to run analytics against big data.
  • Operate within the R console environment.
  • Embed your frequently performed tasks to run in-database.
  • Download R and teradataR for free.

Getting Started with R and the teradataR package

Please refer to the teradataR 1.0.1 User Guide, included with the latest download, for more information on getting started with R and the teradataR package.

Note that teradataR is a free package.  For community support, please visit the Analytics forum.

Discussion
andeek 1 comment Joined 11/10
02 Nov 2010

This is very exciting, however I don't see a lot of documentation surrounding the package. Am I missing a link to the reference manual? Thanks!

nachums 1 comment Joined 11/10
05 Nov 2010

Though the html files for the docs are not in their usual place, ?commandName still works.

DiEgoR 2 comments Joined 08/06
13 Dec 2010

I welcome the R interface to Teradata too! Whereas Teradata Miner is a helpful tool for standard tasks and SAS can do a few tricks with fastload/exp too, I expect R to provide an elegant environment for programming around Teradata stored data.

toddb 4 comments Joined 10/10
12 Jan 2011

The teradataR package uses R documentation. To access the help files from the R console, load the teradataR library and then run help(teradataR). This gives you the help index, from which you can reach the help for the whole package.

Jonathan 3 comments Joined 08/10
08 Mar 2011

I am wondering whether we can move statistical analysis into Teradata as built-in UDFs.

Innovation provides performance

jtschei 1 comment Joined 02/10
17 Mar 2011

On a fresh install on a new Linux server, we had problems getting RODBC to connect to Teradata.

Resolved with these steps:

1) Download the source to RODBC and unpack

2) Build the package

R CMD build RODBC

3) Install the package

R CMD INSTALL ./RODBC_1.3-2.tar.gz --configure-args="--with-odbc-include=/opt/teradata/client/ODBC_64/include --with-odbc-lib=/opt/teradata/client/ODBC_64/lib/"

Bunichi 1 comment Joined 02/11
04 Jul 2011

About as.td.data.frame (Coerce to a td data frame)
I would like to understand td data frames better:
- Does it create a definition within the Teradata database or not?
- Or does it create a temporary table or view?

The help for as.td.data.frame shows only the following:
Coerce to a td data frame
Description
Coerce a data frame into a td data frame

Usage
as.td.data.frame(x)

Arguments
x data frame.
-Best Regards

murgai 2 comments Joined 08/11
18 Aug 2011

Does as.td.data.frame() do a row-by-row transfer, or does it invoke bulkload/fastload? For a small dataset (120MB) it takes a really long time. I am wondering if it is doing a row-by-row insert. That was a major hassle with the RODBC package.

toddb 4 comments Joined 10/10
08 Sep 2011

The current teradataR package does not use bulkload/fastload. It currently relies on the RODBC package to do the work, so data is transferred row by row. A future version is intended to support extracting and loading through the Teradata utilities or an API.

murgai 2 comments Joined 08/11
08 Jun 2012

I wonder whether anything has been updated since Nov 2010. I really hope the data pipe between R and Teradata has moved beyond ODBC to something similar to SAS<->Teradata.

toddb 4 comments Joined 10/10
13 Jun 2012

Nov 2010 was release 1.0. Several additions were made in 1.0.1, which was released near the end of 2011. The ability to create your own variables in your td.data.frame, the subset command, and td.tapply are a few examples of what was added. The data transfer between R and Teradata still uses ODBC, so that has not changed in version 1.0.1.

toddb 4 comments Joined 10/10
26 Jun 2012

@murgai something else you can use now is a JDBC connection. The data transfer into R is significantly faster with JDBC.

mdmanders 1 comment Joined 07/12
03 Jul 2012

Where are things heading with teradataR and load/extract functionality from/to R? I would love to have a fast interface to dump data.frames from R into Teradata. ODBC works pretty well when it comes to pulling data, but it is super slow on insert/create.

Thanks!
Mike

Tuen 20 comments Joined 07/05
30 Nov 2012

Are there any Orange Books around on using R with Teradata? I'm not really familiar with R, so I'm looking for some good material for getting started.

TGooch44 1 comment Joined 09/10
13 Mar 2013

Any planned updates to this package?

ulrich 36 comments Joined 09/09
24 Apr 2013

R 3.0.0 would require a new version. Are there any plans to recompile it?

The beginning of all sciences is the astonishment that things are the way they are. -Aristotle

richardtelstra 1 comment Joined 06/12
07 Oct 2013

I second the comment above about a new version for users of R 3.x.

Richard Smart

cc120479 1 comment Joined 05/11
10 Feb 2014

We are no longer supporting teradataR since the decision was made for Teradata to focus on our partnership with Revolution for R integration with Teradata.  However, Alexander Bessonov has updated the teradataR package to work with R 3.0 and placed it on GitHub at https://github.com/Teradata/teradataR.

Alexander K 1 comment Joined 09/13
28 Feb 2014

For connecting to Teradata through R on a Linux machine, see also the following article, which may be of help:
http://forums.teradata.com/forum/analytics/connecting-to-teradata-in-r-via-the-teradatar-package-teradata-express-vm-version
