“Big Data” is a popular topic that has been gaining attention from the high-performance computing niche of the information technology market. Big Data refers to an amount of data so large that it is extremely difficult to manage and to glean information from. Big Data presents both challenges and opportunities for quantitative analysts to develop improved predictive and descriptive models [1].
SAS launched SAS High-Performance Data Mining in December 2011 to enable you to analyze more data faster than was ever before possible. Based on the SAS High-Performance Analytics model for distributed in-memory processing, SAS High-Performance Data Mining is delivered as SAS software that runs on Teradata or EMC Greenplum hardware. A subset of SAS High-Performance Analytics, SAS High-Performance Data Mining has transformed the model-building and model-scoring processes. Its massively parallel in-memory algorithms enable organizations to derive highly accurate and timely data mining models in minutes, not hours or days, to make better-informed business decisions [1].
SAS has developed many high-performance (HP) procedures, including:
- HPDMDB Summarize data
- HPDS2 Parallel execution of DS2
- HPFOREST Random forest
- HPLOGISTIC Logistic regression
- HPNEURAL Neural network modeling
- HPNLIN Nonlinear regression
- HPREDUCE Unsupervised variable selection
- HPREG Regression
- HPBIN Variable binning
- HPSAMPLE Sampling and data partitioning
- HPIMPUTE Imputation
- HPSEVERITY Severity models
- HPCOUNTREG Regression of count variables
- HPSUMMARY Summarize data
- HPLMIXED Mixed linear models
- HPATEST Test operational status of system
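These HP procedures largely mirror the syntax of their traditional SAS/STAT counterparts, with a PERFORMANCE statement added to control distributed execution. As a rough sketch of what a call might look like (the library, dataset, and variable names here are hypothetical, and the exact PERFORMANCE options available depend on your installation and release):

```sas
/* Hypothetical example: fit a logistic regression in distributed mode.   */
/* mylib.loans, default, income, age, and debt_ratio are made-up names.   */
proc hplogistic data=mylib.loans;
   class region;                                /* categorical predictor  */
   model default(event='1') = income age debt_ratio region;
   performance details;                         /* report timing per step */
run;
```

When no distributed environment is configured, such procedures typically fall back to multithreaded execution on the local machine, so the same code can be developed on a workstation and later pointed at the appliance.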
Hopefully, these procedures will become available in late 2013 and will be offered to the academic community as well.
References:
1. A New Age of Data Mining in the High-Performance World.
http://support.sas.com/resources/papers/proceedings12/137-2012.pdf
2. SAS workshop at UCLA.