Outlier detection by regression diagnostics in large data
Publisher
IEEE Computer Society
Abstract
Regression analysis is a well-known supervised learning technique. To estimate and justify an effective regression model, it is necessary to check and preprocess the data set. It is practically impossible to obtain real data without outliers (noise). In areas such as bioinformatics, astronomy, image analysis, and computer vision, large or fat data sets naturally contain unusual observations (outliers). In these fields, robust regression is commonly used in the model-building process, but robust regression methods do not perform well enough on large and/or high-dimensional data. Regression diagnostics is the practice of checking raw data for outliers in regression. Robust regression and regression diagnostics are complementary ideas, and neither alone is sufficient for studying contaminated data. Most popular diagnostic methods are inadequate for large data because of masking and swamping. In this article, both ideas are briefly discussed, and we show that a new measure can effectively identify outliers (influential observations) in linear regression for large data.
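For readers unfamiliar with regression diagnostics, the following is a minimal sketch of the classical single-case measures (leverage, studentized residuals, and Cook's distance) for an ordinary least squares fit. It is not the new measure proposed in the paper, which the abstract does not specify. The function name `regression_diagnostics` and the synthetic data are assumptions made for illustration only. Single-case measures like these are the kind that can suffer from masking and swamping when several outliers are present.

```python
import numpy as np

def regression_diagnostics(X, y):
    """Classical OLS diagnostics: leverage, studentized residuals, Cook's distance.

    Illustrative helper (hypothetical name), not the measure from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])  # design matrix with an intercept column
    p = A.shape[1]

    # Thin QR factorization: the hat matrix is H = Q Q^T,
    # so the leverages (diagonal of H) are the row sums of Q**2.
    Q, R = np.linalg.qr(A)
    leverage = np.sum(Q**2, axis=1)

    beta = np.linalg.solve(R, Q.T @ y)    # least squares coefficients
    resid = y - A @ beta
    s2 = resid @ resid / (n - p)          # residual variance estimate

    # Internally studentized residuals and Cook's distance.
    student = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks = student**2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks

# Synthetic example (assumed data): a clean linear trend plus one planted outlier.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=50)
x[0], y[0] = 30.0, 0.0                    # high-leverage point far off the trend

lev, stud, cooks = regression_diagnostics(x.reshape(-1, 1), y)
worst = int(np.argmax(cooks))             # index of the most influential case
print(worst, lev[worst], cooks[worst])
```

With a single planted outlier, Cook's distance singles out the contaminated case. With a cluster of similar outliers, the same measures may fail to flag any of them, which is the masking problem the abstract refers to.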
Citation
Nurunnabi, A. A. M., & Nasser, M. (2009, April). Outlier detection by regression diagnostics in large data. In 2009 International Conference on Future Computer and Communication (pp. 246-250). IEEE.