S-estimation is a high-breakdown-value method introduced by Rousseeuw and Yohai (1984). Outlier detection using nonconvex penalized regression. This estimator is fast to compute using the algorithm of Ruppert (1992), has a breakdown point that can be as high as 50%, and is asymptotically normal. The asymptotics of S-estimators in the linear regression model. For a discussion of the problem of robust estimation in the linear regression model, we refer to Hampel, Ronchetti, Rousseeuw and Stahel (1986), Huber (1981), Rousseeuw (1984), Rousseeuw and Yohai (1984), and Rousseeuw and Leroy (1987). The asymptotic distribution of MM-estimates has been studied by Yohai (1987) under the assumption that H = H0 (the central parametric model). A proof of the breakdown point of S-estimators can be found in M. Introduction to Rousseeuw (1984), least median of squares.
MM-estimation, a special type of M-estimation introduced by Yohai (1987), combines high-breakdown-value estimation with efficient estimation. We programmed these two estimators in Stata and made them available through the ltsregress and lmsregress commands. Detecting these unusual observations is an important aspect of model building: they must be diagnosed to ascertain whether or not they are influential. The term S-estimators was first used by Rousseeuw and Yohai [15]. Different influence statistics, including Cook's distance, the Welsch-Kuh distance, and DFBETAS, have been proposed.
Later, they were applied to the multivariate scale and location estimation problem (Davies, 1992). Generalizations of M-estimators by Mallows, Schweppe, and others also fail to achieve high breakdown values. Robust regression by means of S-estimators. Unmasking multivariate outliers and leverage points. These are all called high-breakdown estimators since they can be tuned to resist contamination in up to 50% of the observations. To overcome these limitations, Maronna (2011) has recently proposed an. However, it uses a robust measure for the variance. Journal of the American Statistical Association, 85, 633-639. The objectives of this study were to develop the overall and seasonal load-discharge relationships of losses of SS.
The breakdown point approach is highly attractive for a number of reasons, not the least. A smooth high-breakdown estimator for the regression model is the S-estimator (Rousseeuw and Yohai, 1984). In regression analysis, data sets often contain unusual observations called outliers. Similarly as in [8], S-estimators of regression (Rousseeuw and Yohai, 1984) are defined as the solution. Robust methods for regression models that include categorical or binary regressors have been developed. Rousseeuw and Yohai (1984) minimize a robust M-estimate of the scale of the residuals. It has a higher statistical efficiency than S-estimation. More robust estimation techniques, such as M-estimation (Huber, 1973), least trimmed squares (LTS) estimation (Rousseeuw, 1984), S-estimation (Rousseeuw and Yohai, 1984), and MM-estimation (Yohai, 1987), have been developed. Least trimmed squares (LTS) regression is based on the subset of h observations, out of a total of n observations, whose least squares fit possesses the smallest sum of squared residuals.
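The LTS criterion described above can be illustrated with a small sketch for a single regressor. This is not the code behind any package named in this text. It assumes a simple line y = a + b*x, uses random two-point starts followed by concentration steps (keep the h best-fitting points, refit by least squares) in the spirit of FAST-LTS, and the function names `ols_line`, `trimmed_ss`, and `lts_line` are hypothetical.

```python
import random


def ols_line(points):
    """Closed-form least squares fit y = a + b*x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0.0:
        return my, 0.0  # degenerate subset: fall back to a horizontal line
    b = sum((x - mx) * (y - my) for x, y in points) / sxx
    return my - b * mx, b


def trimmed_ss(points, a, b, h):
    """LTS objective: sum of the h smallest squared residuals."""
    squared = sorted((y - a - b * x) ** 2 for x, y in points)
    return sum(squared[:h])


def lts_line(points, h, n_starts=50, seed=0):
    """Approximate LTS fit; returns (intercept, slope, objective)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        # Start from the exact line through two randomly drawn points.
        a, b = ols_line(rng.sample(points, 2))
        prev = float("inf")
        while True:
            # Concentration step: refit on the h points with smallest residuals.
            subset = sorted(points, key=lambda p: (p[1] - a - b * p[0]) ** 2)[:h]
            a, b = ols_line(subset)
            obj = trimmed_ss(points, a, b, h)
            if obj >= prev - 1e-12:  # no further improvement
                break
            prev = obj
        if best is None or obj < best[0]:
            best = (obj, a, b)
    return best[1], best[2], best[0]
```

Choosing h close to n/2 gives the most outlier resistance, while h close to n moves the fit toward ordinary least squares; the value to use is a modelling decision that this sketch leaves to the caller.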
Rousseeuw and Yohai (1984) and Rousseeuw and Leroy (1987). A resampling design for computing high-breakdown regression. Part of the Lecture Notes in Statistics book series (LNS, volume 26). See Huber (1973, 1981), Andrews (1974), Rousseeuw and Yohai (1984), Hampel, Ronchetti, Rousseeuw, and Stahel (1986), Yohai (1987), and Rousseeuw and Leroy (1987) for more detail. This approach has been generalized to S-estimators (Rousseeuw and Yohai, 1984), MM-estimators (Yohai, 1987), and τ-estimators (Yohai and Zamar, 1988). High-breakdown robust multivariate methods.
Paper 265-27: Robust regression and outlier detection. Robust regression by means of S-estimators. The Olive and Hawkins paradigm, as illustrated by this book, is to give theory for the estimator actually used. In Robust and Nonlinear Time Series, edited by Franke, Härdle and Martin. Most of the literature on high-breakdown multivariate robust statistics follows the Rousseeuw and Yohai paradigm. However, there is little research on estimating the mixture regression parameters robustly, in part because it is not easy to replace the log-likelihood in 1.
Subsampling-based algorithms for multivariate S-estimators were proposed by Ruppert (1992) and Campbell et al. MM-estimation (Yohai, 1987) combines the high-breakdown-value method with M-estimation. In this paper we consider the problem of performing inference for a linear regression model using robust estimators. In the latter two papers, the authors construct regression estimators that have both high breakdown points and high efficiency.
Both of these estimators are useful for variable selection, but can only be tuned to be either highly robust or highly efficient under the normal model (Yohai, 1987). Following seminal papers by Box (1953) and Tukey (1960), which demonstrated the need for robust statistical procedures, the theory of robust statistics blossomed in the 1960s and 1970s. Even a careful analysis of the LS residuals can fail to detect the outliers. Rousseeuw and Yohai (1984) proposed S-estimates, defined by the property of minimizing an M-estimate of the residual scale. S-estimation is used to minimize the dispersion of the residuals. S-estimators were first introduced in the context of regression by Rousseeuw and Yohai (1984). The performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen (1998). Robust tests for linear regression models based on estimates. The aim is then to estimate the multivariate location and the scatter matrix of a p-dimensional multivariate population. For a given breakdown value between 0% and 50%, we can derive the value of the corresponding tuning parameter c; see Table 3 in Rousseeuw and Yohai (1984). Rousseeuw (1984) proposed the least median of squares (LMS) and the least trimmed squares (LTS) estimators. That is, the S-estimate minimizes the M-scale of the residuals, which is defined implicitly by an estimating equation.
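To make the implicitly defined M-scale more concrete, here is a minimal sketch that solves mean(rho(r_i / s)) = b for s by fixed-point iteration. Several choices here are assumptions rather than anything stated in the text: the Tukey bisquare rho function, the constant c = 1.547 (a commonly quoted value for a breakdown value near 50%), the right-hand side b taken as the breakdown value times the maximum of rho, and the names `rho_bisquare` and `m_scale`.

```python
import math
import statistics

C_BISQUARE = 1.547  # assumed tuning constant, commonly quoted for ~50% breakdown


def rho_bisquare(u, c=C_BISQUARE):
    """Tukey bisquare rho function; bounded above by c**2 / 6."""
    if abs(u) >= c:
        return c * c / 6.0
    t = (u / c) ** 2
    return (c * c / 6.0) * (1.0 - (1.0 - t) ** 3)


def m_scale(residuals, c=C_BISQUARE, breakdown=0.5, tol=1e-10, max_iter=500):
    """Solve mean(rho(r / s)) = b for s, with b = breakdown * max(rho)."""
    b = breakdown * c * c / 6.0
    n = len(residuals)
    # Starting value: median absolute residual, rescaled for normal data.
    s = statistics.median([abs(r) for r in residuals]) / 0.6745
    if s == 0.0:
        return 0.0  # at least half of the residuals are exactly zero
    for _ in range(max_iter):
        mean_rho = sum(rho_bisquare(r / s, c) for r in residuals) / n
        s_new = s * math.sqrt(mean_rho / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s
```

An S-estimate of regression would then be the coefficient vector whose residuals give the smallest value of `m_scale`; searching over coefficient vectors is a separate and harder step that this sketch does not attempt.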
With the same breakdown value, it has a higher statistical efficiency than LTS estimation. Unfortunately, S-estimators cannot achieve a high breakdown point and high efficiency simultaneously. Min-max bias robust regression. It should be noted that the problem of bias robustness and the desirability of optimal bias-robust estimators, namely min-max bias estimates, is clearly recognized in Hampel et al. Rousseeuw (born October 1956) is a statistician known for his work on robust statistics and cluster analysis. They considered an observation to be a high leverage point (HLP) if its corresponding robust Mahalanobis distance (RMD) value exceeds the cutoff point. A disadvantage of the procedure is the lack of assumptions related to the distribution of errors (Rousseeuw and Yohai, 1984).
S-estimators of regression parameters, proposed by Rousseeuw and Yohai (1984), search for the slope and intercept values that minimize some measure of the dispersion of the residuals. These estimates have a very high computational complexity, and thus the usual algorithms compute only approximate solutions. A generalization is given by S-estimators (Rousseeuw and Yohai, 1984). Almost all start with an initial high-breakdown-point estimate (not necessarily efficient). Rousseeuw obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at ETH Zurich in the group of Frank Hampel, which led to a book on influence functions. The objective functions of these HB estimators are not convex and can have several local minima.
S-estimators, Encyclopedia of Statistical Sciences (Tyler), Wiley. Robust regression via LTS. Methods which achieve the goal of being insensitive to changes in a small percentage of the observations have only recently been developed. Part of the Springer Series in Statistics book series (SSS). MM-estimation, introduced by Yohai (1987), combines high-breakdown-value estimation and M-estimation. The breakdown value is a measure of the proportion of contamination that a procedure can withstand and still maintain its robustness. In this talk we present an algorithm for S-estimates (see Rousseeuw and Yohai, 1984) similar to FAST-LTS. This algorithm, which we call FAST-S, is based on modifying each candidate with a step that improves the S-optimality criterion, and thus allows the number of subsamples to be reduced. Runoff losses of suspended sediment, nitrogen, and phosphorus. Tools for monitoring robust regression in SAS/IML Studio (JRC). Robust regression and outlier detection (Wiley Online Library). The asymptotic breakdown point of the S-estimator is given by Rousseeuw and Yohai (1984). Rousseeuw and Yohai (1984), by permission of Springer-Verlag, New York.
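The notion of breakdown value mentioned above can be seen in the simplest setting, a location estimate. The sketch below is only an illustration with made-up numbers: it replaces part of a small sample with one gross value and compares the sample mean with the sample median. The helper names `contaminate` and `location_under_contamination` are hypothetical.

```python
import statistics


def contaminate(sample, n_bad, bad_value):
    """Return a copy of `sample` with its first `n_bad` entries replaced by `bad_value`."""
    return [bad_value] * n_bad + list(sample[n_bad:])


def location_under_contamination(sample, n_bad, bad_value=1e6):
    """Mean and median of the sample after contaminating `n_bad` observations."""
    dirty = contaminate(sample, n_bad, bad_value)
    return statistics.mean(dirty), statistics.median(dirty)


if __name__ == "__main__":
    clean = [float(v) for v in range(1, 21)]  # illustrative data: 1, 2, ..., 20
    for n_bad in (0, 1, 5, 9, 11):
        mean, median = location_under_contamination(clean, n_bad)
        print(n_bad, mean, median)
```

In this toy sample a single bad value is enough to carry the mean far outside the range of the clean data, while the median stays within that range until more than half of the observations are replaced.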
Rousseeuw (1984) proposed an approximate algorithm based on drawing random subsamples whose size equals the number of carriers. However, all these estimates have very low efficiency under a regression model with normal errors. Robust and Nonlinear Time Series Analysis, 256-272, 1984. If most of the large sample theory in the text is covered, then the course should be limited to Ph.D. students.
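The random-subsample idea attributed above to Rousseeuw (1984) can be sketched for a line with an intercept, where the number of carriers is two. This is an illustrative approximation only: each subsample of two points defines an exact line, the candidate is scored by the median of the squared residuals (the LMS criterion), and the best candidate is kept. The function name `lms_line`, the number of subsamples, and the use of the plain sample median are assumptions.

```python
import random
import statistics


def lms_line(points, n_subsamples=200, seed=0):
    """Approximate least median of squares line; returns (intercept, slope, criterion)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_subsamples):
        # Elemental subset: as many points as there are coefficients (two).
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line, no valid candidate
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        # LMS criterion: median of the squared residuals over all points.
        crit = statistics.median([(y - a - b * x) ** 2 for x, y in points])
        if best is None or crit < best[0]:
            best = (crit, a, b)
    if best is None:
        raise ValueError("no valid subsample was drawn")
    return best[1], best[2], best[0]
```

Because only a random selection of subsamples is examined, the result is an approximation to the exact LMS fit; drawing more subsamples raises the chance that at least one of them is free of outliers.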
The DetS and DetMM estimators for multivariate location. As Rousseeuw (1984) shows, regression M-estimators also have a 0% breakdown value. Also in this book, they conclude from 4 without additional. S-estimators have been generalized to multivariate estimation of position and dispersion by Davies (1987) and Lopuhaä. The following dataset can be found in the World Almanac and Book of Facts.
Donoho (1982), Donoho and Huber (1983), Rousseeuw (1984), Rousseeuw and Yohai (1984), Yohai (1986), Hampel et al. Rousseeuw (1984) developed the first practical robust regression estimators, least median of squares (LMS), least trimmed squares (LTS), and variants, which behave reasonably even in the presence of a large number of outliers. (Rousseeuw and Leroy, 1987), S-estimators (Rousseeuw and Yohai, 1984), MM. User-friendly covariance estimation for heavy-tailed distributions, by Yuan Ke, Stanislav Minsker, Zhao Ren, Qiang Sun, and Wenxin Zhou. Abstract: We provide a survey of recent results on covariance estimation for heavy-tailed distributions. The LTS-estimator and the S-estimator are asymptotically normal with rate of convergence n^(1/2), and their asymptotic. Robust regression diagnostics of influential observations. The S-estimate proposed by Rousseeuw and Yohai (1984) is defined as the p-vector. The use of alternative regression methods in social sciences. MM-estimates (Yohai, 1987) are obtained by an iterative procedure.