Robustness against separation and outliers in binary regression

Peter J. Rousseeuw and Andreas Christmann

Date: 05/OCT/2001

Abstract

The logistic regression model is commonly used to describe the effect of one or more explanatory variables on a binary response variable. Here we consider an alternative model in which the observed response is strongly related to, but not equal to, the unobservable true response. We call this the hidden logistic regression (HLR) model because the unobservable true responses are comparable to a hidden layer in a feedforward neural net. For this model we propose the maximum estimated likelihood method, which, unlike existing methods for logistic regression, is robust against separation. We also consider outlier-robust estimation in this setting.

Keywords: Logistic regression; Hidden layer; Overlap; Robustness.
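
The abstract does not give formulas, so the following is only a minimal sketch of how an estimate can stay finite under separation. It assumes that each observed response y_i is replaced by a pseudo-observation delta0 (when y_i = 0) or delta1 (when y_i = 1) and that the ordinary logistic log-likelihood is then maximized with these pseudo-observations. The function name fit_mel, the values delta0 = 0.01 and delta1 = 0.99, and the Newton-Raphson solver are assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np


def fit_mel(X, y, delta0=0.01, delta1=0.99, n_iter=100, tol=1e-10):
    """Sketch of a likelihood fit on pseudo-observations (assumed form).

    Maximizes sum_i [yt_i * log(p_i) + (1 - yt_i) * log(1 - p_i)],
    where yt_i = delta0 if y_i == 0 else delta1 and p_i is the logistic
    function of the linear predictor. delta0 and delta1 are illustrative
    values chosen for this sketch, not taken from the abstract.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    y_tilde = (1.0 - y) * delta0 + y * delta1   # pseudo-observations in (0, 1)
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))   # fitted probabilities
        grad = Z.T @ (y_tilde - p)              # score vector
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z  # negative Hessian
        step = np.linalg.solve(hess, grad)      # Newton-Raphson step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Completely separated toy data: every x < 0 has y = 0, every x > 0 has y = 1.
# The classical maximum likelihood estimate does not exist here, because the
# slope can grow without bound while the likelihood keeps increasing.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])
beta = fit_mel(X, y)
print(beta)  # intercept and slope, both finite
```

Because the pseudo-observations lie strictly between 0 and 1, the score equations have a finite solution even for separated data, which is the behaviour the abstract describes as robustness against separation.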

