Variable Selection by Regularization Methods for Generalized Mixed Models
A regression analysis describes the dependency of random variables in the form of a functional relationship. One distinguishes between the dependent response variable and one or more independent influence variables. A variety of model classes and inference methods is available, ranging from the conventional linear regression model to recent non- and semiparametric regression models. The so-called generalized regression models form a methodologically consistent framework that incorporates many regression approaches with response variables that are not necessarily normally distributed, and includes the conventional linear regression model based on the normal distribution assumption as a special case. When repeated measurements are modeled, random effects or random coefficients can be included in addition to fixed effects. Such models are known as Random Effects Models or Mixed Models. As a consequence, regression procedures are extremely versatile and can address very different problems.
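For orientation, a generalized linear mixed model for repeated measurements can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the thesis: $y_{it}$ denotes observation $t$ on unit $i$, $g$ a link function, $x_{it}$ and $z_{it}$ the covariate vectors of the fixed and random effects, and $Q$ the covariance matrix of the random effects.

```latex
\begin{align*}
  \mu_{it} &= \mathrm{E}(y_{it} \mid b_i), \\
  g(\mu_{it}) &= x_{it}^{\top}\beta + z_{it}^{\top} b_i, \\
  b_i &\sim N(0, Q).
\end{align*}
```

Setting $g$ to the identity, assuming a normally distributed response, and dropping $b_i$ recovers the conventional linear regression model as a special case.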
In this dissertation, regularization techniques for generalized mixed models are developed that are able to perform variable selection. These techniques are especially appropriate when many potential influence variables are present and existing approaches tend to fail. First, a componentwise boosting technique for generalized linear mixed models is presented, which is based on the likelihood function and works by iteratively fitting the residuals using weak learners. The complexity of the resulting estimator is determined by information criteria. For the estimation of variance components, two approaches are considered: an estimator resulting from maximizing the profile likelihood, and an estimator that can be calculated using an approximate EM algorithm. Then the boosting concept is extended to mixed models with ordinal response variables. Two different types of ordered models are considered: the threshold model, also known as the cumulative model, and the sequential model. Both are based on the assumption that the observed response variable is a categorized version of a latent metric variable. Later in the thesis, the boosting approach is extended to additive predictors. The unknown functions to be estimated are expanded in B-spline basis functions, whose smoothness is controlled by penalty terms. Finally, a suitable L1-regularization technique for generalized linear models is presented, which is based on a combination of Fisher scoring and gradient optimization. Extensive simulation studies and numerous applications illustrate the competitiveness of the methods constructed in this thesis compared to conventional approaches. Standard errors are calculated using bootstrap methods.
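The componentwise boosting idea described above can be illustrated with a minimal sketch. It is not the algorithm of the thesis: as a simplifying assumption it treats a logistic regression without random effects, so no variance components are estimated, and it runs for a fixed number of steps instead of choosing the stopping point by an information criterion. The function names (`componentwise_boost`, `simulate`) and the parameters `n_steps` and `nu` are hypothetical. In each step, a damped one-step Fisher scoring update is computed for every covariate separately, and only the covariate that reduces the deviance most is updated, which is what produces variable selection.

```python
import math
import random


def sigmoid(eta):
    # Numerically stable logistic function.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def deviance(y, eta):
    # Binomial deviance (-2 * log-likelihood) for 0/1 responses.
    total = 0.0
    for yi, ei in zip(y, eta):
        softplus = max(ei, 0.0) + math.log1p(math.exp(-abs(ei)))
        total += softplus - yi * ei
    return 2.0 * total


def linear_predictor(X, intercept, beta):
    return [intercept + sum(b * x for b, x in zip(beta, row)) for row in X]


def componentwise_boost(X, y, n_steps=100, nu=0.1):
    # Illustrative sketch only: logistic model, fixed effects, fixed n_steps.
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    intercept = math.log(ybar / (1.0 - ybar))  # intercept-only start
    beta = [0.0] * p
    for _ in range(n_steps):
        eta = linear_predictor(X, intercept, beta)
        mu = [sigmoid(e) for e in eta]
        best_j, best_dev, best_step = None, float("inf"), 0.0
        for j in range(p):
            # Score and Fisher information for covariate j alone,
            # with the current fit acting as an offset.
            score = sum(r[j] * (yi - mi) for r, yi, mi in zip(X, y, mu))
            info = sum(r[j] ** 2 * mi * (1.0 - mi) for r, mi in zip(X, mu))
            if info <= 0.0:
                continue
            step = nu * score / info  # weak learner: damped update
            dev = deviance(y, [e + step * r[j] for e, r in zip(eta, X)])
            if dev < best_dev:
                best_j, best_dev, best_step = j, dev, step
        if best_j is None:
            break
        beta[best_j] += best_step  # update only the best component
        # The intercept is not subject to selection; refresh it each step.
        mu = [sigmoid(e) for e in linear_predictor(X, intercept, beta)]
        intercept += sum(yi - mi for yi, mi in zip(y, mu)) / sum(
            mi * (1.0 - mi) for mi in mu
        )
    return intercept, beta


def simulate(n=300, p=8, seed=1):
    # Hypothetical toy data: only the first two covariates are relevant.
    rng = random.Random(seed)
    true_beta = [2.0, -2.0] + [0.0] * (p - 2)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [
        1.0 if rng.random() < sigmoid(sum(b * x for b, x in zip(true_beta, r)))
        else 0.0
        for r in X
    ]
    return X, y, true_beta
```

A typical call would be `X, y, _ = simulate()` followed by `intercept, beta = componentwise_boost(X, y, n_steps=60)`. Coefficients of covariates that are never selected stay exactly at zero, so the number of boosting steps acts as the complexity parameter that the thesis tunes by information criteria.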