MARS Versus Logit When Using CART for Missing Data Imputation in Logit: Examples in Market Research Science
May 30th, 2011
Because item non-response is increasing in many marketing datasets, non-parametric data mining techniques are well suited to analyzing this data. Multivariate Adaptive Regression Splines (MARS) can be used to analyze binary discrete data because it can still use observations that contain missing values, whereas parametric analyses such as linear probability models (LPM) and logistic regression drop those observations through listwise deletion. Using MARS on binary discrete data yields a model with valid parameter estimates but invalid error estimates, because the errors are Bernoulli distributed.
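For readers unfamiliar with MARS, its basis functions are paired hinge functions of the form max(0, x - t) and max(0, t - x) around a knot t. The Python sketch below shows such a pair together with a missing-value indicator, which is one way an implementation can keep rows with missing values instead of deleting them. The knot value, the `incomes` data, and the zero-fill convention are illustrative assumptions, not details from the study. The second function shows why the error variance of a binary response cannot be constant: for a Bernoulli outcome with success probability p it equals p(1 - p).

```python
from typing import Optional, Tuple


def hinge_pair(x: Optional[float], knot: float) -> Tuple[float, float, float]:
    """Return (right hinge, left hinge, missing indicator) for one value.

    Assumption for illustration: a missing value (None) contributes zero
    to both hinges and sets the indicator to 1, so the row is retained.
    """
    if x is None:
        return 0.0, 0.0, 1.0
    return max(0.0, x - knot), max(0.0, knot - x), 0.0


def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 outcome with success probability p."""
    return p * (1.0 - p)


# Hypothetical income values (in thousands), one of them missing.
incomes = [20.0, 55.0, None, 80.0]
features = [hinge_pair(v, knot=50.0) for v in incomes]

# The variance changes with p, so a constant-variance error model does not fit.
variances = [bernoulli_variance(p) for p in (0.1, 0.5, 0.9)]
```

Because the variance depends on p, which in turn depends on the predictors, error estimates that assume a constant variance are not reliable for a binary response.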
To estimate valid errors, the basis functions generated by MARS can be entered as regressors in a parametric model, such as an LPM (estimated by weighted least squares) or a logistic regression model. MARS models were run on three types of database with varying amounts of non-response: a medical database, a stated preference survey for deploying a new brand of shoes, and a marketing database of demographics and income. The analysis showed that the two-step MARS/parametric approach outperformed parametric-only techniques that handle missing values through mean value imputation, OLS regression imputation, or CART-based imputation. Additionally, as the amount of data lost to listwise deletion in a dataset increased, the performance of MARS did not degrade relative to the imputed-value methods used with parametric-only techniques.
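The second step of the two-step approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the knot is fixed by hand (a real MARS run would select knots from the data), the ten observations are hypothetical, and the logistic regression is fitted with plain gradient ascent rather than a statistics package.

```python
import math


def hinge_features(x, knot):
    """Intercept plus a MARS-style hinge pair around a hand-picked knot."""
    return [1.0, max(0.0, x - knot), max(0.0, knot - x)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.1, steps=5000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = target - sigmoid(sum(wi * xi for wi, xi in zip(w, row)))
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def predict(w, x, knot):
    """Predicted probability of a positive response at x."""
    row = hinge_features(x, knot)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)))


# Hypothetical data: one predictor and a binary response.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
KNOT = 5.0  # assumed knot; MARS would choose this adaptively

X = [hinge_features(x, KNOT) for x in xs]
weights = fit_logit(X, ys)
p_low, p_high = predict(weights, 2, KNOT), predict(weights, 9, KNOT)
```

The point of the design is that the basis functions carry the non-linear structure found in the first step, while the logistic likelihood in the second step matches the binary response, so the resulting probabilities stay between 0 and 1.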
Posted by: XpressingWEB™
Entry Filed under: Computer technology stories