# Sample Splitting and Threshold Estimation

## Abstract

Threshold models have a wide variety of applications in economics. Direct applications include models of separating and multiple equilibria. Other applications include empirical sample splitting when the sample split is based on a continuously-distributed variable such as firm size. In addition, threshold models may be used as a parsimonious strategy for non-parametric function estimation. For example, the threshold autoregressive model (TAR) is popular in the non-linear time series literature. Threshold models also emerge as special cases of more complex statistical frameworks, such as mixture models, switching models, Markov switching models, and smooth transition threshold models. It may be important to understand the statistical properties of threshold models as a preliminary step in the development of statistical tools to handle these more complicated structures. Despite the large number of potential applications, the statistical theory of threshold estimation is undeveloped. The previous literature has demonstrated that threshold estimates are super-consistent, but a distribution theory useful for testing and inference has yet to be provided. This paper develops a statistical theory for threshold estimation in the regression context. We allow for either cross-section or time series observations. Least squares estimation of the regression parameters is considered. An asymptotic distribution theory for the regression estimates (the threshold and the regression slopes) is developed. It is found that the distribution of the threshold estimate is non- standard. Methods to construct asymptotic confidence intervals are introduced, using both a t-statistic and LR-statistic approach. Monte Carlo simulations are presented to assess the accuracy of the asymptotic approximations. The empirical relevance of the theory is illustrated through an application to the multiple equilibria growth model of Durlauf and Johnson (1995).