# Essays in Econometrics and Machine Learning

## Abstract

This dissertation consists of three chapters demonstrating how current econometric problems can be addressed using machine learning techniques. In the first chapter, I propose new approaches to estimating large-dimensional monotone index models. This class of models has been popular in the applied and theoretical econometrics literatures, as it includes discrete choice, nonparametric transformation, and duration models. A main advantage of my approach is computational: rank estimation procedures such as those proposed in Han (1987) and Cavanagh and Sherman (1998) optimize a nonsmooth, non-convex objective function, which makes them difficult to use with more than a few regressors and so limits their application to economic data sets. For such monotone index models with increasing dimension, I propose a new class of estimators based on batched gradient descent (BGD) combined with nonparametric methods such as kernel or sieve estimation, and study their asymptotic properties. The BGD algorithm is an iterative procedure whose key step exploits a strictly convex objective function, yielding computational advantages. A further contribution of my approach is that the model is large-dimensional and semiparametric, so no parametric distributional assumptions are required. The second chapter studies the estimation of semiparametric monotone index models when the sample size n is extremely large and conventional approaches fail due to a prohibitive computational burden. Motivated by the mini-batch gradient descent (MBGD) algorithm widely used as a stochastic optimization tool in machine learning, this chapter proposes a novel subsample- and iteration-based estimation procedure. In particular, starting from any initial guess of the true parameter, the estimator is progressively updated using a sequence of subsamples randomly drawn from the data set, each of sample size much smaller than n.
The update is based on the gradient of a well-chosen loss function, where the nonparametric component of the model is replaced with its Nadaraya-Watson kernel estimator, also constructed from the random subsamples. The proposed algorithm essentially generalizes the MBGD algorithm to the semiparametric setup. Since the new method uses only a subsample to perform the Nadaraya-Watson kernel estimation and conduct the update, it reduces the computational time relative to the full-sample-based iterative method by a factor of roughly n when the subsample size and the kernel function are chosen properly, and so can easily be applied when the sample size n is large. Moreover, this chapter shows that if the estimators produced during the iterations are further averaged, the difference between the averaged estimator and the full-sample-based estimator is $1/\sqrt{n}$-trivial. Consequently, the averaged estimator is $\sqrt{n}$-consistent and asymptotically normally distributed. In other words, the new estimator substantially improves computational speed while maintaining estimation accuracy. Extensive Monte Carlo experiments and a real data analysis illustrate the excellent computational efficiency of the novel algorithm when the sample size is extremely large. Finally, the third chapter studies a robust inference procedure for treatment effects in panel data with flexible relationships across units via the random forest method. The key contribution of this chapter is twofold. First, it proposes a direct construction of prediction intervals for the treatment effect, exploiting information in the joint distribution of the cross-sectional units to construct counterfactuals using random forests. In particular, it proposes a Quantile Control Method (QCM) using the Quantile Random Forest (QRF) to accommodate a flexible cross-sectional structure as well as high dimensionality.
Second, it establishes the asymptotic consistency of the QRF under a panel/time-series setup with high dimensionality, which is of theoretical interest in its own right. In addition, Monte Carlo simulations show that prediction intervals via the QCM have excellent coverage probability for the treatment effects compared to existing methods in the literature, and are robust to heteroskedasticity, autocorrelation, and various types of model misspecification. Finally, an empirical application studying the effect of the economic integration between Hong Kong and mainland China on Hong Kong’s economy highlights the potential of the proposed method.
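To fix ideas, the subsample-and-update scheme described in the first two chapters can be sketched in a few lines. This is a minimal illustration only, assuming a squared loss, a Gaussian kernel, a scale normalization on the first coefficient, and ad hoc step-size and bandwidth choices; none of these specifics are taken from the dissertation, and all function names are hypothetical.

```python
# Illustrative sketch: a mini-batch, kernel-based update for a monotone
# index model y = g(x'beta) + e with unknown link g. The nonparametric
# component g is replaced by a Nadaraya-Watson estimate computed on each
# random subsample, and iterates are averaged at the end.
import numpy as np

def nw_link(u, u_batch, y_batch, h):
    """Nadaraya-Watson estimate of the link g at index values u."""
    w = np.exp(-0.5 * ((u[:, None] - u_batch[None, :]) / h) ** 2)
    return (w @ y_batch) / np.maximum(w.sum(axis=1), 1e-12)

def mbgd_estimate(x, y, batch_size=200, n_iter=300, lr=0.1, h=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n, d = x.shape
    beta = np.zeros(d)
    beta[0] = 1.0                        # scale normalization (assumed)
    iterates = []
    for _ in range(n_iter):
        idx = rng.choice(n, size=batch_size, replace=False)
        xb, yb = x[idx], y[idx]
        u = xb @ beta
        ghat = nw_link(u, u, yb, h)
        # numerical derivative of the kernel-estimated link
        eps = 1e-4
        gprime = (nw_link(u + eps, u, yb, h) - ghat) / eps
        # gradient of the squared loss w.r.t. beta on this subsample
        grad = -2.0 * ((yb - ghat) * gprime)[:, None] * xb
        beta = beta - lr * grad.mean(axis=0)
        beta[0] = 1.0                    # re-impose normalization
        iterates.append(beta.copy())
    # average across (later) iterates, echoing the averaged estimator
    return np.mean(iterates[n_iter // 2:], axis=0)
```

Each pass touches only `batch_size` observations, which is the source of the computational savings the second chapter formalizes.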