Browsing by Author "Terza, Joseph V."
Now showing 1 - 10 of 12
Causal analysis using two-part models : a general framework for specification, estimation and inference
(2018-06-22) Hao, Zhuang; Terza, Joseph V.; Devaraj, Srikant; Liu, Ziyue; Mak, Henry; Ottoni-Wilhelm, Mark
The two-part model (2PM) is the most widely applied modeling and estimation framework in empirical health economics. By design, the two-part model allows the process governing observation at zero to systematically differ from that which determines non-zero observations. The former is commonly referred to as the extensive margin (EM) and the latter is called the intensive margin (IM). The analytic focus of my dissertation is on the development of a general framework for specifying, estimating and drawing inference regarding causally interpretable (CI) effect parameters in the 2PM context. Our proposed fully parametric 2PM (FP2PM) framework comprises very flexible versions of the EM and IM for both continuous and count-valued outcome models and encompasses all implementations of the 2PM found in the literature. Because our modeling approach is potential outcomes (PO) based, it provides a context for clear definition of targeted counterfactual CI parameters of interest. This PO basis also provides a context for identifying the conditions under which such parameters can be consistently estimated using the observable data (via the appropriately specified data generating process). These conditions also ensure that the estimation results are CI. There is substantial literature on statistical testing for model selection in the 2PM context, yet there has been virtually no attention paid to testing the “one-part” null hypothesis. Within our general modeling and estimation framework, we devise a relatively simple test of that null for both continuous and count-valued outcomes.
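As a rough illustration of the two-part structure described above, the overall conditional mean can be written as the product of the extensive margin (the probability of a non-zero outcome) and the intensive margin (the mean outcome given that it is non-zero). The sketch below assumes a logit extensive margin and an exponential-mean intensive margin; these functional forms, the function names and the coefficient values are hypothetical choices made for illustration and are not taken from the dissertation.

```python
import math

def two_part_mean(x, em_coef, im_coef):
    """Overall mean E[Y | x] = P(Y > 0 | x) * E[Y | Y > 0, x]."""
    a0, a1 = em_coef
    b0, b1 = im_coef
    # Extensive margin: assumed logit probability of a non-zero outcome.
    prob_positive = 1.0 / (1.0 + math.exp(-(a0 + a1 * x)))
    # Intensive margin: assumed exponential mean of the non-zero outcome.
    mean_if_positive = math.exp(b0 + b1 * x)
    return prob_positive * mean_if_positive

# Hypothetical coefficients: a higher price lowers both margins.
EM_COEF = (0.5, -0.3)
IM_COEF = (1.0, -0.2)

def price_effect(price, increment=1.0):
    """Change in the overall mean when price rises by `increment`."""
    after = two_part_mean(price + increment, EM_COEF, IM_COEF)
    before = two_part_mean(price, EM_COEF, IM_COEF)
    return after - before

print(price_effect(2.0))
```

With both hypothetical price coefficients negative, the computed effect of a price increase on expected consumption is negative, combining the change in the probability of any consumption with the change in the amount consumed.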
We illustrate our proposed model, method and testing protocol in the context of estimating price effects on the demand for alcohol.

The effect of sugar-sweetened beverage consumption on childhood obesity - causal evidence
(2016-05-18) Yang, Yan; Terza, Joseph V.; Courtemanche, Charles; Jung, Haeil; Mak, Henry Y.; Wu, Jisong
Communities and states are increasingly targeting the consumption of sugar-sweetened beverages (SSBs), especially soda, in their efforts to curb childhood obesity. However, the empirical evidence on which policy makers base the relevant policies is not causally interpretable. In the present study, we suggest a modeling framework that can be used for making causal estimation and inference in the context of childhood obesity. This modeling framework is built upon the two-stage residual inclusion (2SRI) instrumental variables method and has two levels – level one models children’s lifestyle choices and level two models children’s energy balance, which is assumed to be dependent on their lifestyle behaviors. We start with a simplified version of the model that includes only one policy, one lifestyle, one energy balance, and one observable control variable. We then extend this simple version to a general one that accommodates multiple policy and lifestyle variables. The two versions of the model are 1) first estimated via the nonlinear least squares (NLS) method (henceforth NLS-based 2SRI); and 2) then estimated via the maximum likelihood estimation (MLE) method (henceforth MLE-based 2SRI). Using simulated data, we show that 1) our proposed 2SRI method outperforms conventional methods that ignore either the inherent nonlinearity [the linear instrumental variables (LIV) method] or the potential endogeneity [the nonlinear regression (NR) method] in obtaining the relevant estimators; and 2) the MLE-based 2SRI provides more efficient (and also consistent) estimators than the NLS-based one.
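The two-stage residual inclusion idea can be sketched in a deliberately simplified linear setting: the first stage regresses the endogenous variable on the instrument, and the second stage regresses the outcome on the endogenous variable plus the first-stage residual. The sketch below is only an illustration under assumed conditions. It uses ordinary least squares in both stages rather than the NLS or MLE estimators named in the abstract, and the simulated data, coefficient values and function names are all hypothetical.

```python
import random

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[best] = A[best], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[r][k] / A[r][r] for r in range(k)]

def two_stage_residual_inclusion(y, x, z):
    """Stage 1: x on z. Stage 2: y on x and the stage-1 residual."""
    a0, a1 = ols([z], x)
    resid = [xi - (a0 + a1 * zi) for xi, zi in zip(x, z)]
    return ols([x, resid], y)

# Simulated data (hypothetical): u is an unobserved confounder, z an instrument.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [1 + 2 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [0.5 + 1.5 * xi + 2 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

naive_slope = ols([x], y)[1]                          # ignores endogeneity
tsri_slope = two_stage_residual_inclusion(y, x, z)[1]  # true slope is 1.5
print(naive_slope, tsri_slope)
```

In this linear special case the coefficient on the endogenous variable coincides with the usual two-stage least squares estimate; the nonlinear models discussed in the abstract are where residual inclusion and linear instrumental variables methods differ.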
Real data analysis is conducted to illustrate the implementation of the 2SRI method in practice using both NLS and MLE methods. However, due to data limitations, we are not able to draw any inference regarding the impacts of lifestyle, specifically SSB consumption, on childhood obesity. We are in the process of getting better data and, after doing so, we will replicate and extend the analyses conducted here. These analyses, we believe, will produce causally interpretable evidence of the effects of SSB consumption and other lifestyle choices on childhood obesity. The empirical analyses presented in this dissertation should, therefore, be viewed as an illustration of our newly proposed framework for causal estimation and inference.

Exploring the Importance of Accounting for Nonlinearity in Correlated Count Regression Systems from the Perspective of Causal Estimation and Inference
(2021-07) Zhang, Yilei; Terza, Joseph V.; Vest, Joshua R.; Morrison, Wendy; Gupta, Sumedha
The main motivation for nearly all empirical economic research is to provide scientific evidence that can be used to assess causal relationships of interest. Essential to such assessments is the rigorous specification and accurate estimation of parameters that characterize the causal relationship between a presumed causal variable of interest, whose value is to be set and altered in the context of a relevant counterfactual, and a designated outcome of interest. Relationships of this type are typically characterized by an effect parameter (EP) and estimation of the EP is the objective of the empirical analysis. The present research focuses on cases in which the regression outcome of interest is a vector that has count-valued elements (i.e., the model under consideration comprises a multi-equation system).
This research examines the importance of accounting for nonlinearity and cross-equation correlations in correlated count regression systems from the perspective of causal estimation and inference. We evaluate the efficiency and accuracy gains of estimating bivariate count-valued systems-of-equations models by comparing three pairs of models: (1) Zellner’s Seemingly Unrelated Regression (SUR) versus Count-Outcome SUR - Conway Maxwell Poisson (CMP); (2) CMP SUR versus Single-Equation CMP Approach; (3) CMP SUR versus Poisson SUR. We show via simulation studies that it is more efficient to estimate jointly than equation-by-equation, and that it is more efficient to account for nonlinearity. We also apply our model and estimation method to real-world health care utilization data, where the dependent variables are correlated counts: count of physician office-visits, and count of non-physician health professional office-visits. The presumed causal variable is private health insurance status. Our model results in a reduction of at least 30% in standard errors for key policy EPs (e.g., Average Incremental Effect). Our results are enabled by our development of a Stata program for approximating two-dimensional integrals via Gauss-Legendre Quadrature.

Health Policy Analysis from a Potential Outcomes Perspective: Smoking During Pregnancy and Birth Weight
(2014-08-25) Terza, Joseph V.
Most empirical research in health economics is conducted with the goal of providing scientific evidence that will serve to inform current and future health policy. The use of parametric nonlinear regression (NR) methods for empirical analysis in health economics abounds. Studies that offer clear policy-relevant interpretations of NR results are, however, rare.
We offer a comprehensive policy analytic framework within which the applied researcher can: 1) clearly define the policy-relevant estimation objective; 2) consistently estimate that objective using NR methods designed to account for the possible endogeneity of the policy variable of interest; 3) conduct correct asymptotic inference; and 4) offer policy-relevant interpretations of the empirical results. For binary policies, Rubin (1974, 1977) developed the potential outcomes framework (POF). We propose a generally applicable extension of the POF (EPOF) which covers a broad range of policy analytic contexts. In particular, our EPOF accommodates: a) a non-binary policy variable of interest (Xp); b) policy-relevant counterfactual versions of Xp that are not fixed values; and c) a policy-defining increment to Xp that is not constant. Moreover, our EPOF facilitates the use of extant nonlinear regression (NR) methods that correct for potential bias due to the endogeneity of Xp. As a case in point, we consider the analysis of potential gains in infant birth weight that may result from a prenatal smoking prevention and cessation policy which, if fully effective, would maintain zero levels of smoking for non-smokers (prevention) and convince smokers to quit before becoming pregnant (cessation). In the context of our EPOF, using endogeneity-correcting NR methods, we re-analyze the data examined by Mullahy (1997) and estimate the potential effect of the smoking prevention/cessation policy described above.
The EPOF should serve as a useful guide to applied health policy analysts.

Inference Using Sample Means of Parametric Nonlinear Data Transformations
(Wiley, 2016-06) Terza, Joseph V.; Economics, School of Liberal Arts

Quantifying risk of injury from usual alcohol consumption: An instrumental variable analysis
(Wiley, 2021) Ye, Yu; Cherpitel, Cheryl J.; Terza, Joseph V.; Kerr, William C.; Economics, School of Liberal Arts
Background: There have been numerous studies of roadside accidents among emergency room patients showing elevated risk of injury from acute alcohol consumption, i.e. recent drinking prior to the injury event, with large effect size and a dose-response relationship observed. In contrast, studies quantifying the association between injury risk and chronic consumption such as past year average volume show that relative risk estimates are low compared to those from acute consumption. Methods: Using the US National Alcohol Surveys (NAS) combining four waves for years 2000–2015 (N=29,571, 53% overall cooperation rate), risk of any past-year injury was first estimated by past-year volume using logistic regression. An instrumental variable (IV) analysis utilizing the two-stage residual inclusion (2SRI) approach was then conducted to estimate injury risk from volume, further adjusting for unobserved confounders, using state beer and spirits tax rates, zip code-level outlet and bar density, and control state status as instruments. Results: Based on the combined US population surveys and controlling for socio-demographics, odds ratios of injury from average volume of 1, 2 and 5 drinks per day were 1.12 [95% confidence interval: 1.02, 1.24], 1.10 [1.00, 1.22], and 1.04 [0.88, 1.22], respectively, using conventional logistic regression, compared to 1.67 [1.00, 2.78], 2.38 [0.87, 6.54] and 6.98 [0.57, 85.89] using the IV method.
The proportion of injury attributed to alcohol also increased in magnitude, from 6.2% [0.3%, 11.9%] using the conventional approach to 17.9% [8.2%, 27.7%] using the IV method. Conclusions: Findings suggest that the association between injury and chronic alcohol consumption may be confounded by unobserved factors, with the risk estimate possibly biased downward.

Regression-Based Causal Analysis from the Potential Outcomes Perspective
(Taylor & Francis, 2020-01) Terza, Joseph V.; Economics, School of Liberal Arts
Most empirical economic research is conducted with the goal of providing scientific evidence that will be informative in assessing causal relationships of interest based on relevant counterfactuals. The implementation of regression methods in this context is ubiquitous. With this as motivation, we detail a comprehensive regression-based potential outcomes framework for causal modeling, estimation and inference. This framework facilitates rigorous specification of the effect parameter of interest and makes clear the sense in which it is causally interpretable, when appropriately defined in a potential outcomes setting. It also serves to crystallize the conditions under which the effect parameter and the underlying regression parameters are identified. The consistent sample analog estimator of the effect parameter is discussed. Juxtaposing this framework with a stylized version of a commonly implemented and routinely applied modeling and estimation protocol reveals how the latter is deficient in recognizing, and fully accounting for, conditions required for identification of the relevant effect parameter and the causal interpretability of estimation results.
In the context of an example, we demonstrate the conceptual advantages of this general potential outcomes framework for regression modeling by showing how it resolves fundamental shortcomings in the conventional approach to characterizing and remedying omitted variable bias.

Simpler Standard Errors for Multi-Stage Regression-Based Estimators: Illustrations in Health Economics
(2014-08-25) Terza, Joseph V.
With a view towards lessening the analytic and computational burden faced by researchers in empirical health economics who seek an alternative to bootstrapping for the standard errors of two-stage estimators, we offer heretofore unexploited simplifications of the typical, but somewhat daunting, textbook approach. For the most commonly encountered cases in empirical health economics – two-stage estimators that, in either stage, involve maximum likelihood estimation or the nonlinear least squares method – we show that: 1) the usual textbook formulation of the relevant asymptotic covariance can be substantially reduced in complexity; and 2) nearly all components of our simplified formulation can be retrieved as outputs from packaged regression routines (e.g., in Stata). With the applied researcher in mind, we illustrate these points with two examples in empirical health economics that involve the estimation of causal effects in the presence of endogeneity – a sampling problem that can often be solved via two-stage estimation.
As a by-product of this illustrative discussion, we detail four very useful two-stage estimators (and their asymptotic standard errors) that are consistent for the model parameters in such settings, along with their corresponding multi-stage causal effect estimators (and their asymptotic standard errors).

Specification, estimation and testing of treatment effects in multinomial outcome models : accommodating endogeneity and inter-category covariance
(2018-06-18) Tang, Shichao; Terza, Joseph V.; Carlin, Paul; Lin, Hsien-Chang; Morrison, Gwendolyn; Seo, Boyoung
In this dissertation, a potential outcomes (PO) based framework is developed for causally interpretable treatment effect parameters in the multinomial dependent variable regression framework. The specification of the relevant data generating process (DGP) is also derived. This new framework simultaneously accounts for the potential endogeneity of the treatment and loosens inter-category covariance restrictions on the multinomial outcome model (e.g., the independence from irrelevant alternatives restriction). Corresponding consistent estimators for the “deep parameters” of the DGP and the treatment effect parameters are developed and implemented (in Stata). A novel approach is proposed for assessing the inter-category covariance flexibility afforded by a particular multinomial modeling specification [e.g. multinomial logit (MNL), multinomial probit (MNP), and nested multinomial logit (NMNL)] in the context of our general framework. This assessment technique can serve as a useful tool for model selection. The new modeling/estimation approach developed in this dissertation is quite general. I focus here, however, on the NMNL model because, among the three modeling specifications under consideration (MNL, MNP and NMNL), it is the only one that is both computationally feasible and relatively unrestrictive with regard to inter-category covariance.
Moreover, as a logical starting point, I restrict my analyses to the simplest version of the model – the trinomial (three-category) NMNL with an endogenous treatment (ET) variable conditioned on individual-specific covariates only. To identify potential computational issues and to assess the statistical accuracy of my proposed NMNL-ET estimator and its implementation (in Stata), I conducted a thorough simulation analysis. I found that conventional optimization techniques are, in this context, generally fraught with convergence problems. To overcome this, I implement a systematic line search algorithm that successfully resolves this issue. The simulation results suggest that it is important to accommodate both endogeneity and inter-category covariance simultaneously in model design and estimation. As an illustration and as a basis for comparing alternative parametric specifications with respect to ease of implementation, computational efficiency and statistical performance, the proposed model and estimation method are used to analyze the impact of substance abuse/dependence on employment status using the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) data.

A Switching Regressions Framework for Models with Count-Valued Omni-Dispersed Outcomes: Specification, Estimation and Causal Inference
(2020-02) Manalew, Wondimu Samuel; Terza, Joseph V.; Boukai, Ben; Osili, Una; Tennekoon, Vidhura; Trombley, Matt
In this dissertation, I develop a regression-based approach to the specification and estimation of the effect of a presumed causal variable on a count-valued outcome of interest. Statistics for relevant causal inference are also derived. As an illustration and as a basis for comparing alternative parametric specifications with respect to ease of implementation, computational efficiency and statistical performance, the proposed models and estimation methods are used to analyze household fertility decisions.
I estimate the effect of a counterfactually imposed additional year of wife’s education on actual family size (AFS) and desired family size (DFS) [count-valued variables]. In order to ensure the causal interpretability of the effect parameter as I define it, the underlying regression model is cast in a potential outcomes (PO) framework. The specification of the relevant data generating process (DGP) is also derived. The regression-based approach developed in the dissertation, in addition to taking explicit account of the fact that the outcome of interest is count-valued, is designed to account for potential sample selection bias due to a particular data deficiency in the count data context and to accommodate the possibility that some structural aspects of the model may vary with the value of a binary switching variable. Moreover, my approach loosens the equi-dispersion constraint [conditional mean (CM) equals conditional variance (CV)] that plagues conventional (Poisson) count-outcome regression models. This is a particularly important feature of my model and method because in most contexts in empirical economics the data are either over-dispersed (CM < CV) or under-dispersed (CM > CV) – fertility models are usually characterized by the latter. Alternative count data models are discussed and compared using simulated and real data. The simulation results and the estimation results using real data suggest that the estimated effects from my proposed models (models that loosen the equi-dispersion constraint, account for sample selection, and accommodate variability in structural aspects of the models due to a switching variable) differ substantively from estimates obtained from conventional linear and count regression specifications.
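The distinction between equi-, over- and under-dispersion mentioned above can be illustrated with a rough descriptive check that compares the sample mean of a count variable with its sample variance. This sketch is an assumption-laden simplification: it uses unconditional sample moments, whereas the constraint in the abstract concerns the conditional mean and conditional variance given the regressors. The function name, the tolerance and the example data are hypothetical.

```python
from statistics import mean, variance

def dispersion(counts, tolerance=0.05):
    """Classify a count sample by comparing its variance with its mean."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    # Treat the sample as equi-dispersed when the two moments are close.
    if abs(v - m) <= tolerance * m:
        return "equi-dispersed"
    return "over-dispersed" if v > m else "under-dispersed"

# Hypothetical family-size counts clustered tightly around two or three children.
family_sizes = [2, 2, 3, 3, 2]
print(dispersion(family_sizes))
```

A tightly clustered sample like the hypothetical one here has a variance below its mean, which is the under-dispersed pattern the abstract associates with fertility data.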