Browsing by Author "Boukai, Ben"
Item ACLRO: An Ontology for the Best Practice in ACLR Rehabilitation (2020-10) Phalakornkule, Kanitha; Jones, Josette F.; Boukai, Ben; Liu, Xiaowen; Purkayatha, Saptarshi; Duncan, William D.
With the rise of big data and the demands for leveraging artificial intelligence (AI), healthcare requires more knowledge sharing that offers machine-readable semantic formalization. Even though some applications allow shared data interoperability, they still lack formal machine-readable semantics in ICD9/10 and LOINC. With ontology, the further ability to represent the shared conceptualizations is possible, similar to SNOMED-CT. Nevertheless, SNOMED-CT mainly focuses on electronic health record (EHR) documenting and evidence-based practice. Moreover, due to its independence from data quality, the ontology enhances advanced AI technologies, such as machine learning (ML), by providing a reusable knowledge framework. Developing a machine-readable and sharable semantic knowledge model incorporating external evidence and individual practice’s values will create a new revolution for best practice medicine. The purpose of this research is to implement a sharable ontology for the best practice in healthcare, with anterior cruciate ligament reconstruction (ACLR) as a case study. The ontology represents knowledge derived from both evidence-based practice (EBP) and practice-based evidence (PBE). First, the study presents how the domain-specific knowledge model is built using a combination of Toronto Virtual Enterprise (TOVE) and a bottom-up approach. Then, I propose a top-down approach using Open Biological and Biomedical Ontology (OBO) Foundry ontologies that adheres to the Basic Formal Ontology (BFO)’s framework. In this step, the EBP, PBE, and statistic ontologies are developed independently.
Next, the study integrates these individual ontologies into the final ACLR Ontology (ACLRO) as a more meaningful model that endorses the reusability and the ease of the model-expansion process since the classes can grow independently from one another. Finally, the study employs a use case and DL queries for model validation. The study's innovation is to present the ontology implementation for best-practice medicine and demonstrate how it can be applied to a real-world setup with semantic information. The ACLRO simultaneously emphasizes knowledge representation in health-intervention, statistics, research design, and external research evidence, while constructing the classes of data-driven and patient-focus processes that allow knowledge sharing explicit of technology. Additionally, the model synthesizes multiple related ontologies, which leads to the successful application of best-practice medicine.

Item Bayesian Modeling of COVID-19 Positivity Rate -- the Indiana experience (arXiv, 2020-07-09) Boukai, Ben; Wang, Jiayue; Mathematical Sciences, School of Science
In this short technical report we model, within the Bayesian framework, the rate of positive tests reported by the State of Indiana, accounting also for the substantial variability (and overdispersion) in the daily count of the tests performed. The approach we take results in a simple procedure for prediction, a posteriori, of this rate of ’positivity’ and allows for an easy and straightforward adaptation by any agency tracking daily results of COVID-19 tests. The numerical results provided herein were obtained via an updatable R Markdown document.

Item How Much Is Your Strangle Worth? On the Relative Value of the Strangle under the Black-Scholes Pricing Model (Redfame, 2020-07) Boukai, Ben; Mathematical Sciences, School of Science
Trading option strangles is a highly popular strategy often used by market participants to mitigate volatility risks in their portfolios.
We propose a measure of the relative value of a delta-Symmetric Strangle and compute it under the standard Black-Scholes-Merton option pricing model. This new measure accounts for the price of the strangle, relative to the Present Value of the spread between the two strikes, all expressed, after a natural re-parameterization, in terms of delta and a volatility parameter. We show that under the standard BSM model, this measure of relative value is bounded by a simple function of delta only and is independent of the time to expiry, the price of the underlying security or the prevailing volatility used in the pricing model. We demonstrate how this bound can be used as a quick benchmark to assess, regardless of the market volatility, the duration of the contract or the price of the underlying security, the market (relative) value of the strangle in comparison to its BSM (relative) price. In fact, the explicit and simple expression for this measure and bound allows us to also study in detail the strangle’s exit strategy and the corresponding optimal choice for a value of delta.

Item Modern Monte Carlo Methods and Their Application in Semiparametric Regression (2021-05) Thomas, Samuel Joseph; Tu, Wanzhu; Boukai, Ben; Li, Xiaochen; Song, Fengguang
The essence of Bayesian data analysis is to ascertain posterior distributions. Posteriors generally do not have closed-form expressions for direct computation in practical applications. Analysts, therefore, resort to Markov Chain Monte Carlo (MCMC) methods for the generation of sample observations that approximate the desired posterior distribution. Standard MCMC methods simulate sample values from the desired posterior distribution via random proposals. As a result, the mechanism used to generate the proposals inevitably determines the efficiency of the algorithm.
One of the modern MCMC techniques designed to explore the high-dimensional space more efficiently is Hamiltonian Monte Carlo (HMC), based on the Hamiltonian differential equations. Inspired by classical mechanics, these equations incorporate a latent variable to generate MCMC proposals that are likely to be accepted. This dissertation discusses how such a powerful computational approach can be used for implementing statistical models. Along this line, I created a unified computational procedure for using HMC to fit various types of statistical models. The procedure that I proposed can be applied to a broad class of models, including linear models, generalized linear models, mixed-effects models, and various types of semiparametric regression models. To facilitate the fitting of a diverse set of models, I incorporated new parameterization and decomposition schemes to ensure the numerical performance of Bayesian model fitting without sacrificing the procedure’s general applicability. As a concrete application, I demonstrate how to use the proposed procedure to fit a multivariate generalized additive model (GAM), a nonstandard statistical model with a complex covariance structure and numerous parameters. Byproducts of the research include two software packages that allow practical data analysts to use the proposed computational method to fit their own models. The research’s main methodological contribution is the unified computational approach that it presents for Bayesian model fitting that can be used for standard and nonstandard statistical models.
Availability of such a procedure has greatly enhanced statistical modelers’ toolbox for implementing new and nonstandard statistical models.

Item On the RND under Heston’s stochastic volatility model (2021) Boukai, Ben; Mathematical Sciences, School of Science
We consider Heston's (1993) stochastic volatility model for valuation of European options to which (semi) closed form solutions are available and are given in terms of characteristic functions. We prove that the class of scale-parameter distributions with mean being the forward spot price satisfies Heston's solution. Thus, we show that any member of this class could be used for the direct risk-neutral valuation of the option price under Heston's SV model. In fact, we also show that any RND with mean being the forward spot price that satisfies Heston's option valuation solution must be a member of a scale-family of distributions in that mean. As particular examples, we show that one-parameter versions of the Log-Normal, Inverse-Gaussian, Gamma, Weibull and the Inverse-Weibull distributions are all members of this class and thus provide explicit risk-neutral densities (RND) for Heston's pricing model. We demonstrate, via exact calculations and Monte-Carlo simulations, the applicability and suitability of these explicit RNDs using already published Index data with a calibrated Heston model (S&P500, Bakshi, Cao and Chen (1997), and ODAX, Mrázek and Pospíšil (2017)), as well as current option market data (AMD).

Item Optimal nonparametric inference via deep neural network (Elsevier, 2022-01) Liu, Ruiqi; Boukai, Ben; Shang, Zuofeng; Mathematical Sciences, School of Science
The deep neural network is a state-of-the-art method in modern science and technology. Much statistical literature has been devoted to understanding its performance in nonparametric estimation, whereas the results are suboptimal due to a redundant logarithmic sacrifice. In this paper, we show that such log-factors are not necessary.
We derive upper bounds for the L2 minimax risk in nonparametric estimation. Sufficient conditions on network architectures are provided such that the upper bounds become optimal (without log-sacrifice). Our proof relies on an explicitly constructed network estimator based on tensor product B-splines. We also derive asymptotic distributions for the constructed network and a related hypothesis testing procedure. The testing procedure is further proved to be minimax optimal under suitable network architectures.

Item Recycled two-stage estimation in nonlinear mixed effects regression models (Springer, 2022-09) Zhang, Yue; Boukai, Ben; Mathematical Sciences, School of Science
We consider a re-sampling scheme for estimation of the population parameters in the mixed-effects nonlinear regression models of the type used, for example, in clinical pharmacokinetics. We provide a two-stage estimation procedure which resamples (or recycles), via random weightings, the various parameters' estimates to construct consistent estimates of their respective sampling distributions. In particular, we establish, under rather general distribution-free assumptions, the asymptotic normality and consistency of the standard two-stage estimates and of their resampled version and demonstrate the applicability of our proposed resampling methodology in a small simulation study. A detailed example based on real clinical pharmacokinetic data is also provided.

Item Sample Size Determination for Subsampling in the Analysis of Big Data, Multiplicative Models for Confidence Intervals and Free-Knot Changepoint Models (2024-05) Zhang, Sheng; Peng, Hanxiang; Tan, Fei; Sarkar, Jyoti; Boukai, Ben
The dissertation consists of three parts. Motivated by subsampling in the analysis of Big Data and by data-splitting in machine learning, sample size determination for multidimensional parameters is presented in the first part.
In the second part, we propose a novel approach to the construction of confidence intervals based on improved concentration inequalities. We provide the missing factor for the tail probability of a random variable which generalizes Talagrand’s (1995) result of the missing factor in Hoeffding’s inequalities. We give the procedure for constructing confidence intervals and illustrate it with simulations. In the third part, we study irregular change-point models using free-knot splines. The consistency and asymptotic normality of the least squares estimators are proved for the irregular models in which the linear spline is not differentiable. Simulations are carried out to explore the numerical properties of the proposed models. The results are used to analyze the US Covid-19 data.

Item Subgroup Identification in Clinical Trials (2020-04) Li, Xiaochen; Gao, Sujuan; Shen, Changyu; Boukai, Ben; Zhang, Jianjun; Liu, Hao
Subgroup analyses assess the heterogeneity of treatment effects in groups of patients defined by patients’ baseline characteristics. Identifying subgroups of patients with differential treatment effect is crucial for tailored therapeutics and personalized medicine. Model-based variable selection methods are well developed and widely applied to select significant treatment-by-covariate interactions for subgroup analyses. Machine learning and data-driven based methods for subgroup identification have also been developed. In this dissertation, I consider two different types of subgroup identification methods: one is nonparametric machine learning based and the other is model based. In the first part, the problem of subgroup identification was transformed into an optimization problem and a stochastic search technique was implemented to partition the whole population into disjoint subgroups with differential treatment effect.
In the second approach, an integrative three-step model-based variable selection method was proposed for subgroup analyses in longitudinal data. Using this three-step variable selection framework, informative features and their interaction with the treatment indicator can be identified for subgroup analysis in longitudinal data. This method can be extended to longitudinal binary or categorical data. Simulation studies and real data examples were used to demonstrate the performance of the proposed methods.

Item A Switching Regressions Framework for Models with Count-Valued Omni-Dispersed Outcomes: Specification, Estimation and Causal Inference (2020-02) Manalew, Wondimu Samuel; Terza, Joseph V.; Boukai, Ben; Osili, Una; Tennekoon, Vidhura; Trombley, Matt
In this dissertation, I develop a regression-based approach to the specification and estimation of the effect of a presumed causal variable on a count-valued outcome of interest. Statistics for relevant causal inference are also derived. As an illustration and as a basis for comparing alternative parametric specifications with respect to ease of implementation, computational efficiency and statistical performance, the proposed models and estimation methods are used to analyze household fertility decisions. I estimate the effect of a counterfactually imposed additional year of wife’s education on actual family size (AFS) and desired family size (DFS) [count-valued variables]. In order to ensure the causal interpretability of the effect parameter as I define it, the underlying regression model is cast in a potential outcomes (PO) framework. The specification of the relevant data generating process (DGP) is also derived.
The regression-based approach developed in the dissertation, in addition to taking explicit account of the fact that the outcome of interest is count-valued, is designed to account for potential sample selection bias due to a particular data deficiency in the count data context and to accommodate the possibility that some structural aspects of the model may vary with the value of a binary switching variable. Moreover, my approach loosens the equi-dispersion constraint [conditional mean (CM) equals conditional variance (CV)] that plagues conventional (Poisson) count-outcome regression models. This is a particularly important feature of my model and method because in most contexts in empirical economics the data are either over-dispersed (CM < CV) or under-dispersed (CM > CV); fertility models are usually characterized by the latter. Alternative count data models were discussed and compared using simulated and real data. The simulation results and estimation results using real data suggest that the estimated effects from my proposed models (models that loosen the equi-dispersion constraint, account for the sample selection, and accommodate variability in structural aspects of the models due to a switching variable) substantively differ from estimates from conventional linear and count regression specifications.
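
The equi-dispersion idea in the abstract above (mean equal to variance, as under a Poisson model) can be illustrated with a rough check. The following is a minimal sketch, not the dissertation's method: it compares the unconditional mean and variance of a sample of counts, whereas the abstract refers to the conditional mean and variance given covariates. The function names, the tolerance, and the `family_sizes` data are all hypothetical.

```python
from statistics import mean, pvariance


def dispersion_ratio(counts):
    """Return the population variance of the counts divided by their mean.

    A ratio near 1 is what a Poisson model would imply; this ignores
    covariates, so it is only a crude unconditional diagnostic.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion ratio is undefined")
    return pvariance(counts) / m


def classify_dispersion(counts, tolerance=0.1):
    """Label a sample as over-, under-, or approximately equi-dispersed.

    The tolerance is an arbitrary illustrative threshold, not a formal test.
    """
    ratio = dispersion_ratio(counts)
    if ratio > 1 + tolerance:
        return "over-dispersed"
    if ratio < 1 - tolerance:
        return "under-dispersed"
    return "approximately equi-dispersed"


# Hypothetical family-size counts, invented purely for illustration.
family_sizes = [2, 2, 3, 2, 3, 2, 1, 2, 3, 2]

if __name__ == "__main__":
    print(dispersion_ratio(family_sizes))
    print(classify_dispersion(family_sizes))
```

In this made-up sample the variance is well below the mean, which matches the abstract's remark that fertility data tend to be under-dispersed; a real analysis would assess dispersion conditional on covariates.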