IU Indianapolis ScholarWorks :: Browsing by Subject "Machine learning"

Now showing 1 - 10 of 172

A Bi-Level Data-Driven Framework for Fault-Detection and Diagnosis of HVAC Systems feature explainability
(Elsevier, 2022-07) Movahed, Paria; Taheri, Saman; Razban, Ali; Mechanical Engineering, School of Engineering and Technology
Machine learning methods have lately received considerable interest for fault detection and diagnosis (FDD) analysis of heating, ventilation, and air conditioning (HVAC) systems due to their high detection accuracy. Meanwhile, HVAC malfunctions are regarded as rare occurrences, hence normal operating data samples are much more accessible than data samples in faulty and malfunctioning conditions. The dominating frequency of normal operation in HVAC datasets has also led to heavily biased classification algorithms within the literature. Moreover, the focus of previous literature has been on increasing accuracy of the models while this leads to a high number of false positives (misleading alarms) in the system. To enhance the performance of diagnostic procedures and fill the mentioned gaps, this study proposes a novel data-driven framework. A bi-level machine learning framework is developed for diagnosing faults in air handling units and rooftop units based on principal component analysis (PCA), time series anomaly detection, and random forest (RF). It is shown that PCA can reduce the dataset dimension with one principal component accounting for 95% of data variance. Also, the random forest could classify the faults with 89% precision for single zone AHU, 85% precision for RTU, and 79% for multi-zone AHU.
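The abstract above contrasts accuracy with false alarms on imbalanced fault data. A minimal sketch of that distinction follows, using invented toy labels (not data from the study) and hypothetical helper names `accuracy` and `precision`:

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred):
    """TP / (TP + FP): the share of raised alarms that are real faults."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fp == 0:
        raise ValueError("no positive predictions, precision is undefined")
    return tp / (tp + fp)


# Toy, assumed data: 10 faulty samples (1) and 90 normal samples (0).
y_true = [1] * 10 + [0] * 90
# The hypothetical detector catches 8 faults, misses 2, and raises 8 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 8 + [0] * 82

print(accuracy(y_true, y_pred))   # high, because normal samples dominate
print(precision(y_true, y_pred))  # much lower, because half the alarms are false
```

On data this imbalanced, accuracy stays high even though half of the alarms are misleading, which is one reason to report precision per fault class as the abstract does.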
A comprehensive and bias-free machine learning approach for risk prediction of preeclampsia with severe features in a nulliparous study cohort
(Springer Nature, 2024-12-24) Lin, Yun C.; Mallia, Daniel; Clark‑Sevilla, Andrea O.; Catto, Adam; Leshchenko, Alisa; Yan, Qi; Haas, David M.; Wapner, Ronald; Pe’er, Itsik; Raja, Anita; Salleb‑Aouissi, Ansaf; Obstetrics and Gynecology, School of Medicine
Preeclampsia is one of the leading causes of maternal morbidity, with consequences during and after pregnancy. Because of its diverse clinical presentation, preeclampsia is an adverse pregnancy outcome that is uniquely challenging to predict and manage. In this paper, we developed racial bias-free machine learning models that predict the onset of preeclampsia with severe features or eclampsia at discrete time points in a nulliparous pregnant study cohort. To focus on those most at risk, we selected probands with severe PE (sPE). Those with mild preeclampsia, superimposed preeclampsia, and new onset hypertension were excluded. The prospective study cohort to which we applied machine learning is the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-be (nuMoM2b) study, which contains information from eight clinical sites across the US. Maternal serum samples were collected for 1,857 individuals between the first and second trimesters. These patients with serum samples collected are selected as the final cohort. Our prediction models achieved an AUROC of 0.72 (95% CI, 0.69-0.76), 0.75 (95% CI, 0.71-0.79), and 0.77 (95% CI, 0.74-0.80), respectively, for the three visits. Our initial models were biased toward non-Hispanic black participants with a high predictive equality ratio of 1.31. We corrected this bias and reduced this ratio to 1.14. This lowers the rate of false positives in our predictive model for the non-Hispanic black participants. The exact cause of the bias is still under investigation, but previous studies have recognized PLGF as a potential bias-inducing factor.
However, since our model includes various factors that exhibit a positive correlation with PLGF, such as blood pressure measurements and BMI, we have employed an algorithmic approach to disentangle this bias from the model. The top features of our built model stress the importance of using several tests, particularly for biomarkers (BMI and blood pressure measurements) and ultrasound measurements. Placental analytes (PLGF and Endoglin) were strong predictors for screening for the early onset of preeclampsia with severe features in the first two trimesters.
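The abstract does not define its predictive equality ratio. The sketch below assumes the common fairness definition, in which predictive equality means equal false positive rates across groups and the ratio is one group's false positive rate divided by a reference group's. The function names, group labels, and data are all illustrative assumptions, not taken from the paper.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), computed over the truly negative samples only."""
    preds_on_negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not preds_on_negatives:
        raise ValueError("group has no negative samples, FPR is undefined")
    return sum(preds_on_negatives) / len(preds_on_negatives)


def predictive_equality_ratio(y_true, y_pred, group, focal, reference):
    """Ratio of the focal group's FPR to the reference group's FPR (assumed definition)."""
    def fpr_for(label):
        rows = [(t, p) for t, p, g in zip(y_true, y_pred, group) if g == label]
        return false_positive_rate([t for t, _ in rows], [p for _, p in rows])

    reference_fpr = fpr_for(reference)
    if reference_fpr == 0:
        raise ValueError("reference group FPR is zero, ratio is undefined")
    return fpr_for(focal) / reference_fpr


# Toy, assumed data: two groups of five participants each.
y_true = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
group = ["A"] * 5 + ["B"] * 5

ratio = predictive_equality_ratio(y_true, y_pred, group, focal="A", reference="B")
print(ratio)  # a value above 1 means group A receives proportionally more false positives
```

Under this assumed definition, a ratio of 1 indicates equal false positive rates, so the reported move from 1.31 toward 1.14 would correspond to a narrowing gap between groups.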
A framework for graph-base neural network using numerical simulation of metal powder bed fusion for correlating process parameters and defect generation
(Elsevier, 2022) Akter Jahan, Suchana; Al Hasan, Mohammad; El-Mounayri, Hazim; Computer Science, Luddy School of Informatics, Computing, and Engineering
Powder bed fusion (PBF) is the most common technique used for metal additive manufacturing. This process involves consolidation of metal powder using a heat source such as laser or electron beam. During the formation of three-dimensional (3D) objects by sintering metal powders layer by layer, many different thermal phenomena occur that can create defects or anomalies on the final printed part. Similar to other additive manufacturing techniques, PBF has been in practice for decades, yet it is still going through research and development endeavors, which are required to understand the physics behind this process. Defects and deformations highly impact the product quality and reliability of the overall manufacturing process; hence, it is essential that we understand the reason and mechanism of defect generation in PBF process and take appropriate measures to rectify them. In this paper, we have attempted to study the effect of processing parameters (scanning speed, laser power) on the generation of defects in PBF process using a graph-based artificial neural network that uses numerical simulation results as input or training data. Use of graph-based machine learning is novel in the area of manufacturing, let alone additive manufacturing or powder bed fusion. The outcome of this study provides an opportunity to design a feedback controlled in-situ online monitoring system in powder bed fusion to reduce printing defects and optimize the manufacturing process.
A Fur family protein BosR is a novel RNA-binding protein that controls rpoS RNA stability in the Lyme disease pathogen
(Oxford University Press, 2024) Raghunandanan, Sajith; Priya, Raj; Alanazi, Fuad; Lybecker, Meghan C.; Schlax, Paula Jean; Yang, X. Frank; Microbiology and Immunology, School of Medicine
2´-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule sequencing techniques for long read RNA sequencing data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with K-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cell lines, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cell lines, 61 genes are identified as frequently Nm-modified, with enrichment in processes like glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
A machine learning model for orthodontic extraction/non-extraction decision in a racially and ethnically diverse patient population
(Elsevier, 2023-09) Mason, Taylor; Kelly, Kynnedy M.; Eckert, George; Dean, Jeffrey A.; Dundar, M. Murat; Turkkahraman, Hakan; Orthodontics and Oral Facial Genetics, School of Dentistry
Introduction: The purpose of the present study was to create a machine learning (ML) algorithm with the ability to predict the extraction/non-extraction decision in a racially and ethnically diverse sample. Methods: Data was gathered from the records of 393 patients (200 non-extraction and 193 extraction) from a racially and ethnically diverse population. Four ML models (logistic regression [LR], random forest [RF], support vector machine [SVM], and neural network [NN]) were trained on a training set (70% of samples) and then tested on the remaining samples (30%). The accuracy and precision of the ML model predictions were calculated using the area under the curve (AUC) of the receiver operating characteristics (ROC) curve. The proportion of correct extraction/non-extraction decisions was also calculated. Results: The LR, SVM, and NN models performed best, with an AUC of the ROC of 91.0%, 92.5%, and 92.3%, respectively. The overall proportion of correct decisions was 82%, 76%, 83%, and 81% for the LR, RF, SVM, and NN models, respectively. The features found to be most helpful to the ML algorithms in making their decisions were maxillary crowding/spacing, L1-NB (mm), U1-NA (mm), PFH:AFH, and SN-MP (°), although many other features contributed significantly. Conclusions: ML models can predict the extraction decision in a racially and ethnically diverse patient population with a high degree of accuracy and precision. Crowding, sagittal, and vertical characteristics all featured prominently in the hierarchy of components most influential to the ML decision-making process.
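The evaluation protocol in this abstract (a 70/30 split followed by the AUC of the ROC curve) can be sketched in plain Python. The shuffling seed, the placeholder records, and the helper names below are assumptions for illustration, and the rank-based AUC shown here is a standard equivalent formulation rather than the authors' own code.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# 393 placeholder patient IDs, matching only the cohort size given in the abstract.
train, test = train_test_split(range(393))
print(len(train), len(test))

# Toy, assumed scores for four test patients (1 = extraction, 0 = non-extraction).
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a test set of this size; a sort-based version would be preferable for large datasets.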
A Novel Machine Learning Model for Predicting Orthodontic Treatment Duration
(MDPI, 2023-08-23) Volovic, James; Badirl, Sarkhan; Ahmad, Sunna; Leavit, Landon; Mason, Taylor; Bhamidipalli, Surya Sruthi; Eckert, George; Albright, David; Turkkahraman, Hakan; Orthodontics and Oral Facial Genetics, School of Dentistry
In the field of orthodontics, providing patients with accurate treatment time estimates is of utmost importance. As orthodontic practices continue to evolve and embrace new advancements, incorporating machine learning (ML) methods becomes increasingly valuable in improving orthodontic diagnosis and treatment planning. This study aimed to develop a novel ML model capable of predicting the orthodontic treatment duration based on essential pre-treatment variables. Patients who completed comprehensive orthodontic treatment at the Indiana University School of Dentistry were included in this retrospective study. Fifty-seven pre-treatment variables were collected and used to train and test nine different ML models. The performance of each model was assessed using descriptive statistics, intraclass correlation coefficients, and one-way analysis of variance tests. Random Forest, Lasso, and Elastic Net were found to be the most accurate, with a mean absolute error of 7.27 months in predicting treatment duration. Extraction decision, COVID, intermaxillary relationship, lower incisor position, and additional appliances were identified as important predictors of treatment duration. Overall, this study demonstrates the potential of ML in predicting orthodontic treatment duration using pre-treatment variables.
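The headline metric in this abstract is the mean absolute error (MAE) in months. As a reminder of what that figure measures, here is a minimal sketch with invented treatment durations; the function name and the numbers are illustrative assumptions, not study data.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all patients, in the input's units."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy, assumed treatment durations in months for four patients.
actual_months = [24, 30, 18, 36]
predicted_months = [20, 33, 25, 30]

mae = mean_absolute_error(actual_months, predicted_months)
print(mae)  # average miss in months, regardless of direction
```

Because MAE ignores the sign of each error, it does not show whether a model tends to overestimate or underestimate treatment time.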
A Typology of Social Media Use by Human Service Nonprofits: Mixed Methods Study
(JMIR, 2024-05-08) Xue, Jia; Shier, Michael L.; Chen, Junxiang; Wang, Yirun; Zheng, Chengda; Chen, Chen; Biostatistics and Health Data Science, Richard M. Fairbanks School of Public Health
Background: Nonprofit organizations are increasingly using social media to improve their communication strategies with the broader population. However, within the domain of human service nonprofits, there is hesitancy to fully use social media tools, and there is limited scope among organizational personnel in applying their potential beyond self-promotion and service advertisement. There is a pressing need for greater conceptual clarity to support education and training on the varied reasons for using social media to increase organizational outcomes. Objective: This study leverages the potential of Twitter (subsequently rebranded as X [X Corp]) to examine the online communication content within a sample (n=133) of nonprofit sexual assault (SA) centers in Canada. To achieve this, we developed a typology using a qualitative and supervised machine learning model for the automatic classification of tweets posted by these centers. Methods: Using a mixed methods approach that combines machine learning and qualitative analysis, we manually coded 10,809 tweets from 133 SA centers in Canada, spanning the period from March 2009 to March 2023. These manually labeled tweets were used as the training data set for the supervised machine learning process, which allowed us to classify 286,551 organizational tweets. The classification model based on supervised machine learning yielded satisfactory results, prompting the use of unsupervised machine learning to classify the topics within each thematic category and identify latent topics. The qualitative thematic analysis, in combination with topic modeling, provided a contextual understanding of each theme. Sentiment analysis was conducted to reveal the emotions conveyed in the tweets. We conducted validation of the model with 2 independent data sets. 
Results: Manual annotation of 10,809 tweets identified seven thematic categories: (1) community engagement, (2) organization administration, (3) public awareness, (4) political advocacy, (5) support for others, (6) partnerships, and (7) appreciation. Organization administration was the most frequent segment, and political advocacy and partnerships were the smallest segments. The supervised machine learning model achieved an accuracy of 63.4% in classifying tweets. The sentiment analysis revealed a prevalence of neutral sentiment across all categories. The emotion analysis indicated that fear was predominant, whereas joy was associated with the partnership and appreciation tweets. Topic modeling identified distinct themes within each category, providing valuable insights into the prevalent discussions surrounding SA and related issues. Conclusions: This research contributes an original theoretical model that sheds light on how human service nonprofits use social media to achieve their online organizational communication objectives across 7 thematic categories. The study advances our comprehension of social media use by nonprofits, presenting a comprehensive typology that captures the diverse communication objectives and contents of these organizations, which provide content to expand training and education for nonprofit leaders to connect and engage with the public, policy experts, other organizations, and potential service users.
Adaptive Identification of Cortical and Subcortical Imaging Markers of Early Life Stress and Posttraumatic Stress Disorder
(Wiley, 2019-05) Salminen, Lauren E.; Morey, Rajendra A.; Riedel, Brandalyn C.; Jahanshad, Neda; Dennis, Emily L.; Thompson, Paul M.; Radiology and Imaging Sciences, School of Medicine
Posttraumatic stress disorder (PTSD) is a heterogeneous condition associated with a range of brain imaging abnormalities. Early life stress (ELS) contributes to this heterogeneity, but we do not know how a history of ELS influences traditionally defined brain signatures of PTSD. Here, we used a novel machine learning method – evolving partitions to improve classification (EPIC) – to identify shared and unique structural neuroimaging markers of ELS and PTSD in 97 combat-exposed military veterans. METHODS: We used EPIC with repeated cross-validation (CV) to determine how combinations of cortical thickness, surface area, and subcortical brain volumes could contribute to classification of PTSD (n = 40) versus controls (n = 57), and classification of ELS within the PTSD (ELS+ n = 16; ELS− n = 24) and control groups (ELS+ n = 16; ELS− n = 41). Additional inputs included intracranial volume, age, sex, adult trauma, and depression. RESULTS: On average, EPIC classified PTSD with 69% accuracy (SD = 5%), and ELS with 64% accuracy in the PTSD group (SD = 10%), and 62% accuracy in controls (SD = 6%). EPIC selected unique sets of individual features that classified each group with 75–85% accuracy in post hoc analyses; combinations of regions marginally improved classification from the individual atlas-defined brain regions. Across analyses, surface area in the right posterior cingulate was the only variable that was repeatedly selected as an important feature for classification of PTSD and ELS. CONCLUSIONS: EPIC revealed unique patterns of features that distinguished PTSD and ELS in this sample of combat-exposed military veterans, which may represent distinct biotypes of stress-related neuropathology.
AI in Medical Imaging Informatics: Current Challenges and Future Directions
(IEEE, 2020-07) Panayides, Andreas S.; Amini, Amir; Filipovic, Nenad D.; Sharma, Ashish; Tsaftaris, Sotirios A.; Young, Alistair; Foran, David; Do, Nhan; Golemati, Spyretta; Kurc, Tahsin; Huang, Kun; Nikita, Konstantina S.; Veasey, Ben P.; Zervakis, Michalis; Saltz, Joel H.; Pattichis, Constantinos S.; Biostatistics & Health Data Science, School of Medicine
This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the necessity for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/tissue segmentation, focusing on AI and deep learning architectures that have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. Concluding, integrative analytics approaches driven by associate research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum for both radiology and digital pathology applications. The latter is projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine.
Ancestry May Confound Genetic Machine Learning: Candidate-Gene Prediction of Opioid Use Disorder as an Example
(Elsevier, 2021) Hatoum, Alexander S.; Wendt, Frank R.; Galimberti, Marco; Polimanti, Renato; Neale, Benjamin; Kranzler, Henry R.; Gelernter, Joel; Edenberg, Howard J.; Agrawal, Arpana; Medical and Molecular Genetics, School of Medicine
Background: Machine learning (ML) models are beginning to proliferate in psychiatry; however, machine learning models in psychiatric genetics have not always accounted for ancestry. Using an empirical example of a proposed genetic test for OUD, and exploring a similar test for tobacco dependence and a simulated binary phenotype, we show that genetic prediction using ML is vulnerable to ancestral confounding. Methods: We utilize five ML algorithms trained with 16 brain reward-derived "candidate" SNPs proposed for commercial use and examine their ability to predict OUD vs. ancestry in an out-of-sample test set (N = 1000, stratified into equal groups of n = 250 cases and controls each of European and African ancestry). We rerun analyses with 8 random sets of allele-frequency matched SNPs. We contrast findings with 11 genome-wide significant variants for tobacco smoking. To document generalizability, we generate and test a random phenotype. Results: None of the 5 ML algorithms predict OUD better than chance when ancestry was balanced but were confounded with ancestry in an out-of-sample test. In addition, the algorithms preferentially predicted admixed subpopulations. Random sets of variants matched to the candidate SNPs by allele frequency produced similar bias. Genome-wide significant tobacco smoking variants were also confounded by ancestry. Finally, random SNPs predicting a random simulated phenotype show that the bias attributable to ancestral confounding could impact any ML-based genetic prediction. Conclusions: Researchers and clinicians are encouraged to be skeptical of claims of high prediction accuracy from ML-derived genetic algorithms for polygenic traits like addiction, particularly when using candidate variants.