Browsing by Subject "data mining"
Now showing 1–10 of 15
Item: Analyzing the symptoms in colorectal and breast cancer patients with or without type 2 diabetes using EHR data (Sage, 2021)
Luo, Xiao; Storey, Susan; Gandhi, Priyanka; Zhang, Zuoyi; Metzger, Megan; Huang, Kun; Computer Information and Graphics Technology, School of Engineering and Technology

This research extracted patient-reported symptoms from free-text EHR notes of colorectal and breast cancer patients and studied the correlation of those symptoms with comorbid type 2 diabetes, race, and smoking status. An NLP framework was first developed, using UMLS MetaMap, to extract all symptom terms from the 366,398 EHR clinical notes of 1,694 colorectal cancer (CRC) patients and 3,458 breast cancer (BC) patients. Semantic analysis and clustering algorithms were then developed to categorize the relevant symptoms into eight symptom clusters defined by seed terms. After the relevant symptoms were extracted from the clinical notes, the frequency of symptoms reported by CRC and BC patients was calculated over three time periods post-chemotherapy. Logistic regression (LR) was performed with each symptom cluster as the response variable while controlling for diabetes, race, and smoking status. The results show that CRC and BC patients with type 2 diabetes (T2D) were more likely to report symptoms than those without T2D over all three time periods in the cancer trajectory.
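The per-cluster logistic regressions above quantify how T2D shifts the odds of reporting a symptom. The core quantity can be illustrated with an unadjusted odds ratio from a 2x2 contingency table; a minimal sketch, with all counts hypothetical rather than taken from the study:

```python
# Odds ratio of symptom reporting for patients with vs. without T2D,
# computed from a 2x2 contingency table (all counts hypothetical).

def odds_ratio(reported_t2d, total_t2d, reported_no_t2d, total_no_t2d):
    """Odds ratio comparing symptom-reporting odds between two groups."""
    odds_t2d = reported_t2d / (total_t2d - reported_t2d)
    odds_no = reported_no_t2d / (total_no_t2d - reported_no_t2d)
    return odds_t2d / odds_no

# Hypothetical counts: 120 of 300 T2D patients vs. 200 of 700 patients
# without T2D reported a symptom in a given post-chemotherapy period.
or_symptom = odds_ratio(120, 300, 200, 700)
print(round(or_symptom, 3))  # (120/180) / (200/500) = 1.667
```

An odds ratio above 1 indicates the T2D group reports the symptom more often; the study's LR additionally adjusts this estimate for race and smoking status.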
We also found that current smokers were more likely than non-smokers to report anxiety (CRC, BC), neuropathic symptoms (CRC, BC), and depression (BC).

Item: Comparison of Multi-Sample Variant Calling Methods for Whole Genome Sequencing (Institute of Electrical and Electronics Engineers, 2014-10)
Nho, Kwangsik; West, John D.; Li, Huian; Henschel, Robert; Bharthur, Apoorva; Tavares, Michel C.; Saykin, Andrew J.; Department of Medicine, IU School of Medicine

Rapid advancement of next-generation sequencing (NGS) technologies has facilitated the search for genetic susceptibility factors that influence disease risk in human genetics. In particular, whole genome sequencing (WGS) has been used to obtain the most comprehensive view of an individual's genetic variation and to evaluate it in detail. To this end, sophisticated methods are required to accurately call high-quality variants and genotypes simultaneously across a cohort of individuals from raw sequence data. On chromosome 22 of 818 WGS samples from the Alzheimer's Disease Neuroimaging Initiative (ADNI), the largest WGS dataset related to a single disease, we compared two multi-sample variant calling methods for detecting single nucleotide variants (SNVs) and short insertions and deletions (indels): (1) reducing the analysis-ready reads (BAM) file to a manageable size by keeping only the information essential for variant calling ("REDUCE"), and (2) calling variants individually on each sample and then performing a joint genotyping analysis of the variant files produced for all samples in the cohort ("JOINT"). JOINT identified 515,210 SNVs and 60,042 indels, while REDUCE identified 358,303 SNVs and 52,855 indels; JOINT thus identified substantially more variants. The concordance rate between the two methods was 99.60% for SNVs and 99.06% for indels. For SNVs, evaluation against HumanOmni 2.5M genotyping arrays revealed a concordance rate of 99.68% for JOINT and 99.50% for REDUCE.
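A concordance rate like those reported above measures how much two call sets agree. A minimal Jaccard-style sketch (the paper's exact definition may differ, and the variant sets here are toy data, not ADNI calls):

```python
# Concordance between two variant call sets, as in the JOINT-vs-REDUCE
# comparison: fraction of identical calls among calls made by either
# method. Variant sets are hypothetical toy data.

def concordance(calls_a, calls_b):
    """Fraction of calls shared by both methods, relative to the
    union of calls made by either method."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return len(shared) / len(union)

# Each call is a (chromosome, position, ref, alt) tuple.
joint = {("22", 100, "A", "G"), ("22", 250, "C", "T"), ("22", 400, "G", "A")}
reduce_ = {("22", 100, "A", "G"), ("22", 250, "C", "T"), ("22", 500, "T", "C")}
print(concordance(joint, reduce_))  # 2 shared / 4 in union = 0.5
```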
REDUCE needed more computational time and memory than JOINT. Our findings indicate that multi-sample variant calling with the JOINT process is a promising strategy for variant detection, which should facilitate our understanding of the underlying pathogenesis of human diseases.

Item: Decision Support from Local Data: Creating Adaptive Order Menus from Past Clinician Behavior (Elsevier, 2014-04)
Klann, Jeffrey G.; Szolovits, Peter; Downs, Stephen; Schadow, Gunther; Department of Pediatrics, IU School of Medicine

Objective: Reducing care variability through guidelines has significantly benefited patients. Nonetheless, guideline-based clinical decision support (CDS) systems are not widely implemented or used, are frequently out of date, and cannot address complex care for which guidelines do not exist. Here, we develop and evaluate a complementary approach: using Bayesian network (BN) learning to generate adaptive, context-specific treatment menus from local order-entry data. These menus can serve as drafts for expert review, minimizing the development time for local decision support content. This is in keeping with the vision outlined in the US Health Information Technology Strategic Plan, which describes a healthcare system that learns from itself.
Materials and Methods: We used the Greedy Equivalence Search algorithm to learn four 50-node domain-specific BNs from 11,344 encounters: abdominal pain in the emergency department, inpatient pregnancy, hypertension in the urgent visit clinic, and altered mental state in the intensive care unit. We developed a system to produce situation-specific, rank-ordered treatment menus from these networks. We evaluated this system with a hospital-simulation methodology, computing the area under the receiver operating characteristic curve (AUC) and the average menu position at time of selection, and compared it with a similar association-rule-mining approach.
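As a rough illustration of menu generation, candidate orders can be ranked by how often they co-occur with an order already placed in past encounters. This is a crude stand-in closer to the association-rule baseline than to the learned Bayesian network, and the encounter data and order names below are entirely hypothetical:

```python
from collections import Counter

# Rank candidate orders by P(order | context order), estimated from
# past encounters -- a simplified stand-in for the BN-derived menus.
# Encounters and order names are hypothetical.

encounters = [
    {"cbc", "lipase", "iv_fluids"},
    {"cbc", "lipase", "ct_abdomen"},
    {"cbc", "iv_fluids", "ondansetron"},
    {"cbc", "ct_abdomen"},
]

def menu(context_order, encounters, k=3):
    """Top-k orders most often co-occurring with the context order,
    ranked by conditional frequency (ties broken alphabetically)."""
    co = Counter()
    n_context = 0
    for enc in encounters:
        if context_order in enc:
            n_context += 1
            co.update(enc - {context_order})
    ranked = sorted(co, key=lambda o: (-co[o] / n_context, o))
    return ranked[:k]

print(menu("lipase", encounters))  # ['cbc', 'ct_abdomen', 'iv_fluids']
```

Unlike this pairwise counting, the BN approach in the paper captures transitive associations and co-varying relationships across the full order context.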
Results: On average, a short menu (weighted average length 3.91–5.83 items) contained the clinician's next order. Overall predictive ability was good: the average AUC was above 0.9 for 25% of order types, and the overall average AUC was 0.714–0.844, depending on the domain. However, AUC varied widely (0.50–0.99). Higher AUC correlated with tighter clusters and more connections in the graphs, indicating the importance of appropriate contextual data. Comparison with the association-rule-mining approach showed similar performance only for the most common orders, with dramatic divergence as orders become less frequent.
Discussion and Conclusion: This study demonstrates that local clinical knowledge can be extracted from treatment data for decision support. The approach is appealing because it reflects local standards, uses data already being captured, and produces human-readable treatment-diagnosis networks that a human expert could curate, reducing the workload of developing localized CDS content. The BN methodology captured transitive associations and co-varying relationships, which existing approaches do not, and it performs better as orders become less frequent and require more context. This system is a step forward in harnessing local, empirical data to enhance decision support.

Item: Deep Learning Based Crop Row Detection (2022-05)
Doha, Rashed Mohammad; Anwar, Sohel; Al Hasan, Mohammad; Li, Lingxi

Detecting crop rows from video frames in real time is a fundamental challenge in precision agriculture. Deep learning-based semantic segmentation methods, notably U-net, although successful in many precision-agriculture tasks, perform poorly on this one. The reasons include the paucity of large-scale labeled datasets in this domain, the diversity of crops, and the varied appearance of the same crop at different stages of growth.
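Segmentation quality in this setting is scored with intersection-over-union, the Jaccard index this work reports. A minimal sketch on toy binary masks:

```python
# Intersection-over-union (Jaccard index) for binary segmentation
# masks, the metric used to score crop-row predictions. Masks here
# are toy data given as nested lists of 0/1.

def iou(pred, truth):
    """IoU of two same-size binary masks."""
    inter = sum(p & t for row_p, row_t in zip(pred, truth)
                for p, t in zip(row_p, row_t))
    union = sum(p | t for row_p, row_t in zip(pred, truth)
                for p, t in zip(row_p, row_t))
    return inter / union if union else 1.0

pred  = [[1, 1, 0],
         [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(round(iou(pred, truth), 3))  # 2 intersecting / 4 in union = 0.5
```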
In this work, we describe the development of a practical, real-life crop row detection system in collaboration with an agricultural sprayer company. Our method takes the output of U-net semantic segmentation and then applies a clustering-based probabilistic temporal calibration that can adapt to different fields and crops without retraining the network. Experimental results validate that our method can both refine the U-net output to reduce errors and interpolate frames of the input video stream. Once more labeled data became available, we switched from this semi-supervised model to a fully supervised, end-to-end crop row detection model based on a Feature Pyramid Network (FPN). Central to the FPN is a pyramid pooling module that extracts features from the input image at multiple resolutions, enabling the network to use both local and global features when classifying pixels as crop rows. After training the FPN on the labeled dataset, our method obtained a mean IoU (Jaccard index) score of over 70% on the test set. We trained on only a subset of the corn dataset and tested on multiple variations of weed pressure and crop growth stage, verifying that performance carries over these variations and is consistent across the entire dataset.

Item: Detecting Vehicle Interactions in Driving Videos via Motion Profiles (IEEE, 2020-09)
Wang, Zheyuan; Zheng, Jiang Yu; Gao, Zhen; Electrical and Computer Engineering, School of Engineering and Technology

Identifying interactions of vehicles on the road is important for accident analysis and driving behavior assessment. The interactions we consider include those with passing/passed, cut-in, crossing, frontal, oncoming, and parallel-driving vehicles, as well as ego-vehicle actions such as lane changes, stops, turns, and speeding. We use the visual motion recorded in driving video taken by a dashboard camera to identify such interactions.
Motion profiles from the videos are filtered at critical positions, which avoids the complexity of object detection, depth sensing, target tracking, and motion estimation. The results are obtained efficiently with acceptable accuracy and can be used in driving-video mining, traffic analysis, driver behavior understanding, and related applications.

Item: Does Bad News Spread Faster? (IEEE, 2017-01)
Fang, Anna; Ben-Miled, Zina; Electrical and Computer Engineering, School of Engineering and Technology

Bad news travels fast. Although this concept may be intuitively accepted, there has been little evidence confirming that the propagation of bad news differs from that of good news. In this paper, we examine the effect of user perspective on the sharing of a controversial news story. Social media not only offers insight into human behavior but has also developed into a source of news. We characterize the spreading of news by tracking selected tweets on Twitter as they are shared over time, creating models of user sharing behavior. Many news events can be viewed as positive or negative, so we compare and contrast tweets about such events among general users, while monitoring each event's tweet frequency over time to ensure that events are comparable with respect to user interest. In addition, we track the tweets about a controversial event across two groups of users: those who view the event as positive and those who view it as negative. As a result, we are able to make assessments of a single event from two different perspectives.

Item: FS3: A Sampling based method for top-k Frequent Subgraph Mining (2015)
Saha, Tanay Kumar; Al Hasan, Mohammad; Department of Computer & Information Science, School of Science

Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many scientific domains.
All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we propose FS3, a sampling-based method that mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS3 performs Markov Chain Monte Carlo (MCMC) sampling over the space of fixed-size subgraphs such that potentially frequent subgraphs are sampled more often. FS3 is also equipped with an innovative queue manager that stores sampled subgraphs in a finite queue over the course of mining so that the top-k positions in the queue hold the most frequent subgraphs. Our experiments on databases of large graphs show that FS3 is efficient and finds the most frequent subgraphs of a given size.

Item: Improving information retrieval from electronic health records using dynamic and multi-collaborative filtering (IEEE, 2019)
Fan, Ziwei; Burgun, Evan; Ren, Zhiyun; Schleyer, Titus; Ning, Xia; Medicine, School of Medicine

Due to the rapid growth of information available about individual patients, most physicians suffer from information overload when reviewing patient information in health information technology systems. In this manuscript, we present a novel hybrid dynamic and multi-collaborative filtering method to improve information retrieval from electronic health records. The method recommends relevant information from electronic health records to physicians during patient visits, modeling information search dynamics with a Markov model.
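The Markov model of search dynamics mentioned above can be sketched as first-order transition probabilities between viewed information items, used to rank the likely next item. The session data here is hypothetical:

```python
from collections import Counter, defaultdict

# First-order Markov model of information-search dynamics: estimate
# transition probabilities between viewed EHR information items and
# rank the most likely next item. Sessions are hypothetical.

sessions = [
    ["labs", "meds", "notes"],
    ["labs", "meds", "notes", "imaging"],
    ["meds", "labs", "notes"],
]

def transition_probs(sessions):
    """Estimate P(next item | current item) from consecutive views."""
    counts = defaultdict(Counter)
    for s in sessions:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

probs = transition_probs(sessions)
# Most likely item to view after "meds":
best = max(probs["meds"], key=probs["meds"].get)
print(best)  # notes
```

The full method in the paper combines such dynamics with collaborative-filtering similarities rather than using transition counts alone.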
It also leverages the key idea of collaborative filtering, which originates in recommender systems, to prioritize information based on various similarities among physicians, patients, and information items. We tested the new method using real electronic health record data from the Indiana Network for Patient Care. Our experimental results demonstrate that for 46.7% of test cases, the method correctly places information the physician is truly interested in among its top-5 recommendations.

Item: Learning Analytics and the Academic Library: Professional Ethics Commitments at a Crossroads (ACRL, 2018)
Jones, Kyle M. L.; Library and Information Science, School of Informatics and Computing

In this paper, the authors address learning analytics and the ways academic libraries are beginning to participate in wider institutional learning analytics initiatives. Since learning analytics raises moral issues, the authors consider how data mining practices can run counter to ethical principles in the American Library Association's "Code of Ethics." Specifically, they address how learning analytics implicates professional commitments to promote intellectual freedom; to protect patron privacy and confidentiality; and to balance intellectual property interests among library users, their institution, and content creators and vendors. The authors recommend that librarians embed their ethical positions in technological designs, practices, and governance mechanisms.

Item: NOUS: Construction and Querying of Dynamic Knowledge Graphs (IEEE, 2017-04)
Choudhury, Sutanay; Agarwal, Khushbu; Purohit, Sumit; Zhang, Baichuan; Pirrung, Meg; Smith, Will; Thomas, Mathew; Computer and Information Science, School of Science

The ability to construct domain-specific knowledge graphs (KGs) and perform question answering or hypothesis generation over them is a transformative capability.
Despite their value, automated construction of knowledge graphs remains an expensive technical challenge beyond the reach of most enterprises and academic institutions. We propose an end-to-end framework for developing custom knowledge-graph-driven analytics for arbitrary application domains. The uniqueness of our system lies in A) its combination of curated KGs with knowledge extracted from unstructured text, B) its support for advanced trending and explanatory questions on a dynamic KG, and C) its ability to answer queries whose answers are embedded across multiple data sources.
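At its core, a knowledge graph is a set of (subject, predicate, object) triples, and querying it means matching patterns against those triples. A minimal sketch with hypothetical facts; real systems such as NOUS add text extraction, curation, and temporal and trend queries on top of this foundation:

```python
# A toy knowledge graph as a set of (subject, predicate, object)
# triples, queried by pattern matching. All facts are hypothetical.

triples = {
    ("aspirin", "treats", "headache"),
    ("aspirin", "interacts_with", "warfarin"),
    ("ibuprofen", "treats", "headache"),
}

def query(triples, s=None, p=None, o=None):
    """Return sorted triples matching the pattern; None is a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# What treats headache?
print([t[0] for t in query(triples, p="treats", o="headache")])
# ['aspirin', 'ibuprofen']
```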