IU Indianapolis ScholarWorks :: Browsing by Subject "big data"

Now showing 1 - 10 of 22

The best of times and the worst of times: empirical operations and supply chain management research
(Taylor & Francis, 2017) Melnyk, Steven A.; Flynn, Barbara B.; Awaysheh, Amrou; Occupational Therapy, School of Health and Rehabilitation Sciences
We assess the current state of empirical research in operations and supply chain management (OSM), using Dickens’ contrast between the best of times and the worst of times as a frame. The best of times refers to the future that empirical OSM research is now entering, with exciting opportunities available using big data and other new data sources, new empirical approaches and analytical techniques and innovative tools for developing theory. These are well aligned with new research questions related to the digital economy, Industry 4.0, the impact of the millennial generation as consumers, social media, 3D printing, etc. However, we also explore how it is the worst of times, focusing on the challenges and problems that plague empirical OSM research. Our goal is to show how OSM researchers can learn from the worst of times, in order to be poised to take advantage of the best of times. We introduce the research diamond as a vehicle for emphasising the importance of a balanced research perspective that treats the research problem, theory, data collection and data analysis as equally important, requiring alignment between them. By learning and addressing the issues in this period of the best of times and the worst of times, we can take advantage of the opportunities facing our field to generate research that is balanced, insightful, rigorous, relevant, impactful and interesting.
Big Data and Causal Inference: What Does a New Analysis of the UK BioBank Data Tell Us?
(AHA, 2020) Tu, Wanzhu; Pratt, J. Howard; Biostatistics, School of Public Health
Big Data and Dysmenorrhea: What Questions Do Women and Men Ask About Menstrual Pain?
(Liebert, 2018-10) Chen, Chen X.; Groves, Doyle; Miller, Wendy R.; Carpenter, Janet S.; School of Nursing
Background: Menstrual pain is highly prevalent among women of reproductive age. As the general public increasingly obtains health information online, Big Data from online platforms provide novel sources to understand the public's perspectives and information needs about menstrual pain. The study's purpose was to describe salient queries about dysmenorrhea using Big Data from a question and answer platform. Materials and Methods: We performed text-mining of 1.9 billion queries from ChaCha, a United States-based question and answer platform. Dysmenorrhea-related queries were identified by using keyword searching. Each relevant query was split into token words (i.e., meaningful words or phrases) and stop words (i.e., not meaningful functional words). Word Adjacency Graph (WAG) modeling was used to detect clusters of queries and visualize the range of dysmenorrhea-related topics. We constructed two WAG models, one from queries by women of reproductive age and one from queries by men. Salient themes were identified through inspecting clusters of WAG models. Results: We identified two subsets of queries: Subset 1 contained 507,327 queries from women aged 13–50 years. Subset 2 contained 113,888 queries from men aged 13 or above. WAG modeling revealed topic clusters for each subset. Between female and male subsets, topic clusters overlapped on dysmenorrhea symptoms and management. Among female queries, there were distinctive topics on approaching menstrual pain at school and menstrual pain-related conditions, while among male queries, there was a distinctive cluster of queries on menstrual pain from men's perspectives. Conclusions: Big Data mining of the ChaCha® question and answer service revealed a series of information needs among women and men on menstrual pain. Findings may be useful in structuring the content and informing the delivery platform for educational interventions.
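The token/stop-word split and the word-adjacency idea described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' WAG implementation: the stop-word list, the punctuation handling, the choice to count adjacency after stop words are removed, and the two example queries are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; the study's actual list is not given in the abstract.
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "does", "i", "my", "to",
              "for", "of", "on", "at", "what", "how", "why", "can", "with", "and"}

def tokenize(query):
    """Lowercase a query, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,?!\"'") for w in query.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]

def adjacency_counts(queries):
    """Count how often two tokens appear next to each other across all queries.

    Each (left, right) pair is one weighted edge of a simple adjacency graph.
    Assumption: adjacency is measured after stop words have been removed.
    """
    edges = Counter()
    for query in queries:
        tokens = tokenize(query)
        for left, right in zip(tokens, tokens[1:]):
            edges[(left, right)] += 1
    return edges

# Invented example queries, not taken from the ChaCha dataset.
queries = [
    "How can I stop menstrual cramps at school?",
    "What helps with menstrual cramps?",
]
edges = adjacency_counts(queries)
```

Heavily weighted edges such as ("menstrual", "cramps") are the kind of structure a clustering step could then group into topics; how the published WAG models form clusters is not specified in the abstract.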
Big Data Curation Framework: Curation Actions and Challenges
(Sage, 2022) Yoon, Ayoung; Kim, Jihyun; Donaldson, Devan Ray; Library and Information Science, School of Computing and Informatics
Big data curation represents an emerging topic of inquiry but is still in an early phase along its adoption curve. The term big data itself is a nebulous concept, and the differences between small data curation and big data curation are nuanced. The goal of this research is to provide a theoretical framework that identifies big data curation actions and associated curation challenges. This study is based on the practices of big data research and data curation, identified by systematically examining the literature. The outcome of the study includes the big data curation framework, which provides an overview of curation activities and the concerns that are essential to performing such activities. The study also provides practical implications for libraries, archives, data repositories and other information organisations that are concerned with the issue of big data curation, as big data presents a multidimensional array of exigencies in relation to the mission of those organisations.
Big data in nephrology-a time to rethink
(Oxford, 2018) Agarwal, Rajiv; Sinha, Arjun D.; Medicine, School of Medicine
Big Data Proxies and Health Privacy Exceptionalism
(2014) Terry, Nicolas P.; Robert H. McKinney School of Law
This article argues that, while “small data” rules protect conventional health care data (doing so exceptionally, if not exceptionally well), big data facilitates the creation of health data proxies that are relatively unprotected. As a result, the carefully constructed, appropriate, and necessary model of health data privacy will be eroded. Proxy data created outside the traditional space protected by extant health privacy models will end exceptionalism, reducing data protection to the very low levels applied to most other types of data. The article examines big data and its relationship with health care, including the data pools in play, and pays particular attention to three types of big data that lead to health proxies: “laundered” HIPAA data, patient-curated data, and medically-inflected data. It then reexamines health privacy exceptionalism across legislative and regulatory domains, seeking to understand its level of “stickiness” when faced with big data. Finally, the article examines how health privacy exceptionalism maps to the currently accepted rationales for health privacy and discusses the relative strengths of upstream and downstream data models in curbing what is viewed as big data’s assault on health privacy.
Big data researchers’ perceived value of big data curation
(2023) Yoon, Ayoung
This study aims to understand the value of big data curation in a professional context. Researchers' understanding of big data curation is critical both to promptly preparing data for future use and to preparing curation professionals. The literature analysis suggests that big data researchers acknowledge the value of curation in staying abreast of technology and data quality, but social aspects (e.g., legal and ethical issues) are less recognized.
Developing Automated Computer Algorithms to Track Periodontal Disease Change from Longitudinal Electronic Dental Records
(MDPI, 2023-03-08) Patel, Jay S.; Kumar, Krishna; Zai, Ahad; Shin, Daniel; Willis, Lisa; Thyvalikakath, Thankam P.
Objective: To develop two automated computer algorithms to extract information from clinical notes, and to generate three cohorts of patients (disease improvement, disease progression, and no disease change) to track periodontal disease (PD) change over time using longitudinal electronic dental records (EDR). Methods: We conducted a retrospective study of 28,908 patients who received a comprehensive oral evaluation between 1 January 2009 and 31 December 2014 at Indiana University School of Dentistry (IUSD) clinics. We utilized various Python libraries, such as Pandas, TensorFlow, and PyTorch, and a natural language tool kit to develop and test computer algorithms. We tested the performance through a manual review process by generating a confusion matrix. We calculated precision, recall, sensitivity, specificity, and accuracy to evaluate the performances of the algorithms. Finally, we evaluated the density of longitudinal EDR data for the following follow-up times: (1) None; (2) Up to 5 years; (3) > 5 and ≤ 10 years; and (4) > 10 and ≤ 15 years. Results: Thirty-four percent (n = 9954) of the study cohort had up to five years of follow-up visits, with an average of 2.78 visits with periodontal charting information. For clinician-documented diagnoses from clinical notes, 42% of patients (n = 5562) had at least two PD diagnoses to determine their disease change. In this cohort, with clinician-documented diagnoses, 72% of patients (n = 3919) did not have a disease status change between their first and last visits, 669 (13%) patients’ disease status progressed, and 589 (11%) patients’ disease improved. Conclusions: This study demonstrated the feasibility of utilizing longitudinal EDR data to track disease changes over 15 years during the observation study period. We provided detailed steps and computer algorithms to clean and preprocess the EDR data and generated three cohorts of patients. This information can now be utilized for studying clinical courses using artificial intelligence and machine learning methods.
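The evaluation measures named in the abstract above all derive from the four cells of a binary confusion matrix. The following Python sketch shows the standard definitions; the function name and the example counts are hypothetical and are not results from the study. Note that recall and sensitivity are the same quantity, so the sketch reports it once under both names.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Assumes every denominator is non-zero.
    """
    recall = tp / (tp + fn)
    return {
        "precision": tp / (tp + fp),       # share of predicted positives that are correct
        "recall": recall,                  # share of actual positives that were found
        "sensitivity": recall,             # same quantity as recall
        "specificity": tn / (tn + fp),     # share of actual negatives correctly rejected
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Invented counts for illustration only.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
```

In a manual review like the one described, the counts would come from comparing each algorithm's output against the reviewers' labels for the same records.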
Existential challenges for healthcare data protection in the United States
(2017-01) Terry, Nicolas P.
There are increasing threats to healthcare data protection in the United States. Most federal data privacy laws apply only to specific sectors, such as healthcare, education, communications, or financial services. In the absence of comprehensive data protection legislation there are multiple, sectoral approaches. These privacy laws are noticeably limited in their vertical scope, preferring downstream protections such as confidentiality, security, and breach notification. Hardly any US laws contain upstream requirements that minimize or otherwise limit data collection. The imminent “EU General Data Protection Regulation” (GDPR) is considerably more comprehensive. Horizontally, it applies to all sectors of the economy, all broadly defined “personal data,” and all who control or process data. Vertically, it applies protective standards throughout the lifespan of data. In the US, the primary federal law applying to healthcare data comprises regulations known as the “HIPAA Privacy and Security Rules.” The HIPAA rules provide considerably weaker protection than the GDPR, although they are far stronger than the protections applicable to other commercial sectors in the US. HIPAA has relatively narrow scope, essentially only applying to data held by traditional healthcare providers and applying only downstream protections: confidentiality, security, and breach notification. Notwithstanding their weaknesses, the HIPAA rules are quite detailed and generally well enforced. Thus, HIPAA has created expectations in patients that all their healthcare data are safe. This is no longer the case, either within the HIPAA “zone” or outside of it. First, traditional providers have almost completed their transition from paper to electronic health records, during which they swap the protections inherent in unconnected file rooms for far riskier computerized longitudinal databases. Second, multiple parties outside of healthcare view healthcare data as having great value; “big data” brokers collect healthcare data or medically-inflected data for their predictive analytics products, while cybercriminals long since have recognized the profit in stealing health records. Third, consumer electronics companies continue to disrupt healthcare data markets (and data protection) by encouraging consumers to themselves collect and curate data from mobile health apps, wearable devices and the “internet of things.” These challenges to healthcare data protection highlight the fundamental flaws of domain-limited protections and over-reliance on a limited set of protective models. The former because disruptive businesses and technological innovations can make a nonsense of narrowly-defined sectoral protections. The latter because policymakers need a broader array of tools to combat modern challenges while reliance on downstream models intrinsically concedes the correctness of unregulated data collection. The outlook for US healthcare data protection is increasingly bleak. In the aftermath of the 2016 US election, it is quite likely that HIPAA rules will be enforced with less enthusiasm, encouraging an increase in data leaks from the health care system. Further, those victorious in the election are no friends of pro-privacy regulatory agencies and some of their data protection activities may be reined in. It is also extremely unlikely that comprehensive privacy legislation will be passed by the incoming administration. Yet, technological progress and consumer choice almost inevitably will result in increasing amounts of healthcare data being created and processed outside the HIPAA-protected zone. Not surprisingly therefore, healthcare data protection in the US faces a perilous future and one that increasingly will be at odds with the protections offered by its trading partners.
Finding the Patient’s Voice Using Big Data: Analysis of Users’ Health-Related Concerns in the ChaCha Question-and-Answer Service (2009–2012)
(JMIR, 2016) Priest, Chad; Knopf, Amelia; Groves, Doyle; Carpenter, Janet S.; Furrey, Christopher; Krishnan, Anand; Miller, Wendy R.; Otte, Julie L.; Palakal, Mathew; Wiehe, Sarah E.; Wilson, Jeffrey S.; IU School of Nursing
Background: The development of effective health care and public health interventions requires a comprehensive understanding of the perceptions, concerns, and stated needs of health care consumers and the public at large. Big datasets from social media and question-and-answer services provide insight into the public’s health concerns and priorities without the financial, temporal, and spatial encumbrances of more traditional community-engagement methods and may prove a useful starting point for public-engagement health research (infodemiology). Objective: The objective of our study was to describe user characteristics and health-related queries of the ChaCha question-and-answer platform, and discuss how these data may be used to better understand the perceptions, concerns, and stated needs of health care consumers and the public at large. Methods: We conducted a retrospective automated textual analysis of anonymous user-generated queries submitted to ChaCha between January 2009 and November 2012. A total of 2.004 billion queries were read, of which 3.50% (70,083,796/2,004,243,249) were missing 1 or more data fields, leaving 1.934 billion complete lines of data for these analyses. Results: Males and females submitted roughly equal numbers of health queries, but content differed by sex. Questions from females predominantly focused on pregnancy, menstruation, and vaginal health. Questions from males predominantly focused on body image, drug use, and sexuality. Adolescents aged 12–19 years submitted more queries than any other age group. Their queries were largely centered on sexual and reproductive health, and pregnancy in particular. Conclusions: The private nature of the ChaCha service provided a perfect environment for maximum frankness among users, especially among adolescents posing sensitive health questions. Adolescents’ sexual health queries reveal knowledge gaps with serious, lifelong consequences. The nature of questions to the service provides opportunities for rapid understanding of health concerns and may lead to development of more effective tailored interventions. [J Med Internet Res 2016;18(3):e44]
