Data-To-Question Generation Using Deep Learning

dc.contributor.author: Koshy, Nicole
dc.contributor.author: Dixit, Anshuman
dc.contributor.author: Jadhav, Siddhi Shrikant
dc.contributor.author: Penmatsa, Arun V.
dc.contributor.author: Samanthapudi, Sagar V.
dc.contributor.author: Kumar, Mothi Gowtham Asok
dc.contributor.author: Anuyah, Sydney Oghenetega
dc.contributor.author: Vemula, Gourav
dc.contributor.author: Herzog, Patricia Snell
dc.contributor.author: Bolchini, Davide
dc.date.accessioned: 2024-12-23T18:30:05Z
dc.date.available: 2024-12-23T18:30:05Z
dc.date.issued: 2023
dc.description.abstract: Many publicly available datasets exist that can provide factual answers to a wide range of questions that benefit the public. Indeed, the governmental and non-governmental organizations that create these datasets often have a mandate to share data with the public. However, these datasets are often underutilized by knowledge workers because of the considerable expertise and embedded implicit information that everyday users need to access, analyze, and utilize their information. To address this problem, this paper discusses the design of an automated process for generating questions that provide insight into a dataset. Given a relational dataset, our prototype system architecture follows a five-step process: data extraction, cleaning, pre-processing, entity recognition using deep learning, and question formulation. Through examples of our results, we show that the questions generated by our approach are similar to, and in some cases more accurate than, those generated by an AI engine like ChatGPT, whose question outputs, while more fluent, are often not true to the facts represented in the original data. We discuss key limitations of our approach and the work needed to bring to life a fully generalized pipeline that can take any dataset and automatically provide the user with factual questions that the data can answer.
dc.identifier.citation: Koshy, Nicole R., Anshuman Dixit, Siddhi Shrikant Jadhav, Arun V. Penmatsa, Sagar V. Samanthapudi, Mothi Gowtham Ashok Kumar, Sydney Oghenetega Anuyah, Gourav Vemula, Patricia Snell Herzog, and Davide Bolchini. 2023. “Data-to-Question Generation Using Deep Learning.” Proceedings of the Institute of Electrical and Electronics Engineers (IEEE) International Conference on Big Data Analytics and Practices: 78-83. doi: 10.1109/IBDAP58581.2023.10271940.
dc.identifier.uri: https://hdl.handle.net/1805/45171
dc.language.iso: en_US
dc.publisher: IEEE
dc.relation.isversionof: 10.1109/IBDAP58581.2023.10271940
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: question generation
dc.subject: data analytics
dc.subject: semantic typing
dc.subject: meta categories
dc.subject: Sherlock
dc.subject: spaCy
dc.subject: semantic distance calculation
dc.subject: ChatGPT
dc.subject: LDA
dc.subject: deep learning
dc.subject: knowledge extraction
dc.subject: topic modeling
dc.subject: public good
dc.title: Data-To-Question Generation Using Deep Learning
dc.type: Article
Files
Original bundle
Name: Koshy et al 2023 - Data-To-Question Generation Using Deep Learning (IBDAP).pdf
Size: 648.53 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission