A Dynamic, User-centric Big Data Analytics Framework for Genome Data
dc.contributor.author | Ravishankar, Shalini | |
dc.contributor.author | Pradhan, Meeta | |
dc.contributor.author | Palakal, Mathew | |
dc.date.accessioned | 2016-04-21T19:16:08Z | |
dc.date.available | 2016-04-21T19:16:08Z | |
dc.date.issued | 2015-04-17 | |
dc.description | poster abstract | en_US |
dc.description.abstract | The cost of sequencing DNA has fallen from $100 million to just over $1,000, and this has increased the generation of genomic data many times over. However, analyzing such large data sets means meeting both user needs and computational challenges. Various tools exist to process sequenced DNA information for alignment and research, and these tools have been adapted to work in a big data processing environment such as Hadoop. The analysis of sequence data nevertheless depends on user-specific needs, so each user requires a unique data analysis pipeline. We propose a barcode-driven technology to instruct a Hadoop-based big data analytics system, allowing the user to select the tools needed to process the input genome data file. The proposed framework dynamically generates a customized barcode for each user based on the user’s data analysis needs, and a pipeline is then created and driven by that barcode. This approach will revolutionize the way users set up next-generation sequencing (NGS) data analytics pipelines and will give them a seamless way to analyze their data. The time taken to process a genomic file was reduced from 2 hours on a traditional Linux server to 3.81 minutes on Hadoop. Our results indicate that a barcode-based approach will enable users to customize NGS data analysis very efficiently. | en_US |
dc.identifier.citation | Shalini Ravishankar, Meeta Pradhan and Mathew Palakal. 2015 April 17. A Dynamic, User-centric Big Data Analytics Framework for Genome Data. Poster session presented at IUPUI Research Day 2015, Indianapolis, Indiana. | en_US |
dc.identifier.uri | https://hdl.handle.net/1805/9372 | |
dc.language.iso | en_US | en_US |
dc.publisher | Office of the Vice Chancellor for Research | en_US |
dc.subject | sequence DNA | en_US |
dc.subject | Genome Data | en_US |
dc.subject | data processing environment | en_US |
dc.subject | Hadoop | en_US |
dc.title | A Dynamic, User-centric Big Data Analytics Framework for Genome Data | en_US |
dc.type | Poster | en_US |