A Dynamic, User-centric Big Data Analytics Framework for Genome Data

Ravishankar, Shalini; Pradhan, Meeta; Palakal, Mathew

dc.contributor.author	Ravishankar, Shalini
dc.contributor.author	Pradhan, Meeta
dc.contributor.author	Palakal, Mathew
dc.date.accessioned	2016-04-21T19:16:08Z
dc.date.available	2016-04-21T19:16:08Z
dc.date.issued	2015-04-17
dc.description	poster abstract	en_US
dc.description.abstract	The cost of sequencing DNA has fallen from $100 million to just over $1,000, and this has increased the volume of genomic data being generated many times over. Analyzing data at this scale, however, means meeting both user needs and computational challenges. Various tools exist to process sequenced DNA for alignment and research, and these tools have been adapted to run in big data processing environments such as Hadoop. Because the analysis of sequence data depends on user-specific needs, each user requires a unique data analysis pipeline. We propose a barcode-driven technology to instruct a Hadoop-based big data analytics system, allowing the user to select the tools needed to process an input genome data file. The proposed framework dynamically generates a customized barcode for each user based on that user’s data analysis needs, and a pipeline is then created and driven by the barcode. This approach will revolutionize the way users set up NGS data analytics pipelines and will give them a seamless way to analyze their data. The time taken to process a genomic file was reduced from 2 hours on a traditional Linux server to just 3.81 minutes on Hadoop. Our results indicate that a barcode-based approach will enable the user to customize NGS data analysis very efficiently.	en_US
dc.identifier.citation	Shalini Ravishankar, Meeta Pradhan and Mathew Palakal. 2015 April 17. A Dynamic, User-centric Big Data Analytics Framework for Genome Data. Poster session presented at IUPUI Research Day 2015, Indianapolis, Indiana.	en_US
dc.identifier.uri	https://hdl.handle.net/1805/9372
dc.language.iso	en_US	en_US
dc.publisher	Office of the Vice Chancellor for Research	en_US
dc.subject	sequence DNA	en_US
dc.subject	Genome Data	en_US
dc.subject	data processing environment	en_US
dc.subject	Hadoop	en_US
dc.title	A Dynamic, User-centric Big Data Analytics Framework for Genome Data	en_US
dc.type	Poster	en_US
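
The barcode idea described in the abstract can be illustrated with a minimal sketch. The record does not specify the barcode format or the tools involved, so everything here is an assumption made for illustration: the tool names in TOOLS, the one-bit-per-tool encoding, and the helper functions make_barcode and pipeline_from_barcode are all hypothetical and are not taken from the poster. The sketch only shows how a user's tool selection could be encoded once and then decoded into an ordered list of pipeline steps.

```python
from typing import List

# Hypothetical tool registry. The names and their order are illustrative
# assumptions, not the tools or barcode layout used by the authors.
TOOLS: List[str] = ["quality_control", "alignment", "variant_calling", "annotation"]


def make_barcode(selected: List[str]) -> str:
    """Encode a user's tool selection as a bit string, one bit per tool."""
    unknown = set(selected) - set(TOOLS)
    if unknown:
        raise ValueError(f"unknown tools: {sorted(unknown)}")
    return "".join("1" if tool in selected else "0" for tool in TOOLS)


def pipeline_from_barcode(barcode: str) -> List[str]:
    """Decode a barcode into the ordered list of tools to run."""
    if len(barcode) != len(TOOLS) or set(barcode) - {"0", "1"}:
        raise ValueError("malformed barcode")
    return [tool for tool, bit in zip(TOOLS, barcode) if bit == "1"]


# Example: a user who only wants alignment followed by variant calling.
barcode = make_barcode(["alignment", "variant_calling"])
steps = pipeline_from_barcode(barcode)
print(barcode, steps)  # prints: 0110 ['alignment', 'variant_calling']
```

In a real deployment each decoded step would presumably be submitted as a job to the Hadoop cluster; that part is omitted here because the record gives no detail about it.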

Files

Original bundle

Name: Ravishankar-Dynamic.pdf
Size: 104.61 KB
Format: Adobe Portable Document Format

Collections

IUPUI Research Day 2015