GRAPH BASED MINING ON WEIGHTED DIRECTED GRAPHS FOR SUBNETWORKS AND PATH DISCOVERY

Abdulkarim, Sijin Cherupilly

Date

2011-08-16

Authors

Abdulkarim, Sijin Cherupilly

Language

American English

Committee Chair

Palakal, Mathew J.

Committee Members

Fang, Shiaofen
Xia, Yuni

Degree

M.S.

Degree Year

2011

Department

Computer & Information Science

Grantor

Purdue University

Abstract

Subnetwork and path mining is an emerging data mining problem in many areas, including scientific and commercial applications. Graph modeling is an effective way of representing real-world networks, and many natural and man-made systems are structured as networks. Traditional machine learning and data mining approaches treat data as a collection of homogeneous objects that are independent of each other, whereas network data are potentially heterogeneous and interlinked. In this thesis we propose a novel algorithm to find subnetworks and maximal paths in a weighted, directed network represented as a graph. The main objective of this study is to find meaningful maximal paths in a given network based on three key parameters: node weight, edge weight, and edge direction. The algorithm is an effective way to extract maximal paths from a network modeled around a user's interest, and it allows the user to assign weights to the nodes and edges of a biological network. The performance of the proposed technique was tested on a colorectal cancer biological network. The subnetworks and paths that our network mining algorithm obtained from this network were scored based on their biological significance and verified using MetaCore™ as well as the literature. The algorithm is implemented as a tool into which the user can input a node list and an edge list. The tool can also identify the upstream and downstream entities of a given entity (genes, proteins, etc.) from the derived maximal paths. The time complexity of the algorithm is O(n log n) in the best case and O(n² log n) in the worst case.
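This record does not spell out the mining procedure itself, so the following is only a minimal Python sketch of the kind of input and output the abstract describes, built on assumptions of our own. Node and edge weights are filtered with hypothetical thresholds `node_min` and `edge_min`, and a "maximal path" is taken to mean a simple directed path that starts at a node with no retained incoming edge and cannot be extended further. The names `build_graph`, `maximal_paths`, and `upstream_downstream` are illustrative and are not the thesis tool's API. The exhaustive depth-first enumeration used here is exponential in the worst case and is not the O(n log n) / O(n² log n) algorithm reported in the abstract.

```python
from collections import defaultdict

# Hypothetical input format: node weights as a dict, edges as (source, target, weight).
def build_graph(nodes, edges, node_min=0.0, edge_min=0.0):
    """Keep nodes and edges whose weights meet the (assumed) thresholds."""
    kept = {n for n, w in nodes.items() if w >= node_min}
    adj = defaultdict(list)  # directed adjacency: source -> [targets]
    for src, dst, w in edges:
        if src in kept and dst in kept and w >= edge_min:
            adj[src].append(dst)
    return kept, adj


def maximal_paths(nodes, edges, node_min=0.0, edge_min=0.0):
    """Enumerate simple directed paths that start at a node with no retained
    incoming edge and end where no unvisited successor remains.
    Exhaustive DFS: exponential in the worst case (illustration only)."""
    kept, adj = build_graph(nodes, edges, node_min, edge_min)
    has_pred = {dst for targets in adj.values() for dst in targets}
    # Fall back to every node if each one has a predecessor (e.g. a pure cycle).
    starts = sorted(kept - has_pred) or sorted(kept)
    paths = []

    def extend(path, visited):
        nexts = [d for d in adj[path[-1]] if d not in visited]
        if not nexts:  # cannot be extended further: record the path
            paths.append(list(path))
            return
        for d in nexts:
            path.append(d)
            visited.add(d)
            extend(path, visited)
            path.pop()
            visited.remove(d)

    for s in starts:
        extend([s], {s})
    return paths


def upstream_downstream(paths, entity):
    """Collect the nodes preceding / following `entity` on any derived path."""
    up, down = set(), set()
    for p in paths:
        if entity in p:
            i = p.index(entity)
            up.update(p[:i])
            down.update(p[i + 1:])
    return up, down


# Toy example with made-up node names and weights (not data from the thesis).
nodes = {"A": 1.0, "B": 0.8, "C": 0.9, "D": 0.2, "E": 0.7}
edges = [("A", "B", 0.9), ("B", "C", 0.6), ("B", "D", 0.8),
         ("C", "E", 0.5), ("A", "E", 0.1)]

paths = maximal_paths(nodes, edges, node_min=0.5, edge_min=0.3)
print(paths)                    # [['A', 'B', 'C', 'E']]
up, down = upstream_downstream(paths, "C")
print(sorted(up), sorted(down))  # ['A', 'B'] ['E']
```

With these thresholds, node D (weight 0.2) and the low-weight edge A→E are filtered out, leaving the single path A, B, C, E; C's upstream set is {A, B} and its downstream set is {E}. Dropping the thresholds yields three paths instead, which shows how the user-chosen weights shape the result.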

Description

Indiana University-Purdue University Indianapolis (IUPUI)

Keywords

Computer Science, Bioinformatics

LC Subjects

Data mining, Directed graphs

Permanent Link

https://hdl.handle.net/1805/2618
http://dx.doi.org/10.7912/C2/2287

Collections

Computer & Information Science Department Theses and Dissertations