Distributed graph decomposition algorithms on Apache Spark

dc.contributor.advisor: Hasan, Mohammad Al
dc.contributor.author: Mandal, Aritra
dc.contributor.other: Mohler, George
dc.contributor.other: Song, Fengguang
dc.date.accessioned: 2018-08-01T20:52:34Z
dc.date.available: 2018-08-01T20:52:34Z
dc.date.issued: 2018-04-20
dc.degree.date: 2018 [en_US]
dc.degree.grantor: Purdue University [en_US]
dc.degree.level: M.S. [en_US]
dc.description: Indiana University-Purdue University Indianapolis (IUPUI) [en_US]
dc.description.abstract: Structural analysis and mining of large, complex graphs to describe the characteristics of a vertex or an edge are widely used in graph clustering, classification, and modeling. Methods for the structural analysis of graphs include discovering frequent subgraphs or network motifs, counting triangles or graphlets, spectral analysis of networks using the eigenvectors of the graph Laplacian, and finding highly connected subgraphs such as cliques and quasi-cliques. Unfortunately, the algorithms for most of these tasks are costly, which makes them unscalable to large real-life networks. Two popular decompositions, the k-core and the k-truss decomposition of a graph, give useful insight into a graph's vertices and edges, respectively. These decompositions have been applied to reasoning about protein functions in protein-protein interaction networks, to fraud detection, and to missing-link prediction. k-core decomposition, with its linear time complexity, scales to large real-life networks as long as the input graph fits in main memory. k-truss decomposition, on the other hand, is computationally more intensive because its definition relies on triangles, and no linear-time algorithm is available for it. In this work, we propose distributed algorithms on Apache Spark for the k-truss and k-core decomposition of a graph. We also compare the performance of our algorithms with state-of-the-art MapReduce and parallel algorithms using openly available real-world network data. Our proposed algorithms show substantial performance improvements. [en_US]
dc.identifier.doi: 10.7912/C2C08W
dc.identifier.uri: https://hdl.handle.net/1805/16924
dc.identifier.uri: http://dx.doi.org/10.7912/C2/2358
dc.language.iso: en [en_US]
dc.rights: Attribution-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nd/3.0/us/
dc.subject: Graph Mining [en_US]
dc.subject: Big Data [en_US]
dc.subject: Apache Spark [en_US]
dc.subject: Graph Decomposition [en_US]
dc.subject: Graph Partitioning [en_US]
dc.subject: Clustering [en_US]
dc.title: Distributed graph decomposition algorithms on Apache Spark [en_US]
dc.type: Thesis [en]
thesis.degree.discipline: Computer & Information Science [en]
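The abstract contrasts k-core decomposition (computed by repeatedly peeling vertices of minimum degree) with k-truss decomposition (defined through the number of triangles each edge lies in). The following is a minimal single-machine Python sketch of both peeling procedures, included only to make the two definitions concrete; it is not the distributed Spark algorithm proposed in the thesis. Assumptions: the graph is undirected and given as a list of edges over mutually comparable vertex ids, and an edge belongs to the k-truss when it lies in at least k - 2 triangles of that subgraph. The function names build_adjacency, core_numbers, and truss_numbers are hypothetical. For brevity both functions use a binary heap with lazy deletion, so they are not the linear-time bucket method the abstract refers to for k-core.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def core_numbers(edges):
    """Core number of every vertex, by repeatedly peeling a minimum-degree vertex."""
    adj = build_adjacency(edges)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    core = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != deg[v]:
            continue  # stale heap entry (lazy deletion)
        k = max(k, d)  # the peeling level never decreases
        core[v] = k
        for w in adj[v]:
            if w not in core:
                deg[w] -= 1  # v is removed, so w loses one neighbour
                heapq.heappush(heap, (deg[w], w))
    return core


def truss_numbers(edges):
    """Trussness of every edge (u, v) with u < v, by peeling minimum-support edges.

    Assumed convention: an edge in no triangle has trussness 2.
    """
    adj = build_adjacency(edges)

    def key(a, b):
        return (a, b) if a < b else (b, a)

    # Support of an edge = number of triangles that contain it.
    sup = {(u, v): len(adj[u] & adj[v]) for u in adj for v in adj[u] if u < v}
    heap = [(s, e) for e, s in sup.items()]
    heapq.heapify(heap)
    truss = {}
    k = 2
    while heap:
        s, e = heapq.heappop(heap)
        if e in truss or s != sup[e]:
            continue  # stale heap entry (lazy deletion)
        k = max(k, s + 2)
        truss[e] = k
        u, v = e
        # Each remaining triangle (u, v, w) is destroyed by removing e,
        # so its other two edges each lose one unit of support.
        for w in adj[u] & adj[v]:
            for f in (key(u, w), key(v, w)):
                sup[f] -= 1
                heapq.heappush(heap, (sup[f], f))
        adj[u].discard(v)
        adj[v].discard(u)
    return truss
```

For example, on a 4-clique {0, 1, 2, 3} with a pendant edge (3, 4), core_numbers assigns core number 3 to the clique vertices and 1 to vertex 4, while truss_numbers assigns trussness 4 to the six clique edges and 2 to the pendant edge, which lies in no triangle.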
Files
Original bundle
Name: Mandal_Aritra_Thesis_Revised.pdf
Size: 1.64 MB
Format: Adobe Portable Document Format
Description: Main Thesis article
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission