IU Indianapolis ScholarWorks :: Browsing by Subject "Apache Spark"

Distributed graph decomposition algorithms on Apache Spark
(2018-04-20) Mandal, Aritra; Hasan, Mohammad Al; Mohler, George; Song, Fengguang
Structural analysis and mining of large, complex graphs to describe the characteristics of a vertex or an edge are widely used in graph clustering, classification, and modeling. Methods for structural analysis of graphs include discovering frequent subgraphs or network motifs, counting triangles or graphlets, spectral analysis of networks using eigenvectors of the graph Laplacian, and finding highly connected subgraphs such as cliques and quasi-cliques. Unfortunately, the algorithms for most of these tasks are costly, which makes them unscalable to large real-life networks. Two very popular decompositions, the k-core and the k-truss of a graph, give useful insight into the graph's vertices and edges, respectively. These decompositions have been applied to protein function reasoning on protein-protein networks, fraud detection, and missing link prediction. k-core decomposition, with its linear time complexity, scales to large real-life networks as long as the input graph fits in main memory. k-truss decomposition, on the other hand, is computationally more intensive because its definition relies on triangles, and there is no linear-time algorithm available for it. In this paper, we propose distributed algorithms on Apache Spark for k-truss and k-core decomposition of a graph. We also compare the performance of our algorithms with state-of-the-art Map-Reduce and parallel algorithms using openly available real-world network data. Our proposed algorithms show substantial performance improvement.
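To make the two decompositions concrete, here is a minimal single-machine sketch in Python. It is not the distributed Spark algorithm the abstract proposes; it only illustrates the definitions. The function names `core_numbers` and `k_truss_edges` are hypothetical, the graph is assumed to be undirected and given as pairs of mutually comparable vertex labels (for example integers), and the heap-based peeling shown is a simpler variant, not the linear-time bucket algorithm.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def core_numbers(edges):
    """Return {vertex: core number} by repeatedly peeling a minimum-degree vertex."""
    adj = build_adjacency(edges)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    core = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in core or d != degree[v]:
            continue  # stale heap entry: vertex already peeled or degree changed
        k = max(k, d)  # core numbers never decrease along the peeling order
        core[v] = k
        for w in adj[v]:
            if w not in core:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core


def k_truss_edges(edges, k):
    """Return the edges of the k-truss: every kept edge lies in >= k - 2 triangles."""
    adj = build_adjacency(edges)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Support of (u, v) = number of common neighbors still present.
                if v in adj[u] and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}
```

For a triangle on vertices 1, 2, 3 with a pendant vertex 4 attached to 3, the triangle vertices get core number 2 and the pendant gets 1, and the 3-truss keeps only the three triangle edges. The truss routine rescans all edges until nothing changes, which shows why triangle-based support makes k-truss more expensive than k-core.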
Enumerating k-cliques in a large network using Apache Spark
(2017) Dheekonda, Raja Sekhar Rao; Al Hasan, Mohammad
Network analysis is an important research task which explains the relationships among various entities in a given domain. Most existing approaches to network analysis compute global properties of a network, such as transitivity, diameter, and all-pair shortest paths. They also study various non-random properties of a network, such as graph densification with shrinking diameter, small diameter, and scale-freeness. Such approaches enable us to understand real-life networks through their global properties. However, the discovery of the local topological building blocks within a network is also an important task; examples include clique enumeration, graphlet counting, and motif counting. In this paper, my focus is to find an efficient solution to the k-clique enumeration problem. A clique is a small, connected, and complete induced subgraph over a large network. Enumerating cliques using sequential technologies is very time-consuming. Another promising direction that is being adopted is a solution that runs on distributed clusters of machines using the Hadoop mapreduce framework. However, that solution suffers from a general limitation of the framework: Hadoop's mapreduce performs substantial amounts of reading and writing to disk, so the running times of Hadoop-based approaches suffer enormously. To avoid these problems, we propose an efficient, scalable, and distributed solution, kc-spark, for enumerating cliques in real-life networks using the Apache Spark in-memory cluster computing framework. Experimental results show that kc-spark can enumerate k-cliques from very large real-life networks, whereas a single commodity machine cannot produce the same result in a feasible amount of time. We also compared kc-spark with Hadoop mapreduce solutions and found the algorithm to be 80-100 percent faster in terms of running time. In addition, we compared kc-spark with triangle enumeration on Hadoop mapreduce, and the results show that kc-spark is 8-10 times faster than the mapreduce implementation with the same cluster setup. Furthermore, the overall performance of kc-spark is improved by using Spark's inbuilt caching and broadcast features.
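As an illustration of the k-clique enumeration problem itself, the following is a minimal sequential sketch in Python. It is not kc-spark and makes no claim about how kc-spark works internally; it is the kind of single-machine baseline the abstract describes as too slow for very large networks. The function name `enumerate_k_cliques` is hypothetical, and vertex labels are assumed to be mutually comparable (for example integers) so that each clique can be grown in increasing label order and reported exactly once.

```python
from collections import defaultdict


def enumerate_k_cliques(edges, k):
    """Yield every k-clique of an undirected graph once, as a sorted tuple."""
    # Orient each edge from its smaller endpoint to its larger one so that a
    # clique is only ever extended by vertices larger than its current members.
    higher = defaultdict(set)
    vertices = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        vertices.update((u, v))
        lo, hi = (u, v) if u < v else (v, u)
        higher[lo].add(hi)

    def extend(clique, candidates):
        if len(clique) == k:
            yield tuple(clique)
            return
        for v in sorted(candidates):
            # Only vertices adjacent to every current member (and larger than
            # v) remain candidates for the next extension step.
            yield from extend(clique + [v], candidates & higher[v])

    if k >= 1:
        yield from extend([], vertices)
```

On a complete graph over vertices 1 to 4 with one extra edge (4, 5), calling `list(enumerate_k_cliques(edges, 3))` produces the four triangles inside the complete part, and `k = 4` produces the single 4-clique. The number of candidate sets grows quickly with k and graph density, which is the cost that motivates a distributed in-memory approach.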
