Browsing by Author "Dave, Vachik S."

A Combined Representation Learning Approach for Better Job and Skill Recommendation (ACM, 2018-10)
Dave, Vachik S.; Al Hasan, Mohammad; Zhang, Baichuan; AlJadda, Khalifeh; Korayem, Mohammed; Computer and Information Science, School of Science
Job recommendation is an important task for the modern recruitment industry. An excellent job recommender system not only recommends a higher-paying job that is maximally aligned with the skill set of the current job, but also suggests a few additional skills that are required to assume the new position. In this work, we created three types of information networks from the historical job data: (i) job transition network, (ii) job-skill network, and (iii) skill co-occurrence network. We provide a representation learning model which can utilize the information from all three networks to jointly learn the representation of the jobs and skills in a shared k-dimensional latent space. In our experiments, we show that by jointly learning the representation for the jobs and skills, our model provides better recommendations for both jobs and skills. Additionally, we show some case studies which validate our claims.

E-CLoG: Counting edge-centric local graphlets (IEEE, 2017-12)
Dave, Vachik S.; Ahmed, Nesreen K.; Al Hasan, Mohammad; Computer and Information Science, School of Science
In recent years, graphlet counting has emerged as an important task in topological graph analysis. However, the existing works on graphlet counting obtain the graphlet counts for the entire network as a whole. These works capture the key graphical patterns that prevail in a given network, but they fail to meet the demand of the majority of real-life graph-related prediction tasks, such as link prediction and edge/node classification, which require building features for an edge (or a vertex) of a network.
To meet the demand for such applications, efficient algorithms are needed for counting local graphlets within the context of an edge (or a vertex). In this work, we propose an efficient method, titled E-CLOG, for counting all local graphlets of size 3, 4, and 5 within the context of a given edge, for all of its different edge orbits. We also provide a shared-memory, multi-core implementation of E-CLOG, which makes it even more scalable for very large real-world networks. In particular, we obtain strong scaling on a variety of graphs (14x-20x on 36 cores). We provide extensive experimental results to demonstrate the efficiency and effectiveness of the proposed method. For instance, we show that E-CLOG is faster than existing work by multiple orders of magnitude; for the Wordnet graph, E-CLOG counts all local graphlets of size 3, 4, and 5 in 1.5 hours using a single thread and in only a few minutes using the parallel implementation, whereas the baseline method does not finish in more than 4 days. We also show that local graphlet counts around an edge are much better features for link prediction than well-known topological features; our experiments show that the former yield between 10% and 45% improvement in the AUC value for predicting future links in three real-life social and collaboration networks.

Feature Selection for Classification under Anonymity Constraint (2017)
Zhang, Baichuan; Mohammed, Noman; Dave, Vachik S.; Al Hasan, Mohammad; Computer and Information Science, School of Science
Over the last decade, the proliferation of various online platforms and their increasing adoption by billions of users have heightened the privacy risk of a user enormously. In fact, security researchers have shown that sparse microdata containing information about a user's online activities, although anonymous, can still be used to disclose the identity of the user by cross-referencing the data with other data sources.
To preserve the privacy of a user, existing works propose several methods (k-anonymity, l-diversity, differential privacy) that ensure a dataset meant to be shared or published bears a small identity disclosure risk. However, the majority of these methods modify the data in isolation, without considering their utility in subsequent knowledge discovery tasks, which makes these datasets less informative. In this work, we consider labeled data that are generally used for classification, and propose two methods for feature selection considering two goals: first, on the reduced feature set the data has a small disclosure risk, and second, the utility of the data is preserved for performing a classification task. Experimental results on various real-world datasets show that the proposed methods are effective and useful in practice.

How Fast Will You Get a Response? Predicting Interval Time for Reciprocal Link Creation (2017)
Dave, Vachik S.; Al Hasan, Mohammad; Reddy, Chandan K.; Computer and Information Science, School of Science
In recent years, reciprocal link prediction has received some attention from data mining and social network analysis researchers, who solved this problem as a binary classification task. However, it is also important to predict the interval time for the creation of a reciprocal link. This is a challenging problem for two reasons: first, there is a lack of effective features, because well-known link prediction features are designed for undirected networks and for the binary classification task, and hence do not work well for interval time prediction; second, the presence of censored data instances makes traditional supervised regression methods unsuitable for solving this problem. In this paper, we propose a solution for the reciprocal link interval time prediction task.
We map this problem into a survival analysis framework and show through extensive experiments on real-world datasets that survival analysis methods perform better than traditional regression, a neural network-based model, and support vector regression (SVR).

Neural-Brane: Neural Bayesian Personalized Ranking for Attributed Network Embedding (Springer, 2019-06)
Dave, Vachik S.; Zhang, Baichuan; Chen, Pin-Yu; Al Hasan, Mohammad; Computer and Information Science, School of Science
Network embedding methodologies, which learn a distributed vector representation for each vertex in a network, have attracted considerable interest in recent years. Existing works have demonstrated that vertex representations learned through an embedding method provide superior performance in many real-world applications, such as node classification, link prediction, and community detection. However, most of the existing methods for network embedding only utilize topological information of a vertex, ignoring a rich set of nodal attributes (such as user profiles of an online social network, or textual contents of a citation network), which is abundant in all real-life networks. A joint network embedding that takes into account both attributional and relational information entails complete network information and could further enrich the learned vector representations. In this work, we present Neural-Brane, a novel Neural Bayesian Personalized Ranking based Attributed Network Embedding. For a given network, Neural-Brane extracts latent feature representations of its vertices using a designed neural network model that unifies network topological information and nodal attributes. In addition, it utilizes a Bayesian personalized ranking objective, which exploits the proximity ordering between a similar node pair and a dissimilar node pair. We evaluate the quality of the vertex embeddings produced by Neural-Brane by solving the node classification and clustering tasks on four real-world datasets.
Experimental results demonstrate the superiority of our proposed method over the existing state-of-the-art methods.

Predicting interval time for reciprocal link creation using survival analysis (Springer, 2018-12)
Dave, Vachik S.; Al Hasan, Mohammad; Zhang, Baichuan; Reddy, Chandan K.; Computer and Information Science, School of Science
The majority of directed social networks, such as Twitter, Flickr and Google+, exhibit reciprocal altruism, a social psychology phenomenon, which drives a vertex to create a reciprocal link with another vertex which has created a directed link toward the former. In existing works, scientists have already predicted the possibility of the creation of a reciprocal link, a task known as “reciprocal link prediction”. However, an equally important problem is determining the interval time between the creation of the first link (also called the parasocial link) and its corresponding reciprocal link. No existing works have considered solving this problem, which is the focus of this paper. Predicting the reciprocal link interval time is a challenging problem for two reasons: first, there is a lack of effective features, since well-known link prediction features are designed for undirected networks and for the binary classification task, and hence do not work well for interval time prediction; second, the presence of ever-waiting links (i.e., parasocial links for which a reciprocal link is not formed within the observation period) makes traditional supervised regression methods unsuitable for such data. In this paper, we propose a solution for the reciprocal link interval time prediction task.
We map this problem to a survival analysis task and show through extensive experiments on real-world datasets that survival analysis methods perform better than traditional regression, neural network-based models and support vector regression for solving reciprocal interval time prediction.

TopCom: Index for Shortest Distance Query in Directed Graph (Springer, 2015)
Dave, Vachik S.; Al Hasan, Mohammad; Department of Computer & Information Science, School of Science
Finding the shortest distance between two vertices in a graph is an important problem due to its numerous applications in diverse domains, including geo-spatial databases, social network analysis, and information retrieval. Classical algorithms (such as Dijkstra's) solve this problem in polynomial time, but these algorithms cannot provide real-time responses for a large number of bursty queries on a large graph. As a result, indexing-based solutions that pre-process the graph to efficiently answer (exactly or approximately) a large number of distance queries in real time are becoming increasingly popular. Existing solutions have varying performance in terms of index size, index building time, query time, and accuracy. In this work, we propose TopCom, a novel indexing-based solution for exactly answering distance queries in a directed acyclic graph (DAG). Our experiments with two of the existing state-of-the-art methods (IS-Label and TreeMap) show the superiority of TopCom over these two methods considering scalability and query time.

Triangle counting in large networks: a review (Wiley, 2018-03)
Al Hasan, Mohammad; Dave, Vachik S.; Computer and Information Science, School of Science
Counting and enumeration of local topological structures, such as triangles, is an important task for analyzing large real-life networks. For instance, the triangle count in a network is used to compute transitivity, an important property for understanding graph evolution over time.
Triangles are also used for various other tasks on real-life networks, including community discovery, link prediction, and spam filtering. The task of triangle counting, though simple, has gained wide attention in recent years from the data mining community. This is because most of the existing algorithms for counting triangles do not scale well to very large networks with millions (or even billions) of vertices. To circumvent this limitation, researchers have proposed triangle counting methods that approximate the count or run on distributed clusters. In this paper, we discuss the existing methods of triangle counting, ranging from sequential to parallel, single-machine to distributed, exact to approximate, and off-line to streaming. We also present experimental results of a performance comparison among a set of approximate triangle counting methods built under a unified implementation framework. Finally, we conclude with a discussion of future work in this direction.
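
As an illustration of the relationship between triangle counts and transitivity mentioned in the abstract above, the following is a minimal sketch of exact, single-machine triangle counting on a small undirected graph. It is not taken from the review; the function names `build_adjacency`, `count_triangles`, and `transitivity` are hypothetical, and the sketch assumes the standard definition of transitivity as three times the number of triangles divided by the number of connected vertex triples.

```python
def build_adjacency(edges):
    """Build an adjacency-set map for a simple undirected graph (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Exact triangle count via common neighbours of each adjacent vertex pair."""
    total = 0
    for u, neighbours in adj.items():
        for v in neighbours:
            # Every triangle is seen once per ordered adjacent pair, i.e. 6 times.
            total += len(neighbours & adj[v])
    return total // 6


def transitivity(adj):
    """Assumed definition: 3 * triangles / number of connected triples (wedges)."""
    wedges = sum(len(n) * (len(n) - 1) // 2 for n in adj.values())
    if wedges == 0:
        return 0.0
    return 3 * count_triangles(adj) / wedges


# Hypothetical toy graph: one triangle (0, 1, 2) plus a pendant edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
toy_adj = build_adjacency(toy_edges)
print(count_triangles(toy_adj), transitivity(toy_adj))
```

This exact approach intersects neighbour sets for every edge, which is the kind of computation that becomes expensive on networks with millions of vertices and motivates the approximate and distributed methods the review surveys.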