Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements

Zheng, Weijian; Song, Fengguang; Lin, Lan

dc.contributor.author	Zheng, Weijian
dc.contributor.author	Song, Fengguang
dc.contributor.author	Lin, Lan
dc.contributor.department	Computer and Information Science, School of Science	en_US
dc.date.accessioned	2019-10-04T16:28:42Z
dc.date.available	2019-10-04T16:28:42Z
dc.date.issued	2017-11
dc.description.abstract	The k-means clustering method is one of the most widely used techniques in big data analytics. In this paper, we explore the ideas of software blocking, asynchronous local optimizations, and heuristics of simulated annealing to improve the performance of k-means clustering. As with most machine learning methods, the performance of k-means clustering relies on two main factors: the computing speed (per iteration) and the convergence rate. A straightforward realization of the software-blocking synchronization-reducing clustering algorithm, however, sporadically converges more slowly than the standard k-means algorithm. To tackle this issue, we design an annealing-enhanced algorithm, which introduces the heuristics of stop conditions and annealing steps to provide performance as good as or better than that of the standard k-means algorithm. This new enhanced k-means clustering algorithm offers the same clustering quality as the standard k-means. Experiments with real-world datasets show that the new parallel implementation is faster than the open source HPC library of Parallel K-Means Data Clustering (e.g., 19% faster on relatively large datasets with 32 CPU cores, and 11% faster on a large dataset with 1,024 CPU cores). Moreover, the extent to which the program performance improves is largely determined by the actual convergence rate of applying the algorithm to different datasets.	en_US
dc.eprint.version	Author's manuscript	en_US
dc.identifier.citation	Zheng, W., Song, F., & Lin, L. (2017). Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements. Proceedings of the Machine Learning on HPC Environments, 9:1–9:8. https://doi.org/10.1145/3146347.3146357	en_US
dc.identifier.uri	https://hdl.handle.net/1805/21043
dc.language.iso	en	en_US
dc.publisher	ACM	en_US
dc.relation.isversionof	10.1145/3146347.3146357	en_US
dc.relation.journal	Proceedings of the Machine Learning on HPC Environments	en_US
dc.rights	Publisher Policy	en_US
dc.source	Author	en_US
dc.subject	high performance computing	en_US
dc.subject	machine learning	en_US
dc.subject	synchronization-reducing clustering algorithms	en_US
dc.title	Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements	en_US
dc.type	Article	en_US

Files

Original bundle

Name: Zheng_2017_designing.pdf
Size: 795.44 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission

Collections

Open Access Policy Articles
Department of Computer and Information Science Works