suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework

dc.contributor.author: Zheng, Weijian
dc.contributor.author: Song, Fengguang
dc.contributor.author: Lin, Lan
dc.contributor.author: Chen, Zizhong
dc.contributor.department: Computer and Information Science, School of Science
dc.date.accessioned: 2017-12-28T17:12:31Z
dc.date.available: 2017-12-28T17:12:31Z
dc.date.issued: 2016-12
dc.description.abstract: The scope of this paper is to design and implement a scalable QR factorization solver that can deliver the fastest performance for tall and skinny matrices and square matrices on modern supercomputers. The new solver, named scalable universal communication-avoiding QR factorization (suCAQR), introduces a simplified and tuning-less way to realize the communication-avoiding QR factorization algorithm and supports matrices of any shape. The software design includes a mixed usage of physical and logical data layouts, a simplified method of dynamic-root binary-tree reduction, and a dynamic dataflow implementation. Compared with existing communication-avoiding QR factorization implementations, suCAQR is simpler, more general, and more efficient. By balancing the degree of parallelism and the proportion of faster computational kernels, it achieves scalable performance on clusters of multicore nodes. The software essentially combines the strengths of the synchronization-reducing and communication-avoiding approaches to achieve high performance. In experiments using 1,024 CPU cores, suCAQR is faster than DPLASMA by up to 30%, and faster than ScaLAPACK by up to 30 times.
dc.eprint.version: Author's manuscript
dc.identifier.citation: Zheng, W., Song, F., Lin, L., & Chen, Z. (2016, December). suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework. In Parallel and Distributed Systems (ICPADS), 2016 IEEE 22nd International Conference on (pp. 1092-1099). IEEE. http://dx.doi.org/10.1109/ICPADS.2016.0144
dc.identifier.uri: https://hdl.handle.net/1805/14910
dc.language.iso: en
dc.publisher: IEEE
dc.relation.isversionof: 10.1109/ICPADS.2016.0144
dc.relation.journal: 2016 IEEE 22nd International Conference on Parallel and Distributed Systems
dc.rights: IUPUI Open Access Policy
dc.source: Author
dc.subject: high performance computing
dc.subject: computational science application
dc.subject: performance modeling
dc.title: suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework
dc.type: Conference proceedings
Files
Original bundle
Name: Zheng_2017_suCAQR.pdf
Size: 2.98 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.99 KB
Format: Item-specific license agreed upon to submission