Performance analysis and autotuning setup of the cuFFT library

Střelák, David; Filipovič, Jiří

Publication details

Authors	STŘELÁK David, FILIPOVIČ Jiří
Year of publication	2018
Type	Article in Proceedings
Conference	ACM International Conference Proceeding Series
MU Faculty or unit	Institute of Computer Science
web	https://dl.acm.org/citation.cfm?id=3295817
Doi	http://dx.doi.org/10.1145/3295816.3295817
Keywords	cuFFT; GPU; autotuning; performance analysis; cuFFTAdvisor
Description	The fast Fourier transform (FFT) has many applications. It is often one of the most computationally demanding kernels, so much effort has been invested in tuning its performance on various hardware devices. However, FFT libraries usually have many possible settings, and it is not always easy to deduce which settings yield optimal performance. In practice, the FFT settings can often be modified slightly; for example, the input data can be padded or cropped. Surprisingly, most state-of-the-art papers focus on how to implement FFT under given settings but pay little attention to which settings result in the fastest computation. In this paper, we target a popular implementation of FFT for GPU accelerators, the cuFFT library. We analyze the behavior and performance of the cuFFT library with respect to input sizes and plan settings. We also present a new tool, cuFFTAdvisor, which proposes and, by means of autotuning, finds the best configuration of the library for given constraints on input size and plan settings. We experimentally show that our tool is able to propose different transformation settings, resulting in an average speedup of 6x using fast heuristics and 6.9x using autotuning.
Related projects	CERIT Scientific Cloud
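The description notes that input data can often be padded or cropped. The sketch below illustrates only that idea, under stated assumptions: it relies on the property documented by NVIDIA that cuFFT is most efficient for sizes of the form 2^a·3^b·5^c·7^d, and the helper names (`is_smooth`, `padded_size`, `cropped_size`) are hypothetical. It is not cuFFTAdvisor's heuristic or API; the tool described in the paper also considers plan settings and uses autotuning.

```python
# Hypothetical helpers illustrating size padding/cropping for FFT inputs.
# Assumption: sizes whose prime factors are all in {2, 3, 5, 7} are "fast" for cuFFT.

SMALL_PRIMES = (2, 3, 5, 7)


def is_smooth(n: int, primes: tuple[int, ...] = SMALL_PRIMES) -> bool:
    """Return True if n factors completely over the given small primes."""
    if n < 1:
        return False
    for p in primes:
        # Divide out each small prime as many times as possible.
        while n % p == 0:
            n //= p
    return n == 1


def padded_size(n: int, primes: tuple[int, ...] = SMALL_PRIMES) -> int:
    """Smallest size >= n made only of the given primes (target for zero-padding)."""
    if n < 1:
        raise ValueError("size must be positive")
    m = n
    while not is_smooth(m, primes):
        m += 1
    return m


def cropped_size(n: int, primes: tuple[int, ...] = SMALL_PRIMES) -> int:
    """Largest size <= n made only of the given primes (target for cropping)."""
    if n < 1:
        raise ValueError("size must be positive")
    m = n
    # Terminates because 1 is trivially smooth.
    while not is_smooth(m, primes):
        m -= 1
    return m


if __name__ == "__main__":
    for size in (1000, 1021):
        print(size, padded_size(size), cropped_size(size))
```

For example, a 1021-point transform (1021 is prime) could be padded to 1024 or cropped to 1008 = 2^4·3^2·7. Whether either change is acceptable depends on the application, and the actual speed of each candidate should still be measured on the target GPU.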
