TY - GEN
T1 - SWIFT-a performance accelerated optimized string matching algorithm for NVIDIA GPUs
AU - Shenoy, Sourabh S.
AU - Nayak, U. Supriya
AU - Neelima, Bayyapu
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2017/4/18
Y1 - 2017/4/18
N2 - This paper presents a study of exact string matching algorithms and their performance behavior when executed on dynamic parallelism enabled Kepler Graphics Processing Unit (GPU) by Nvidia. The algorithms considered in this paper are Quick search (QS), Horspool (HP), and Brute force (BF) string matching. Their efficient implementation on Kepler gives a remarkable improvement over their respective multi-core CPU and generic GPU implementations. In addition, the proposed work further optimizes the algorithm by exploiting the fundamental architectural aspects of the GPU like the memory hierarchy, lightweight threads and avoiding strided access. The optimization associated with the memory is called Binning and the one using memory alignment is named Chunking. As a result of the significant boost obtained by extracting the most out of these features, the newer methods are named as SWIFT. SWIFT algorithms give performance benefit of about 1.7X on an average and up to 5X in some cases, which will be discussed in the paper. Also, the paper proposes a hybrid algorithm that employs both regular GPU implementation and SWIFT based on a predefined condition that gives benefit for all pattern sizes. The experimental results for different pattern sizes using benchmark datasets are presented in the paper.
AB - This paper presents a study of exact string matching algorithms and their performance behavior when executed on dynamic parallelism enabled Kepler Graphics Processing Unit (GPU) by Nvidia. The algorithms considered in this paper are Quick search (QS), Horspool (HP), and Brute force (BF) string matching. Their efficient implementation on Kepler gives a remarkable improvement over their respective multi-core CPU and generic GPU implementations. In addition, the proposed work further optimizes the algorithm by exploiting the fundamental architectural aspects of the GPU like the memory hierarchy, lightweight threads and avoiding strided access. The optimization associated with the memory is called Binning and the one using memory alignment is named Chunking. As a result of the significant boost obtained by extracting the most out of these features, the newer methods are named as SWIFT. SWIFT algorithms give performance benefit of about 1.7X on an average and up to 5X in some cases, which will be discussed in the paper. Also, the paper proposes a hybrid algorithm that employs both regular GPU implementation and SWIFT based on a predefined condition that gives benefit for all pattern sizes. The experimental results for different pattern sizes using benchmark datasets are presented in the paper.
UR - https://www.scopus.com/pages/publications/85017643607
UR - https://www.scopus.com/pages/publications/85017643607#tab=citedBy
U2 - 10.1109/ISPDC.2016.19
DO - 10.1109/ISPDC.2016.19
M3 - Conference contribution
AN - SCOPUS:85017643607
T3 - Proceedings - 15th International Symposium on Parallel and Distributed Computing, ISPDC 2016
SP - 80
EP - 87
BT - Proceedings - 15th International Symposium on Parallel and Distributed Computing, ISPDC 2016
A2 - Chen, Riqing
A2 - Grigoras, Dan
A2 - Rong, Chunming
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th International Symposium on Parallel and Distributed Computing, ISPDC 2016
Y2 - 8 July 2016 through 10 July 2016
ER -