TY - GEN
T1 - Optimization of the Bitap Algorithm for High-Performance Pattern Matching in DNA Sequences
AU - Apoorva Ananth, P.
AU - Rachana Shanbhogue, V.
AU - Gopalakrishna Kini, N.
AU - Jyothi Upadhya, K.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Deoxyribonucleic acid (DNA) is the molecule that holds genetic information needed for the growth, development and functioning of an organism. The four types of nitrogenous bases found in nucleotides that make up each strand of DNA are: adenine (A), cytosine (C), guanine (G), or thymine (T). Efficient analysis of DNA sequencing is required since mutations in a DNA sequence can lead to genetic disorders or diseases. Bitap is an approximate string-matching algorithm that provides an effective solution for identifying such mutations in large genomic datasets. However, with an increase in data size, the performance of the sequential Bitap algorithm also declines. This paper presents a parallelized approach to improve the efficiency of the Bitap algorithm by using the Message Passing Interface (MPI) for distributed memory systems and Compute Unified Device Architecture (CUDA) for parallel computation on Graphics Processing Units (GPUs). This study focuses on comparison of execution times of three implementations of the Bitap algorithm for mutation detection: the sequential implementation, the MPI-based parallelization, and the CUDA-based GPU implementation. Experimental results indicate that the parallel approaches achieve significant performance improvements as compared to the sequential implementation, thereby enabling faster processing and better scalability for large DNA datasets.
AB - Deoxyribonucleic acid (DNA) is the molecule that holds genetic information needed for the growth, development and functioning of an organism. The four types of nitrogenous bases found in nucleotides that make up each strand of DNA are: adenine (A), cytosine (C), guanine (G), or thymine (T). Efficient analysis of DNA sequencing is required since mutations in a DNA sequence can lead to genetic disorders or diseases. Bitap is an approximate string-matching algorithm that provides an effective solution for identifying such mutations in large genomic datasets. However, with an increase in data size, the performance of the sequential Bitap algorithm also declines. This paper presents a parallelized approach to improve the efficiency of the Bitap algorithm by using the Message Passing Interface (MPI) for distributed memory systems and Compute Unified Device Architecture (CUDA) for parallel computation on Graphics Processing Units (GPUs). This study focuses on comparison of execution times of three implementations of the Bitap algorithm for mutation detection: the sequential implementation, the MPI-based parallelization, and the CUDA-based GPU implementation. Experimental results indicate that the parallel approaches achieve significant performance improvements as compared to the sequential implementation, thereby enabling faster processing and better scalability for large DNA datasets.
UR - https://www.scopus.com/pages/publications/105023294943
UR - https://www.scopus.com/pages/publications/105023294943#tab=citedBy
U2 - 10.1007/978-981-96-9203-3_42
DO - 10.1007/978-981-96-9203-3_42
M3 - Conference contribution
AN - SCOPUS:105023294943
SN - 9789819692026
T3 - Lecture Notes in Electrical Engineering
SP - 515
EP - 526
BT - Recent Trends in Artificial Intelligence and Data Sciences - Select Proceedings of the 15th International Conference, CONFLUENCE 2025
A2 - Kumar, Sumit
A2 - Aggarwal, Garima
A2 - Unhelkar, Bhuvan
A2 - Pal, Raju
PB - Springer Science and Business Media Deutschland GmbH
T2 - 15th International Conference on Recent Trends in Artificial Intelligence and Data Sciences, CONFLUENCE 2025
Y2 - 16 January 2025 through 17 January 2025
ER -