TY - GEN
T1 - Parallel Implementation of Dutch Flag Sorting Algorithm Using MPI and CUDA
AU - Aishwarya, S.
AU - Bhat, Ananya
AU - Gopalakrishna Kini, N.
AU - Jyothi Upadhya, K.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - This paper focuses on improving the performance of the Dutch flag sorting algorithm for large datasets with many duplicates using parallel computing techniques. Traditional quicksort struggles with many comparisons and swaps, especially with repeated elements. By partitioning the data into three sections—values less than, equal to, and greater than a pivot—the three-way quicksort eliminates unnecessary steps, enhancing efficiency. The study explores parallel versions of Dutch flag sort using Compute Unified Device Architecture (CUDA) for GPU acceleration and Message Passing Interface (MPI) for distributed systems, comparing their speed and scalability. The parallel implementations leverage the computational power of modern architectures to handle large-scale datasets with exceptional speed and efficiency. The findings include comparative performance metrics, such as sorting 1,500,000 elements in 74.4 ms sequentially, 3.089 ms with MPI, and 0.530 ms using CUDA. The proposed parallel implementations are expected to significantly improve sorting speed, particularly for datasets with frequent duplicates, making them ideal for applications in database systems and large-scale data analysis. The findings suggest that parallel three-way quicksort could be a reliable option for high-performance computing environments, improving sorting efficiency across various domains.
AB - This paper focuses on improving the performance of the Dutch flag sorting algorithm for large datasets with many duplicates using parallel computing techniques. Traditional quicksort struggles with many comparisons and swaps, especially with repeated elements. By partitioning the data into three sections—values less than, equal to, and greater than a pivot—the three-way quicksort eliminates unnecessary steps, enhancing efficiency. The study explores parallel versions of Dutch flag sort using Compute Unified Device Architecture (CUDA) for GPU acceleration and Message Passing Interface (MPI) for distributed systems, comparing their speed and scalability. The parallel implementations leverage the computational power of modern architectures to handle large-scale datasets with exceptional speed and efficiency. The findings include comparative performance metrics, such as sorting 1,500,000 elements in 74.4 ms sequentially, 3.089 ms with MPI, and 0.530 ms using CUDA. The proposed parallel implementations are expected to significantly improve sorting speed, particularly for datasets with frequent duplicates, making them ideal for applications in database systems and large-scale data analysis. The findings suggest that parallel three-way quicksort could be a reliable option for high-performance computing environments, improving sorting efficiency across various domains.
UR - https://www.scopus.com/pages/publications/105023301970
UR - https://www.scopus.com/pages/publications/105023301970#tab=citedBy
U2 - 10.1007/978-981-96-9203-3_39
DO - 10.1007/978-981-96-9203-3_39
M3 - Conference contribution
AN - SCOPUS:105023301970
SN - 9789819692026
T3 - Lecture Notes in Electrical Engineering
SP - 479
EP - 489
BT - Recent Trends in Artificial Intelligence and Data Sciences - Select Proceedings of the 15th International Conference, CONFLUENCE 2025
A2 - Kumar, Sumit
A2 - Aggarwal, Garima
A2 - Unhelkar, Bhuvan
A2 - Pal, Raju
PB - Springer Science and Business Media Deutschland GmbH
T2 - 15th International Conference on Recent Trends in Artificial Intelligence and Data Sciences, CONFLUENCE 2025
Y2 - 16 January 2025 through 17 January 2025
ER -