TY - GEN
T1 - Parallelization of Counting Sort
AU - Dsouza, Aston
AU - Akshay, C.
AU - Kini, N. Gopalakrishna
AU - Rao, B. Ashwath
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
AB - Counting Sort is a well-established, non-comparison sorting algorithm that runs in linear time, O(n + k) for n keys drawn from a range of size k. It is known for its simplicity and efficiency when sorting integers within a bounded range. This work investigates the parallelization of Counting Sort using both the Message Passing Interface (MPI) and the Compute Unified Device Architecture (CUDA) to exploit distributed architectures and GPU computation. The proposed strategy aims to improve sorting performance by distributing the workload across multiple processes with MPI and by exploiting the parallel processing capabilities of Graphics Processing Units (GPUs) through CUDA. MPI handles inter-process communication and load balancing among distributed nodes, while CUDA accelerates the sorting itself on modern GPUs. The paper presents the design and implementation details of the parallel Counting Sort algorithm and discusses the specific challenges of integrating MPI and CUDA. Experimental results demonstrate the efficiency and scalability of the proposed approach, showing significant reductions in sorting time for large datasets compared with traditional serial implementations.
UR - https://www.scopus.com/pages/publications/105028318904
U2 - 10.1007/978-981-96-8799-2_10
DO - 10.1007/978-981-96-8799-2_10
M3 - Conference contribution
AN - SCOPUS:105028318904
SN - 9789819687985
T3 - Lecture Notes in Networks and Systems
SP - 121
EP - 127
BT - Machine Intelligence for Research and Innovations - Proceedings of MAiTRI 2024
A2 - Verma, Om Prakash
A2 - Wang, Lipo
A2 - Kumar, Rajesh
A2 - Yadav, Anupam
A2 - Rout, Ranjeet Kumar
PB - Springer Science and Business Media Deutschland GmbH
T2 - 2nd International Conference on Machine Intelligence for Research and Innovations, MAiTRI 2024 Summit
Y2 - 21 June 2024 through 23 June 2024
ER -