TY - GEN
T1 - Parallel Pancake Sorting Using MPI and CUDA
AU - Sankeerth, Garimella Sai
AU - Mukherjee, Arunava
AU - Kini, N. Gopalakrishna
AU - Upadhya, K. Jyothi
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - Pancake sorting is a simple algorithmic problem in which a sequence of numbers is sorted by repeatedly flipping prefixes of the array, much like flipping a stack of pancakes into sorted order. Although the idea is simple, pancake sorting becomes computationally expensive as the sequence length grows, which is a problem for applications that need data sorted efficiently. This paper implements pancake sorting in parallel using both MPI and CUDA. The study addresses the poor time efficiency of sequential sorting in high-performance scenarios by distributing the workload across several processing nodes through MPI and, separately, by accelerating the sort on the CUDA GPU architecture. Performance was evaluated in terms of speedup and efficiency, and varying the input size helped expose the scalability limits of the parallel approach. With MPI we achieved a speedup of up to 162 times, and with CUDA a speedup of up to 1513 times.
UR - https://www.scopus.com/pages/publications/105030285199
UR - https://www.scopus.com/pages/publications/105030285199#tab=citedBy
U2 - 10.1007/978-981-96-9771-7_14
DO - 10.1007/978-981-96-9771-7_14
M3 - Conference contribution
AN - SCOPUS:105030285199
SN - 9789819697700
T3 - Lecture Notes in Networks and Systems
SP - 181
EP - 191
BT - Proceedings of the 3rd Congress on Control, Robotics, and Mechatronics - CRM 2025
A2 - Jha, Pradeep Kumar
A2 - Jamwal, Prashant
A2 - Tripathi, Brajesh
A2 - Kumar, Pankaj
A2 - Sharma, Harish
PB - Springer Science and Business Media Deutschland GmbH
T2 - 3rd Congress on Control, Robotics, and Mechatronics, CRM 2025
Y2 - 1 February 2025 through 2 February 2025
ER -