TY - GEN
T1 - Elastic MapReduce for Scalable Image Processing in the Cloud
AU - Raj, Rakesh S.
AU - Pavan Kumar, M. P.
AU - Manjunath, K. N.
AU - Rangaswamy, B. E.
AU - Shamna, N. V.
AU - Pradeep, N. B.
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - In digital imaging for medical diagnostics, especially chest X-rays, raster image formats such as JPEG, PNG, and TIFF are frequently used. For effective preprocessing, annotation, and machine learning training, large-scale image collections must be organized by format and resolution. This study proposes a scalable method for sorting and storing raster-type medical images using Elastic MapReduce (EMR) from Amazon Web Services (AWS). The pipeline uses AWS S3 storage and the Hadoop MapReduce architecture to distribute the identification of image characteristics and to arrange the images into structured S3 paths. The results show fault tolerance, cost-effectiveness, and high throughput for datasets of hundreds of thousands of images or more. Elastic MapReduce has gained popularity as a framework for handling massive amounts of data because of its fault-tolerant, scalable, and economical infrastructure. This study examines optimization strategies, assesses performance under various workloads, and investigates the integration of image processing pipelines into EMR clusters. The findings demonstrate that EMR can significantly increase throughput and scalability for large-scale image processing tasks such as classification, feature extraction, and filtering.
UR - https://www.scopus.com/pages/publications/105030077764
UR - https://www.scopus.com/pages/publications/105030077764#tab=citedBy
U2 - 10.1109/DISCOVER66922.2025.11258916
DO - 10.1109/DISCOVER66922.2025.11258916
M3 - Conference contribution
AN - SCOPUS:105030077764
T3 - 2025 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics, DISCOVER 2025 - Proceedings
SP - 43
EP - 48
BT - 2025 IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics, DISCOVER 2025 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th IEEE International Conference on Distributed Computing, VLSI, Electrical Circuits and Robotics, DISCOVER 2025
Y2 - 17 October 2025 through 18 October 2025
ER -