TY - JOUR
T1 - FedDSL
T2 - A Novel Client Selection Method to Handle Statistical Heterogeneity in Cross-Silo Federated Learning using Flower Framework
AU - Pais, Vineetha
AU - Rao, Santhosha
AU - Muniyal, Balachandra
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2024
Y1 - 2024
AB - Federated learning provides a mechanism for different silos to collaborate so that each silo benefits without compromising privacy. This simulation study is based on healthcare datasets, so the silos are hospitals or healthcare organizations. Careful selection of the hospitals that participate in federated learning increases the overall performance of the model. Cross-silo federated learning poses many challenges, even though the number of participating clients is small compared with cross-device federated learning. This study specifically addresses two of them: data heterogeneity and local performance. This paper introduces an approach called FedDSL, based on 'Datasize', 'Skewness', and 'Local Performance'. First, synthetic data are generated with varying data size and skewness, which creates statistical heterogeneity in the cross-silo environment. Once this environment is created, a client selection strategy that uses a weighted approach is applied to select clients. A statistical analysis examines the data distributed among hospitals using skewness and normality tests. Experiments are conducted using the Flower Framework, and FedDSL is compared with random client selection. The model is evaluated with several aggregation algorithms, including FedAvg, FedProx, and FedAdam. The results show improved model performance with the FedDSL approach compared with random client selection.
UR - http://www.scopus.com/inward/record.url?scp=85207780409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85207780409&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3482388
DO - 10.1109/ACCESS.2024.3482388
M3 - Article
AN - SCOPUS:85207780409
SN - 2169-3536
VL - 12
SP - 159648
EP - 159659
JO - IEEE Access
JF - IEEE Access
ER -