TY - GEN
T1 - Evaluating the Efficacy of Different Neural Network Deep Reinforcement Algorithms in Complex Search-and-Retrieve Virtual Simulations
AU - Vohra, Ishita
AU - Uttrani, Shashank
AU - Rao, Akash K.
AU - Dutt, Varun
N1 - Publisher Copyright:
© 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - In recent years, Deep Reinforcement Learning (DRL) has been used extensively to solve problems in domains such as traffic control, healthcare, and simulation-based training. Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are state-of-the-art on-policy and off-policy DRL algorithms, respectively. Although previous studies have shown that SAC generally performs better than PPO, hyperparameter tuning can significantly affect the performance of both algorithms. Moreover, a systematic evaluation of the efficacy of these algorithms after hyperparameter tuning in dynamic and complex environments is missing from the literature. This research evaluates the effect of the number of layers and nodes in the SAC and PPO algorithms on a search-and-retrieve task developed in the Unity 3D game engine. In the task, a bot had to navigate through the physical mesh and collect ‘target’ objects while avoiding ‘distractor’ objects. We compared the SAC and PPO models on four test conditions that differed in the ratio of targets to distractors. Results revealed that PPO performed better than SAC in all test conditions when the number of layers and units in the architecture was lowest. When targets outnumbered distractors (9:1), PPO outperformed SAC, especially when the number of units and layers was large. Furthermore, increasing the number of layers and units per layer improved the performance of both PPO and SAC. The results also imply that similar hyperparameter settings should be used when comparing models developed using DRL algorithms. We discuss the implications of these results and explore possible applications of modern, state-of-the-art DRL algorithms for learning the semantics and idiosyncrasies of complex and dynamic environments.
AB - In recent years, Deep Reinforcement Learning (DRL) has been used extensively to solve problems in domains such as traffic control, healthcare, and simulation-based training. Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) are state-of-the-art on-policy and off-policy DRL algorithms, respectively. Although previous studies have shown that SAC generally performs better than PPO, hyperparameter tuning can significantly affect the performance of both algorithms. Moreover, a systematic evaluation of the efficacy of these algorithms after hyperparameter tuning in dynamic and complex environments is missing from the literature. This research evaluates the effect of the number of layers and nodes in the SAC and PPO algorithms on a search-and-retrieve task developed in the Unity 3D game engine. In the task, a bot had to navigate through the physical mesh and collect ‘target’ objects while avoiding ‘distractor’ objects. We compared the SAC and PPO models on four test conditions that differed in the ratio of targets to distractors. Results revealed that PPO performed better than SAC in all test conditions when the number of layers and units in the architecture was lowest. When targets outnumbered distractors (9:1), PPO outperformed SAC, especially when the number of units and layers was large. Furthermore, increasing the number of layers and units per layer improved the performance of both PPO and SAC. The results also imply that similar hyperparameter settings should be used when comparing models developed using DRL algorithms. We discuss the implications of these results and explore possible applications of modern, state-of-the-art DRL algorithms for learning the semantics and idiosyncrasies of complex and dynamic environments.
UR - https://www.scopus.com/pages/publications/85125269239
UR - https://www.scopus.com/pages/publications/85125269239#tab=citedBy
U2 - 10.1007/978-3-030-95502-1_27
DO - 10.1007/978-3-030-95502-1_27
M3 - Conference contribution
AN - SCOPUS:85125269239
SN - 9783030955014
T3 - Communications in Computer and Information Science
SP - 348
EP - 361
BT - Advanced Computing - 11th International Conference, IACC 2021
A2 - Garg, Deepak
A2 - Jagannathan, Sarangapani
A2 - Gupta, Ankur
A2 - Garg, Lalit
A2 - Gupta, Suneet
PB - Springer Science and Business Media Deutschland GmbH
T2 - 11th International Advanced Computing Conference, IACC 2021
Y2 - 18 December 2021 through 19 December 2021
ER -