An Empirical Study of On-Policy and Off-Policy Actor-Critic Algorithms in the Context of Exploration-Exploitation Dilemma

Supriya Seshagiri, K. V. Prema

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The Exploration-Exploitation dilemma in Reinforcement Learning (RL) is the choice between selecting a sub-optimal path to the outcome, thereby acquiring more varied knowledge of the environment, and selecting the greedy path to maximize rewards. It is a fundamental challenge in RL that influences learning efficiency. The on-policy or off-policy design of an RL algorithm influences its ability to explore non-greedy actions and therefore its learning ability. The paper presents the results of experiments conducted to analyze the effect of on-policy and off-policy design and of entropy-based exploration in actor-critic algorithms, and to investigate the root causes of effective learning in these algorithms. Soft Actor Critic (SAC), an off-policy algorithm, and Proximal Policy Optimization (PPO), an on-policy algorithm, are compared empirically in several continuous OpenAI Gym environments, and the effect of exploration strategies such as entropy, off-policy target update, and the Generalized Advantage Estimate (GAE) factor on the bias-variance balance of these algorithms is analyzed.
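
The abstract names the GAE factor as one of the settings that shifts the bias-variance balance. As an illustration only, and not the authors' code, the following is a minimal Python sketch of the standard GAE recursion for a single trajectory without early termination. The function name `gae_advantages`, the argument layout, and the default values of `gamma` and `lam` are assumptions made for this example.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimates for one trajectory (illustrative sketch).

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T); one more entry than rewards, the last
             entry being the bootstrap value of the final state.
    gamma:   discount factor (default is an assumption, not from the paper).
    lam:     GAE factor (default is an assumption, not from the paper).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the accumulated future residuals.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals, with weight gamma * lam.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.0, 0.0, 0.0, 0.0]
    print(gae_advantages(rewards, values, gamma=1.0, lam=0.0))  # [1.0, 1.0, 1.0]
    print(gae_advantages(rewards, values, gamma=1.0, lam=1.0))  # [3.0, 2.0, 1.0]
```

With `lam=0.0` each advantage reduces to the one-step TD residual, which leans on the critic (lower variance, more bias). With `lam=1.0` it becomes the discounted sum of all future residuals, which leans on sampled returns (higher variance, less bias).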

Original language: English
Title of host publication: Proceedings of the 2023 International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 238-243
Number of pages: 6
ISBN (Electronic): 9798350300604
Publication status: Published - 2023
Event: 3rd International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2023 - Hyderabad, India
Duration: 21-09-2023 to 23-09-2023

Publication series

Name: Proceedings of the 2023 International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2023

Conference

Conference: 3rd International Conference on Emerging Techniques in Computational Intelligence, ICETCI 2023
Country/Territory: India
City: Hyderabad
Period: 21-09-23 to 23-09-23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Information Systems
  • Instrumentation
