Spurred by the success of AlphaGo, there has been a recent trend of teaching agents to solve puzzles and play games using Deep Reinforcement Learning (DRL). While this approach has produced some truly groundbreaking results, it is very computationally intensive. This paper evaluates the feasibility of solving combinatorial optimization problems such as twisty puzzles using Parallel Q-Learning (PQL). We propose a method using Constant Share-Reinforcement Learning (CSRL) as a more resource-efficient approach and measure the impact of sub-goals built from human knowledge. We attempt to solve three puzzles, the 2x2x2 Pocket Rubik’s Cube, the Skewb, and the Pyraminx, with and without sub-goals based on popular solving methods used by humans, and compare the results. Our agents solve these puzzles with a 100% success rate after just a few hours of training, far less than previous DRL-based agents that require large amounts of computation. Further, the proposed approach is compared with a deep-learning-based solution for the 2x2x2 Rubik’s Cube and achieves a higher success rate.
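The abstract names Q-learning with human-designed sub-goals but gives no algorithmic detail. As an illustrative sketch only, the following toy shows tabular Q-learning on a small cyclic "puzzle" of six states where state 0 is solved, with an optional shaping bonus when an intermediate sub-goal state is reached (loosely analogous to solving one layer of a cube first). All names, parameters, and reward values here are assumptions, not the paper's actual setup.

```python
import random

N_STATES = 6        # toy puzzle: states 0..5, state 0 is "solved"
ACTIONS = (-1, +1)  # a move twists the puzzle one step either way

def step(state, action):
    # Deterministic toy transition: move around the cycle of states.
    return (state + action) % N_STATES

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2,
               subgoal=None, seed=0):
    """Tabular Q-learning; `subgoal` (hypothetical) adds a small shaping bonus."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(1, N_STATES)  # scrambled (unsolved) start
        for _ in range(20):             # step budget per episode
            # epsilon-greedy action selection
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda a: Q[(s, a)]))
            s2 = step(s, a)
            r = 1.0 if s2 == 0 else 0.0       # reward only when solved
            if subgoal is not None and s2 == subgoal:
                r += 0.1                      # shaping bonus for the sub-goal
            # standard Q-learning update
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS)
                                  - Q[(s, a)])
            s = s2
            if s == 0:
                break
    return Q

def greedy_solves(Q, start, max_steps=10):
    # Follow the greedy policy and report whether the solved state is reached.
    s = start
    for _ in range(max_steps):
        if s == 0:
            return True
        s = step(s, max(ACTIONS, key=lambda a: Q[(s, a)]))
    return s == 0

Q = q_learning(subgoal=2)
print(all(greedy_solves(Q, s) for s in range(N_STATES)))  # True once trained
```

In this miniature setting the shaping bonus simply biases exploration toward the sub-goal state; the paper's claim is that analogous human-derived sub-goals let tabular agents reach a 100% solve rate with far less computation than DRL.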
Number of pages: 9
Publication status: Published - 2021