In the previous article we built necessary knowledge about Policy Gradient Methods and A3C algorithm. This time we implement a simple agent with our familiar tools – Python, Keras and OpenAI Gym. However, more low level implementation is needed and that’s where TensorFlow comes to play.
The environment is the same as in DQN implementation – CartPole. Final code fits inside 300 lines and is easily converted to any other problem. A3C algorithm is very effective and learning takes only 30 seconds on a regular notebook.
Continue reading Let’s make an A3C: Implementation
Policy Gradient Methods is an interesting family of Reinforcement Learning algorithms. They have a long history, but only recently were backed by neural networks and had success in high-dimensional cases. A3C algorithm was published in 2016 and can do better than DQN with a fraction of time and resources.
In this series of articles we will explain the theory behind Policy Gradient Methods, A3C algorithm and develop a simple agent in Python.
Continue reading Let’s make an A3C: Theory