Let’s make a DQN: Double Learning and Prioritized Experience Replay

This article is part of the series Let’s make a DQN.

1. Theory
2. Implementation
3. Debugging
4. Full DQN
5. Double DQN and Prioritized Experience Replay

Introduction

Last time we implemented a Full DQN agent with a target network and error clipping. In this article we will explore two techniques that will help our agent perform better, learn faster and be more stable: Double Learning and Prioritized Experience Replay.
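To give a flavor of both techniques before diving in, here is a minimal, self-contained sketch (my own illustration with placeholder values, not the code from the article): the online network selects the next action while the target network evaluates it, and prioritized replay samples transitions in proportion to their TD error.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_actions, gamma = 4, 3, 0.99

# Placeholders standing in for network outputs on the next states s':
q_next_online = rng.normal(size=(batch, n_actions))  # online network Q(s', .)
q_next_target = rng.normal(size=(batch, n_actions))  # target network Q~(s', .)
rewards = rng.normal(size=batch)
done = np.array([False, False, True, False])

# Double Learning decouples selection from evaluation: the online
# network picks the best next action, the target network scores it.
best_a = np.argmax(q_next_online, axis=1)
targets = rewards + gamma * q_next_target[np.arange(batch), best_a] * ~done

# Prioritized experience replay: sample transitions with probability
# proportional to (a power of) their TD error, instead of uniformly.
td_error = np.abs(targets - rng.normal(size=batch))  # placeholder for current Q(s, a)
priorities = (td_error + 1e-6) ** 0.6                # the exponent controls prioritization strength
probs = priorities / priorities.sum()
picked = rng.choice(batch, size=2, replace=False, p=probs)
print(targets, picked)
```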

Let’s make a DQN: Full DQN

Introduction

Up until now we have implemented a simple Q-network-based agent, which suffered from instability issues. In this article we will address these problems with two techniques: a target network and error clipping. After implementing these, we will have a fully fledged DQN, as specified by the original paper [1].
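As a quick preview of the two fixes, here is a minimal sketch under my own simplified assumptions (plain-numpy stand-ins, not the article's actual code): the target network is a frozen copy of the online weights that is refreshed only periodically, and error clipping is a Huber-style loss that bounds the gradient.

```python
import numpy as np

def huber(error, delta=1.0):
    # Error clipping: quadratic near zero, linear beyond delta,
    # so the gradient magnitude never exceeds delta.
    quad = np.minimum(np.abs(error), delta)
    return 0.5 * quad ** 2 + delta * (np.abs(error) - quad)

# Target network: a frozen copy of the online weights (shown here as a
# plain dict), re-synchronized every `sync_every` training steps.
online = {"w": np.zeros(4)}
target = {k: v.copy() for k, v in online.items()}
sync_every = 1000

for step in range(1, 3001):
    online["w"] += 0.01                      # stand-in for a gradient update
    if step % sync_every == 0:
        target = {k: v.copy() for k, v in online.items()}

print(huber(np.array([-3.0, 0.5, 2.0])), target["w"][0])
```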

Let’s make a DQN: Implementation

Introduction

Last time we tried to get a grasp of the necessary knowledge, and today we will use it to build a Q-network-based agent that will solve the cart-pole balancing problem in under 200 lines of code.

The complete code is available on GitHub.
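For a taste of what those lines revolve around, here is a tiny sketch of the agent's two core ingredients (hypothetical standalone helpers, not the structure of the actual repository): epsilon-greedy action selection and the Q-learning target used to train the network.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, epsilon, gamma = 2, 0.1, 0.99  # cart-pole has two actions

def act(q_values):
    # epsilon-greedy: explore with probability epsilon,
    # otherwise take the action with the highest estimated Q-value
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_values))

def q_target(reward, q_next, done):
    # Q-learning target: r for terminal transitions,
    # otherwise r + gamma * max_a' Q(s', a')
    return reward if done else reward + gamma * float(np.max(q_next))

print(act(np.array([0.2, 0.8])), q_target(1.0, np.array([0.5, 1.5]), False))
```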

Let’s make a DQN: Theory

Introduction

In February 2015, a group of researchers from Google DeepMind published a paper [1] that marked a milestone in machine learning. They presented a novel network, the so-called DQN, which achieved breathtaking results playing a set of Atari games while receiving only visual input.

In this series of articles, we will progressively develop the knowledge needed to build a state-of-the-art agent that will be able to learn to solve a variety of tasks just by observing the environment. We will explain the necessary theoretical background in less technical terms and then build a program to demonstrate how the theory works in practice.
