On using Huber loss in (Deep) Q-learning

I’ve recently been working on a problem where I put a plain DQN to use. The problem is very simple, deterministic and partially observable, and its states are quite low-dimensional. However, the agent can’t tell some states apart, so the problem is effectively stochastic from the agent’s point of view.

Because the problem was quite simple, I just expected the network to learn a very good representation of the Q-function over the whole state space.

I was surprised that this vanilla DQN failed completely on this problem. Not because it was too difficult – on the contrary, the algorithm converged and was highly confident in all the Q-values it found. But those Q-values were completely wrong. I couldn’t get my head around it, until I tracked it down to a simple cause: the Huber loss.
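To make the culprit concrete, here is a minimal sketch of the Huber loss next to plain MSE, written in NumPy; the threshold delta=1.0 and the function names are my own illustration, not code from the experiment. The key property is that the loss – and thus the gradient – becomes linear for errors beyond the threshold, which is exactly what can distort the learned Q-values when the targets are noisy.

```python
import numpy as np

def mse_loss(td_error):
    # Mean squared error: the gradient grows linearly with the TD error.
    return 0.5 * td_error ** 2

def huber_loss(td_error, delta=1.0):
    # Quadratic near zero, linear beyond +/- delta, so the gradient is
    # effectively clipped to +/- delta for large errors.
    quadratic = np.minimum(np.abs(td_error), delta)
    linear = np.abs(td_error) - quadratic
    return 0.5 * quadratic ** 2 + delta * linear

td_errors = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(mse_loss(td_errors))    # [4.5   0.125 0.    0.125 4.5  ]
print(huber_loss(td_errors))  # [2.5   0.125 0.    0.125 2.5  ]
```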

Continue reading On using Huber loss in (Deep) Q-learning

Let’s make an A3C: Implementation

This article is part of the series Let’s make an A3C.

1. Theory
2. Implementation

Introduction

In the previous article we built up the necessary knowledge about Policy Gradient Methods and the A3C algorithm. This time we will implement a simple agent with our familiar tools – Python, Keras and OpenAI Gym. However, a more low-level implementation is needed, and that’s where TensorFlow comes into play.

The environment is the same as in the DQN implementation – CartPole. The final code fits within 300 lines and is easily adapted to other problems. The A3C algorithm is very efficient, and learning takes only 30 seconds on a regular notebook.
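To give a flavour of the TensorFlow part: the A3C objective is not one of Keras’ built-in losses, so it has to be assembled by hand. Below is a minimal sketch of such a combined loss under assumed variable names and coefficients (beta, v_coef); the actual implementation in the article may differ in detail.

```python
import tensorflow as tf

def a3c_loss(pi, v, actions_onehot, discounted_r, beta=0.01, v_coef=0.5):
    """Combined A3C loss: policy gradient + value error + entropy bonus.
    pi:             (batch, n_actions) action probabilities from the network
    v:              (batch, 1) state-value estimates
    actions_onehot: (batch, n_actions) taken actions, one-hot encoded
    discounted_r:   (batch, 1) n-step discounted returns
    """
    log_prob = tf.math.log(
        tf.reduce_sum(pi * actions_onehot, axis=1, keepdims=True) + 1e-10)
    advantage = discounted_r - v

    # Policy gradient term: the advantage is treated as a constant here.
    loss_policy = -log_prob * tf.stop_gradient(advantage)
    # Value-function regression towards the n-step return.
    loss_value = v_coef * tf.square(advantage)
    # Entropy bonus (negative entropy is minimized) encourages exploration.
    entropy = beta * tf.reduce_sum(pi * tf.math.log(pi + 1e-10),
                                   axis=1, keepdims=True)

    return tf.reduce_mean(loss_policy + loss_value + entropy)
```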

Continue reading Let’s make an A3C: Implementation

Let’s make an A3C: Theory

This article is part of the series Let’s make an A3C.

1. Theory
2. Implementation

Introduction

Policy Gradient Methods are an interesting family of Reinforcement Learning algorithms. They have a long history1, but only recently have they been combined with neural networks and achieved success in high-dimensional cases. The A3C algorithm was published in 2016 and can outperform DQN with a fraction of the time and resources2.

In this series of articles we will explain the theory behind Policy Gradient Methods and the A3C algorithm, and develop a simple agent in Python.

Continue reading Let’s make an A3C: Theory

Let’s make a DQN: Double Learning and Prioritized Experience Replay

This article is part of the series Let’s make a DQN.

1. Theory
2. Implementation
3. Debugging
4. Full DQN
5. Double DQN and Prioritized experience replay

Introduction

Last time we implemented a Full DQN agent with a target network and error clipping. In this article we will explore two techniques that will help our agent perform better, learn faster and be more stable – Double Learning and Prioritized Experience Replay.
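As a quick preview of the first technique, here is a minimal sketch of the Double DQN target computation: the online network chooses the action in the next state, while the target network supplies its value (the function name and array shapes are my own illustration).

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute Double DQN targets for a batch of transitions.
    q_next_online: Q(s', .) from the online network  (batch, n_actions)
    q_next_target: Q(s', .) from the target network  (batch, n_actions)
    """
    # The online network picks the best action in s' ...
    best_actions = np.argmax(q_next_online, axis=1)
    # ... but the target network supplies its value, reducing overestimation.
    next_values = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * next_values * (1.0 - dones)
```

Prioritized Experience Replay then changes how transitions are sampled from memory: roughly, the probability of replaying a transition grows with its last TD error, so surprising transitions are revisited more often.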

Continue reading Let’s make a DQN: Double Learning and Prioritized Experience Replay

Let’s make a DQN: Full DQN

This article is part of the series Let’s make a DQN.

1. Theory
2. Implementation
3. Debugging
4. Full DQN
5. Double DQN and Prioritized experience replay

Introduction

Up until now we have implemented a simple Q-network based agent, which suffered from instability issues. In this article we will address these problems with two techniques – a target network and error clipping. After implementing these, we will have a fully fledged DQN, as specified in the original paper1.
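As a preview, here is a minimal sketch of the two ideas in Keras terms; the refresh interval and the ±1 clipping range are assumptions on my part, and the helper names are illustrative.

```python
import numpy as np

UPDATE_TARGET_EVERY = 1000  # steps between target-network refreshes

def update_target(model, target_model):
    # Target network: a frozen copy of the online network, refreshed
    # only once in a while so that the training targets stay stable.
    target_model.set_weights(model.get_weights())

def clipped_td_error(q_predicted, q_target):
    # Error clipping: limit the TD error to [-1, 1] so that a single
    # transition cannot produce an enormous gradient step.
    return np.clip(q_target - q_predicted, -1.0, 1.0)
```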

Continue reading Let’s make a DQN: Full DQN

Let’s make a DQN: Debugging

This article is part of the series Let’s make a DQN.

1. Theory
2. Implementation
3. Debugging
4. Full DQN
5. Double DQN and Prioritized experience replay

Introduction

Last time we saw that our Q-learning can be unstable. In this article we will cover some methods that will help us understand what is going on inside the network.

The code for this article can be found on GitHub.

Continue reading Let’s make a DQN: Debugging

Let’s make a DQN: Implementation

This article is part of the series Let’s make a DQN.

1. Theory
2. Implementation
3. Debugging
4. Full DQN
5. Double DQN and Prioritized experience replay

Introduction

Last time we tried to get a grasp of the necessary knowledge, and today we will use it to build a Q-network based agent that solves the cart-pole balancing problem in less than 200 lines of code.
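The heart of that agent is the one-step Q-learning target used to train the network. The following is a minimal sketch of how a batch of training targets could be assembled with Keras, with hypothetical helper names and shapes rather than the article’s exact code.

```python
import numpy as np

GAMMA = 0.99  # discount factor

def q_learning_targets(model, batch):
    """Build (states, targets) for one supervised fit() call on the Q-network.
    batch: list of (s, a, r, s_next) tuples, where s and s_next are 1-D state
    vectors and s_next is None for terminal transitions.
    """
    states = np.array([s for (s, a, r, s_next) in batch])
    targets = model.predict(states)          # start from the current estimates

    for i, (s, a, r, s_next) in enumerate(batch):
        if s_next is None:                   # terminal transition
            targets[i][a] = r
        else:
            q_next = model.predict(s_next[np.newaxis, :])[0]
            targets[i][a] = r + GAMMA * np.max(q_next)

    return states, targets
```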

The complete code is available on GitHub.

Continue reading Let’s make a DQN: Implementation