On using Huber loss in (Deep) Q-learning

I’ve recently been working on a problem where I put a plain DQN to use. The problem is very simple: deterministic, partially observable, with quite low-dimensional states. However, the agent can’t tell some states apart, so the environment is effectively stochastic from the agent’s point of view.

Because the problem was quite simple, I expected the network to learn a very good representation of the Q-function over the whole state space.

And I was surprised that this vanilla DQN failed completely on this problem. Not in the sense that the problem was too difficult; on the contrary, the algorithm converged and was highly certain about all the Q-values it found. But those Q-values were completely wrong. I couldn’t get my head around it until I tracked it down to a simple cause: the Pseudo-Huber loss.
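
For reference, these are the standard textbook definitions of the two losses involved (not quoted from the post itself), with $a$ the TD error and $\delta$ the threshold parameter:

$$
L_\delta(a) =
\begin{cases}
\tfrac{1}{2}a^2 & \text{if } |a| \le \delta,\\[2pt]
\delta\bigl(|a| - \tfrac{1}{2}\delta\bigr) & \text{otherwise,}
\end{cases}
\qquad
L_\delta^{\mathrm{PH}}(a) = \delta^2\left(\sqrt{1 + (a/\delta)^2}\; - 1\right).
$$

The linear tail is what causes the trouble: minimizing a squared error pulls the estimate toward the *mean* of the bootstrapped targets, which is exactly what the Bellman backup needs, while an L1-like tail pulls it toward the *median*. Once state aliasing makes the targets effectively stochastic, mean and median can diverge, and the network converges, confidently, to the wrong Q-values.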

Edit: Based on the discussion, the original Huber loss with an appropriate δ parameter is correct to use. The following article, however, still holds for the L1 and Pseudo-Huber losses.
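
A quick numerical sketch of the effect (my own illustration, not code from the article; the exponential target distribution and the scipy optimizer are arbitrary choices): for a single state-action pair with noisy bootstrapped targets, we can compute the scalar estimate each loss actually converges to.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
# Skewed target distribution, so that mean and median differ
# and the bias becomes visible.
targets = rng.exponential(scale=1.0, size=10_000)

def mse(q):
    return np.mean((targets - q) ** 2)

def l1(q):
    return np.mean(np.abs(targets - q))

def pseudo_huber(q, delta=1.0):
    a = targets - q
    return np.mean(delta**2 * (np.sqrt(1.0 + (a / delta) ** 2) - 1.0))

# The argmin of each loss is the Q-value the network would settle on.
for name, loss in [("MSE", mse), ("L1", l1), ("Pseudo-Huber", pseudo_huber)]:
    q_star = minimize_scalar(loss, bounds=(0.0, 10.0), method="bounded").x
    print(f"{name:>12}: argmin = {q_star:.3f}")

print(f"        mean = {targets.mean():.3f}, median = {np.median(targets):.3f}")
# Only the MSE argmin matches the mean of the targets; L1 lands on the
# median and Pseudo-Huber somewhere in between: wrong, yet stable.
```

Picking δ well above the typical TD error keeps the (Pseudo-)Huber loss in its quadratic regime and recovers the MSE behaviour, which is the “appropriate δ” point from the edit above.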
