Reinforcement learning is supervised learning on optimized data
“The two most common perspectives on Reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick are commonly grouped into the optimization perspective, whereas methods that employ TD-learning or Q-learning are dynamic programming methods. While these methods have shown considerable success in recent years, these methods are still quite challenging to apply to new problems. In contrast deep supervised learning has been extremely successful and we may hence ask: Can we use supervised learning to perform RL?”
Source: https://bair.berkeley.edu/blog/2020/10/13/supervised-rl/
October 20, 2020
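The "REINFORCE trick" mentioned in the excerpt can be illustrated on a toy problem. The sketch below is not taken from the linked post: it assumes a hypothetical three-armed bandit with fixed rewards and a softmax policy, and the function names (`softmax`, `exact_gradient`, `reinforce_gradient`) are chosen here for illustration only. It compares the sampled score-function estimate, reward times the gradient of the log-probability of the sampled action, with the exact gradient of the expected reward.

```python
import math
import random


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, rewards):
    """Exact gradient of J = sum_a pi(a) * r(a) with respect to the logits.

    For a softmax policy, dJ/dlogit_k = pi_k * (r_k - J).
    """
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


def reinforce_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo score-function (REINFORCE) estimate of the same gradient.

    Each sample contributes r(a) * d log pi(a) / dlogit_k,
    where d log pi(a) / dlogit_k = 1[k == a] - pi_k for a softmax policy.
    """
    probs = softmax(logits)
    actions = range(len(logits))
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        action = rng.choices(actions, weights=probs)[0]
        for k in actions:
            indicator = 1.0 if k == action else 0.0
            grad[k] += rewards[action] * (indicator - probs[k])
    return [g / num_samples for g in grad]


if __name__ == "__main__":
    # Illustrative values only; they are assumptions, not data from the post.
    logits = [0.0, 0.5, -0.5]
    rewards = [1.0, 0.0, 2.0]
    rng = random.Random(0)
    print("exact:    ", exact_gradient(logits, rewards))
    print("estimated:", reinforce_gradient(logits, rewards, 20000, rng))
```

The estimator only needs sampled actions and their rewards, never a derivative of the reward itself, which is why it applies to non-differentiable objectives. The price is sampling noise, so the estimate approaches the exact gradient only as the number of samples grows.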