Reinforcement learning

Reinforcement Learning in Machine Learning

Reinforcement learning (RL) is a type of machine learning concerned with how intelligent agents should take actions in an environment to maximize cumulative reward. It is used to find the best possible behavior or path an agent should take in a specific situation.

Agents are trained through a reward-and-punishment mechanism: the agent is rewarded for correct moves and punished for wrong ones. Over many repetitions, it learns to make fewer wrong moves and more right ones.
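The reward-and-punishment loop can be sketched in a few lines of Python. This is only an illustrative toy, not a full RL algorithm: the function name `train_agent`, the three-action setup, the +1/-1 rewards, and the exploration rate `epsilon` are all assumptions made for the example.

```python
import random

def train_agent(correct_action=1, n_actions=3, episodes=500, epsilon=0.1, seed=0):
    """Learn by trial and error which action the environment rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # running average reward seen for each action
    counts = [0] * n_actions       # how often each action has been tried
    for _ in range(episodes):
        # Explore a random action occasionally; otherwise pick the best-looking one.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])
        # Hypothetical environment: +1 for the correct move, -1 for a wrong one.
        reward = 1.0 if action == correct_action else -1.0
        counts[action] += 1
        # Incremental update of the average reward for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = train_agent()
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("learned best action:", best_action)
print("times each action was tried:", counts)
```

After a few punished attempts the agent settles on the rewarded action and picks the others only when it explores.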

Reinforcement learning differs from supervised learning in that supervised learning uses training data that includes the correct answer, so the model is trained on the answer key itself. In reinforcement learning there is no such labeled data; the agent decides how to perform the given task and learns from its own experience.

It also differs from unsupervised learning: there, the goal is to find similarities and differences between data points, whereas in reinforcement learning the goal is to find a suitable action model that maximizes the agent's total reward.

Q-learning and SARSA (State-Action-Reward-State-Action) are two commonly used RL algorithms. RL is widely used to build AI for playing games. AlphaGo was the first computer program to defeat a world champion at the ancient Chinese game of Go, and its successor AlphaGo Zero learned the game purely through self-play. Other examples include Atari games and Backgammon.
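The difference between the two algorithms comes down to one term in the update rule. The sketch below uses the standard tabular forms of both updates; the state names, the action names, and the values of the learning rate `alpha` and the discount factor `gamma` are made up for illustration, and terminal states are not handled.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy: bootstrap from the action a_next the agent actually takes next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ["left", "right"]

# Two identical value tables: in state "s1", going right looks better than going left.
q_table = defaultdict(float, {("s1", "left"): 0.0, ("s1", "right"): 2.0})
sarsa_table = defaultdict(float, {("s1", "left"): 0.0, ("s1", "right"): 2.0})

# Same experience for both: from "s0", take "right", get reward 1.0, land in "s1".
q_learning_update(q_table, "s0", "right", 1.0, "s1", actions)
# SARSA also needs the next action; assume the agent explores and goes "left".
sarsa_update(sarsa_table, "s0", "right", 1.0, "s1", "left")

print("Q-learning estimate:", q_table[("s0", "right")])
print("SARSA estimate:", sarsa_table[("s0", "right")])
```

Q-learning assumes the best next action will be taken, so its estimate comes out higher here. SARSA uses the exploratory action the agent actually chose, so its estimate reflects the policy being followed.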

Some of the important terms in Reinforcement Learning:


  • State: Current situation of the agent
  • Environment: The world, physical or simulated, in which the agent operates
  • Reward: Feedback from the environment after the agent takes an action
  • Value: Expected long-term reward that an agent would receive by taking an action in a particular state
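To make the difference between reward and value concrete, here is a small sketch that adds up a sequence of future rewards into a single number. It assumes the common convention of discounting later rewards by a factor `gamma`; the function name `discounted_return` and the sample reward sequences are illustrative only.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum future rewards, weighting a reward k steps ahead by gamma**k."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted future.
    for r in reversed(rewards):
        total = r + gamma * total
    return total

# A reward of 1 that arrives two steps from now is worth less than one received now.
delayed = discounted_return([0.0, 0.0, 1.0])
immediate = discounted_return([1.0, 0.0, 0.0])
print("delayed:", delayed)
print("immediate:", immediate)
```

The value of a state or action is the expected amount of this discounted sum, averaged over the ways the future can unfold.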



Some applications of Reinforcement Learning:

  • Robotics for industrial automation.
  • Text summarization engines.
  • Dialog agents (text, speech) that learn from user interactions and improve over time.
  • Learning optimal treatment policies in healthcare.
  • RL-based agents for online stock trading.

Yann LeCun, the renowned French computer scientist who has led AI research at Facebook, jokes that reinforcement learning is the cherry on a great AI cake, with unsupervised learning the cake itself and supervised learning the icing. Without the layers beneath it, the cherry would top nothing.
