Beyond the agent and the environment, a reinforcement learning system has four main sub-elements: a policy, a reward signal, a value function, and, optionally, a model of the environment.


Policy

A policy determines how an agent behaves at any given time. It maps each state of the environment that the agent perceives to the action to take in that state. The policy is the core of the agent: on its own, it is enough to decide what the agent does.
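As a minimal sketch, a policy can be written as a lookup from states to actions. The state names, action names, probabilities, and the helpers `act` and `sample_action` below are hypothetical and chosen only for illustration. The second table shows a stochastic variant, where each state maps to a probability distribution over actions.

```python
import random

# Hypothetical two-state example: state and action names are illustrative only.
deterministic_policy = {"cold": "heat", "hot": "cool"}

def act(policy, state):
    """Return the action a deterministic policy prescribes for this state."""
    return policy[state]

# A stochastic policy maps each state to a probability distribution over actions.
stochastic_policy = {
    "cold": {"heat": 0.9, "cool": 0.1},
    "hot": {"heat": 0.2, "cool": 0.8},
}

def sample_action(policy, state, rng=random):
    """Sample an action according to the policy's probabilities for this state."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

print(act(deterministic_policy, "cold"))
print(sample_action(stochastic_policy, "hot"))
```

In practice a policy may be a table like this, a simple function, or a learned model such as a neural network; the table form is used here only because it is the easiest to read.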

Reward Signal

We have briefly discussed the reward earlier. The reward signal defines the goal of a reinforcement learning problem. After each step, the environment sends the agent a single number, known as the reward. The agent's objective is to maximize the total reward it receives over time, so this signal tells the agent which events are good and which are bad.
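The exchange can be sketched with a toy environment. The `Corridor` class, its five positions, and the reward numbers below are assumptions made up for this example, not part of any library. Each call to `step` returns the number that the environment sends back to the agent.

```python
class Corridor:
    """Hypothetical 1-D corridor: positions 0..4, with the goal at position 4."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        """Apply action (-1 = left, +1 = right); return (state, reward, done)."""
        self.position = max(0, min(4, self.position + action))
        done = self.position == 4
        # Assumed reward scheme: small cost per step, bonus on reaching the goal.
        reward = 1.0 if done else -0.1
        return self.position, reward, done

env = Corridor()
rewards = []
done = False
while not done:
    state, reward, done = env.step(+1)  # this agent always moves right
    rewards.append(reward)

print(rewards)  # one reward per step taken
```

The negative per-step reward marks wandering as "bad" and the positive reward at the goal marks arrival as "good", which is how the reward signal encodes the goal of the task.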

Value Function

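The value function estimates the total reward an agent can expect to accumulate in the future, starting from a given state. While the reward indicates the immediate result of a step, the value function indicates which states are beneficial in the long run. A state with a low immediate reward can still have a high value if it is usually followed by states that yield high rewards. Each policy has its own value function, so there can be many value functions for the same task, but there is only one optimal value function for a given task.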
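To make the difference between reward and value concrete, the sketch below computes the discounted sum of future rewards from each step of a short episode. The reward sequence and the discount factor `gamma = 0.9` are assumptions introduced for this example; discounting is one common way to weigh future rewards and is not discussed in the text above.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, discounting each one by gamma per step into the future."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical episode: three steps with a small cost, then a reward at the goal.
rewards = [-0.1, -0.1, -0.1, 1.0]

# Value of each step under this fixed behaviour: the return from that step onward.
values = [discounted_return(rewards[t:]) for t in range(len(rewards))]

for t, (r, v) in enumerate(zip(rewards, values)):
    print(f"step {t}: immediate reward {r:+.1f}, value {v:.3f}")
```

The first step has a negative immediate reward but a positive value, because it leads toward the goal. That gap is what the value function captures and the reward alone does not.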


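Model

A model mimics the behavior of the environment: given a state and an action, it predicts the next state and the reward. It is used for planning, that is, deciding on a course of action by considering possible future situations before actually experiencing them. Reinforcement learning can work without a model; it is not essential. Therefore, for an agent, there can be two types of learning: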

  • Model-Based Learning
  • Model-Free Learning

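In model-based learning, the agent primarily works through planning with its model, while in model-free learning, it learns directly from trial and error. Exploration and exploitation are a separate trade-off that both kinds of agent face.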
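The two approaches can be contrasted on a toy problem. Everything in the sketch below is an assumption made for illustration: the five-state corridor, the reward numbers, the discount factor, and the choice of value iteration as the planning method and Q-learning as the trial-and-error method. The model here is an exact copy of the environment's rules, which a real agent would usually have to learn or be given.

```python
import random

N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (-1, +1)  # move left or right

def environment_step(state, action):
    """Hypothetical environment: a corridor with the goal at state 4."""
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else -0.1
    return next_state, reward

# Model-based: the agent holds a model (here an exact table of the dynamics)
# and plans with it by value iteration, without acting in the environment.
model = {(s, a): environment_step(s, a) for s in range(N_STATES) for a in ACTIONS}

def plan(model, sweeps=50):
    values = [0.0] * N_STATES  # the goal is terminal, so its value stays 0
    for _ in range(sweeps):
        for s in range(GOAL):
            values[s] = max(r + GAMMA * values[s2]
                            for (s2, r) in (model[(s, a)] for a in ACTIONS))
    return values

# Model-free: the agent has no model and learns action values purely from
# sampled experience (Q-learning with epsilon-greedy action selection).
def q_learning(episodes=500, alpha=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)  # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
            s2, r = environment_step(s, a)
            target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

planned_values = plan(model)
learned_q = q_learning()
greedy_actions = [max(ACTIONS, key=lambda a: learned_q[(s, a)]) for s in range(GOAL)]

print(planned_values)
print(greedy_actions)
```

The planner never takes a real step; it only queries `model`. The Q-learning agent never sees `model`; it only calls `environment_step` and learns from the outcomes. The model-free agent also both explores and exploits inside the same loop.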
