Q-learning is a model-free reinforcement learning algorithm that learns the value of taking a given action in a given state. The ‘Q’ in Q-learning stands for ‘quality’: how useful an action is for gaining future reward. Being model-free, it needs no model of the environment's transitions or rewards, and it can handle problems with stochastic transitions and rewards without requiring adaptations.
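To make "learning the value of an action" concrete, the standard tabular update rule can be written as follows. The symbols are notation assumed here rather than taken from the text above: $s_t$ and $a_t$ are the current state and action, $r_{t+1}$ is the reward received, $\alpha \in (0, 1]$ is a learning rate, and $\gamma \in [0, 1)$ is a discount factor.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

The bracketed term is the gap between the current estimate and a one-step target built from the observed reward and the best estimated value of the next state. Only sampled transitions appear in it, which is why no model of the environment is needed.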
For example, consider an autonomous vehicle navigating through a maze. The Q-learning algorithm can be used to learn the best action (e.g. a left or right turn) in each state (e.g. at a fork in the road) so as to maximize the total reward collected (e.g. a reward for reaching the destination).
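The maze example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the "maze" is reduced to a hypothetical one-dimensional corridor of five cells, the vehicle can only move left or right, and the only reward is 1.0 for reaching the last cell. The names `step`, `choose_action`, `train`, and `greedy_policy`, along with the hyperparameter values, are invented for this sketch.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the destination
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment (an assumption): move one cell, reward 1.0 at the goal."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking among best actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Tabular Q-learning; returns the learned table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # One-step target: observed reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known action for each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

Note that the agent never consults the rules inside `step` directly; it only sees the sampled next state and reward, which is what makes the approach model-free. A real maze would need a richer state representation (for instance grid coordinates) and more actions, but the update itself would stay the same.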