ReFT: Reasoning with Reinforced Fine-Tuning

ReFT is an acronym for Reasoning with Reinforced Fine-Tuning. It is a method that combines supervised fine-tuning (SFT) and reinforcement learning to improve the performance of large language models (LLMs) on reasoning tasks, especially math problem-solving. ReFT first warms up the model with supervised fine-tuning on a dataset of question and chain-of-thought pairs, and then applies online reinforcement learning on a dataset of question and answer pairs, where the rewards are derived from the ground-truth answers.
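
To make the reward step concrete, below is a minimal sketch of a ReFT-style terminal reward function in Python: the sampled chain of thought is parsed for a final answer, which is compared against the ground truth. The function name, the answer-extraction pattern, and the exact reward values are illustrative assumptions, not the original paper's implementation.

```python
import re

def reft_reward(generated_cot: str, ground_truth: str) -> float:
    """Terminal reward for ReFT-style online RL (illustrative sketch).

    Extracts a final numeric answer from a generated chain of thought
    and compares it with the ground-truth answer.
    """
    # Assume the chain of thought ends with "The answer is <number>".
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", generated_cot, re.IGNORECASE)
    if match is None:
        return 0.0  # no parsable final answer: zero reward
    predicted = float(match.group(1))
    # Full reward only when the extracted answer matches the ground truth.
    return 1.0 if abs(predicted - float(ground_truth)) < 1e-6 else 0.0
```

For example, `reft_reward("... The answer is 42.", "42")` returns 1.0, while an off-by-one or unparsable completion returns 0.0.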

Areas of application

ReFT can be used to test and enhance the mathematical reasoning ability of generative LLMs such as GPT-3. It can also be used to evaluate how well these models understand natural language, generate mathematical expressions, and solve mathematical problems. ReFT is also intended to encourage the development of more robust and generalizable models that can handle diverse and complex problems.

Example

Find the area of a circle with radius 5 cm.

A) A = pi * r^2
B) A = 2 * pi * r
C) A = pi * r
D) A = pi * r^3

The correct answer is A: the formula for the area of a circle is A = pi * r^2, where r is the radius. With r = 5 cm, this gives A = pi * 5^2 = 25 * pi ≈ 78.54 cm^2.
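
As a quick check, the computation can be reproduced in a few lines of Python (variable names are illustrative):

```python
import math

radius_cm = 5
area = math.pi * radius_cm ** 2  # A = pi * r^2
print(f"A = {area:.2f} cm^2")    # prints: A = 78.54 cm^2
```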