SFT

Supervised fine-tuning (SFT) is a widely used method for aligning large language models (LLMs) with human preferences. It involves curating a dataset of high-quality example outputs and then fine-tuning the model on this data with a next-token prediction objective.
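The objective can be made concrete with a toy sketch. The snippet below is illustrative only: a tiny bigram logit table stands in for an LLM, and gradient descent minimizes the average next-token cross-entropy over a small curated dataset, which is exactly the SFT loss in miniature. The vocabulary, dataset, and learning rate are invented for the example.

```python
import math

# Toy SFT sketch (assumption: a bigram logit table stands in for an LLM).
# Fine-tuning minimizes next-token cross-entropy on curated sequences.

VOCAB = ["<s>", "good", "answer", "</s>"]
V = len(VOCAB)
IDX = {t: i for i, t in enumerate(VOCAB)}

# Curated "high-quality outputs" (the SFT dataset), as token sequences.
dataset = [["<s>", "good", "answer", "</s>"]] * 4

logits = [[0.0] * V for _ in range(V)]  # logits[prev][next]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]

def nll(seq):
    """Average next-token cross-entropy over one sequence."""
    total = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        probs = softmax(logits[IDX[prev]])
        total -= math.log(probs[IDX[nxt]])
    return total / (len(seq) - 1)

def sft_step(lr=0.5):
    """One full-batch gradient step on the next-token prediction loss."""
    grad = [[0.0] * V for _ in range(V)]
    n = 0
    for seq in dataset:
        for prev, nxt in zip(seq, seq[1:]):
            probs = softmax(logits[IDX[prev]])
            # d(loss)/d(logit_j) = p_j - 1[j == target] for softmax + NLL
            for j in range(V):
                grad[IDX[prev]][j] += probs[j] - (1.0 if j == IDX[nxt] else 0.0)
            n += 1
    for i in range(V):
        for j in range(V):
            logits[i][j] -= lr * grad[i][j] / n

before = sum(nll(s) for s in dataset) / len(dataset)
for _ in range(200):
    sft_step()
after = sum(nll(s) for s in dataset) / len(dataset)
print(f"loss before: {before:.3f}, after: {after:.3f}")  # loss should drop
```

In a real setting the logit table is replaced by the LLM's forward pass and the update by an optimizer such as AdamW, but the loss being minimized is the same.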

Areas of application

  • SFT is a simple and cost-effective method for aligning LLMs with human preferences.
  • It is based on the idea of learning from examples of good model output.
  • It can be used to improve a variety of LLM behaviors, such as instruction following, helpfulness, and safety.
  • It is a common component of the three-stage LLM training pipeline, alongside pretraining and reinforcement learning from human feedback (RLHF).

  • Simple and cost-effective: SFT does not require the preference annotations or reward-model training that RLHF does, which makes it cheaper and easier to run.
  • Versatile: SFT can be used to improve a wide range of LLM behaviors.
  • Effective: SFT has been shown to improve LLM performance on a variety of downstream tasks.

Examples

  • Summarization: fine-tune on examples of concise, relevant, and informative summaries to improve the quality of LLM-generated summaries.
  • Helpfulness: fine-tune on examples of comprehensive, accurate answers so the model responds more usefully to user queries.
  • Safety: fine-tune on examples free of offensive, biased, or misleading content to discourage harmful or unsafe outputs.
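For use cases like those above, each SFT example typically pairs a prompt with a target response, and it is common practice to compute the next-token loss only on the response tokens. The sketch below shows one way to prepare such an example; the whitespace tokenizer and the `User:`/`Assistant:` template are illustrative assumptions, since real pipelines use the model's own tokenizer and chat template.

```python
# Sketch of SFT data preparation (assumptions: whitespace tokenization and
# an invented prompt/response template, for illustration only).

def build_example(prompt, response):
    """Return (tokens, loss_mask).

    The mask is 1 only on response tokens, so the next-token loss teaches
    the model to produce the response rather than to reproduce the prompt.
    """
    prompt_tokens = f"User: {prompt}\nAssistant:".split()
    response_tokens = (response + " </s>").split()
    tokens = prompt_tokens + response_tokens
    mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, mask

tokens, mask = build_example(
    "Summarize the memo.", "The memo announces a new deadline."
)
for t, m in zip(tokens, mask):
    print(m, t)
```

Masking the prompt is a design choice rather than a requirement: some setups train on the full sequence, but response-only loss keeps the model from spending capacity on modeling user text.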

Supervised fine-tuning (SFT) is a powerful tool for aligning LLMs to human preferences and making them more useful and reliable.