Self-Play Fine-tuning (SPIN) is a new fine-tuning method for Large Language Models (LLMs) that can significantly improve performance without the need for additional human-annotated data.
SPIN starts from a supervised fine-tuned (SFT) model and progressively improves it by playing the model against instances of itself.
At each iteration, the LLM generates training data from its previous iteration and refines its policy by learning to distinguish these self-generated responses from the human-annotated ones.
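Concretely, each SPIN iteration can be viewed as training the current model to discriminate between human-annotated responses and responses sampled from its previous iteration, using a logistic loss in which the previous-iteration model serves as the fixed opponent. The sketch below is a minimal PyTorch illustration of that idea; the function name spin_loss, the beta scaling factor, and the use of summed per-response log-probabilities are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def spin_loss(policy_logp_real, policy_logp_gen,
              prev_logp_real, prev_logp_gen, beta=0.1):
    """One SPIN-style training step (sketch, not the official code).

    policy_logp_*: summed log-probs of the current model on the
    human-annotated ("real") and self-generated ("gen") responses.
    prev_logp_*:   the same quantities under the previous-iteration
    model, which acts as the fixed opponent / reference.
    """
    # Log-likelihood margins relative to the previous iteration's model.
    real_margin = policy_logp_real - prev_logp_real
    gen_margin = policy_logp_gen - prev_logp_gen
    # Logistic loss: push the current model to assign higher relative
    # likelihood to human data than to its own previous generations.
    return -F.logsigmoid(beta * (real_margin - gen_margin)).mean()

# Toy usage with random log-probs standing in for model outputs.
torch.manual_seed(0)
lp = lambda: torch.randn(4)  # batch of 4 prompt/response pairs
print(spin_loss(lp(), lp(), lp(), lp()).item())
```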
SPIN outperforms models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data.
SPIN has the potential to achieve human-level performance in LLMs without the need for expert opponents.