Self-Play Fine-tuning (SPIN) is a new fine-tuning method for Large Language Models (LLMs) that can significantly improve performance without the need for additional human-annotated data.
SPIN starts from a supervised fine-tuned (SFT) model and progressively improves it by playing the model against instances of itself.
At each iteration, the LLM generates training data from its previous iteration and refines its policy by learning to distinguish these self-generated responses from the human-annotated ones.
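Concretely, each SPIN iteration can be viewed as training the current model to discriminate between human-annotated responses and responses sampled from its previous iteration, using a logistic loss in which the previous-iteration model serves as the fixed opponent. The sketch below is a minimal PyTorch illustration of that idea; the function name spin_loss, the beta scaling factor, and the use of summed per-response log-probabilities are illustrative assumptions rather than the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def spin_loss(policy_logp_real, policy_logp_gen,
              prev_logp_real, prev_logp_gen, beta=0.1):
    """One SPIN-style training step (sketch, not the official code).

    policy_logp_*: summed log-probs of the current model on the
    human-annotated ("real") and self-generated ("gen") responses.
    prev_logp_*:   the same quantities under the previous-iteration
    model, which acts as the fixed opponent / reference.
    """
    # Log-likelihood margins relative to the previous iteration's model.
    real_margin = policy_logp_real - prev_logp_real
    gen_margin = policy_logp_gen - prev_logp_gen
    # Logistic loss: push the current model to assign higher relative
    # likelihood to human data than to its own previous generations.
    return -F.logsigmoid(beta * (real_margin - gen_margin)).mean()

# Toy usage with random log-probs standing in for model outputs.
torch.manual_seed(0)
lp = lambda: torch.randn(4)  # batch of 4 prompt/response pairs
print(spin_loss(lp(), lp(), lp(), lp()).item())
```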
SPIN outperforms models trained through direct preference optimization (DPO) supplemented with extra GPT-4 preference data.
SPIN has the potential to achieve human-level performance in LLMs without the need for expert opponents.