SELF-ALIGN

SELF-ALIGN, a novel approach that combines principle-driven reasoning and the generative power of LLMs, can self-align AI agents with minimal human supervision.

Areas of application

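SELF-ALIGN consists of four stages: (1) use an LLM to generate synthetic prompts, with a topic-guided method to increase their diversity; (2) apply a small set of human-written principles through in-context learning so that the LLM produces helpful, ethical, and reliable responses; (3) fine-tune the original LLM on the high-quality self-aligned responses so that it can respond well without the principles in its prompt; and (4) apply a refinement step that addresses overly brief or indirect responses.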

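Dromedary, an AI assistant built by applying SELF-ALIGN to the LLaMA-65b base model, needs fewer than 300 lines of human annotations (fewer than 200 seed prompts, 16 generic principles, and 5 in-context exemplars). According to the authors, it significantly outperforms several state-of-the-art AI systems, including Text-Davinci-003 and Alpaca, on benchmark datasets.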

SELF-ALIGN has the potential to significantly reduce the reliance on human supervision in AI development and deployment, enabling the creation of more ethical, reliable, and helpful AI agents.

Example
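
The four stages described above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable interface, every function name, the prompt wording, the word-count check in `refine`, and the lookup-table stand-in for fine-tuning are all hypothetical and exist only to show how data flows from one stage to the next.

```python
from typing import Callable, Dict, List, Tuple

LLM = Callable[[str], str]  # hypothetical interface: prompt text in, completion out


def generate_prompts(llm: LLM, seeds: List[str], n_per_seed: int) -> List[str]:
    """Stage 1: expand a few seed prompts into synthetic prompts."""
    return [
        llm(f"Write new user question #{i} on the topic of: {seed}")
        for seed in seeds
        for i in range(n_per_seed)
    ]


def apply_principles(llm: LLM, prompts: List[str], principles: List[str]) -> List[Tuple[str, str]]:
    """Stage 2: answer each prompt with the human-written principles in context."""
    header = "Principles:\n" + "\n".join(f"- {p}" for p in principles)
    return [(p, llm(f"{header}\n\nUser: {p}")) for p in prompts]


def fine_tune(pairs: List[Tuple[str, str]]) -> LLM:
    """Stage 3 placeholder: memorize the pairs (real fine-tuning updates weights)."""
    table: Dict[str, str] = dict(pairs)
    return lambda prompt: table.get(prompt, "")


def refine(llm: LLM, model: LLM, prompts: List[str], min_words: int) -> List[Tuple[str, str]]:
    """Stage 4: expand responses that are too brief (assumed word-count check)."""
    refined = []
    for p in prompts:
        answer = model(p)
        if len(answer.split()) < min_words:
            answer = llm(f"Answer in more detail.\nDraft: {answer}\nUser: {p}")
        refined.append((p, answer))
    return refined


def self_align(llm: LLM, seeds: List[str], principles: List[str],
               n_per_seed: int = 2, min_words: int = 5) -> Tuple[List[str], LLM]:
    """Run the four stages in order and return the prompts and the final model."""
    prompts = generate_prompts(llm, seeds, n_per_seed)
    pairs = apply_principles(llm, prompts, principles)
    model = fine_tune(pairs)
    return prompts, fine_tune(refine(llm, model, prompts, min_words))


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in so the sketch runs without a real model."""
    return f"toy completion of '{prompt.splitlines()[-1]}'"


if __name__ == "__main__":
    prompts, assistant = self_align(
        toy_llm, ["How do vaccines work?"], ["Be helpful.", "Be honest."]
    )
    for p in prompts:
        print(p, "->", assistant(p))
```

In this sketch the principles appear only in the stage 2 prompt. The model returned by `fine_tune` answers without them, which mirrors the idea that fine-tuning lets the assistant respond well without carrying the principle set in every prompt.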

Resources

[2305.03047] Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision (arxiv.org)