Glossary

RLHF

Reinforcement Learning from Human Feedback (RLHF) is a technique that trains reinforcement learning (RL) agents from human feedback instead of a hand-crafted reward: human preferences over model outputs are distilled into a learned reward model, which the agent then optimizes with RL. It is widely used to align large language models (LLMs) with human intent.
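
A central step is fitting the reward model to pairwise human preferences; that reward model then drives an RL algorithm such as PPO to update the policy. Below is a minimal PyTorch sketch of the reward-modeling step only. The class, the random features standing in for encoded (prompt, response) pairs, and the hyperparameters are illustrative placeholders, not a specific library's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy scalar reward head over pre-computed sequence features."""
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh())
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        return self.head(self.encoder(x)).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry pairwise loss: the human-preferred response
    # should receive a higher reward than the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random features stand in for encoded (prompt, response) pairs.
chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)

loss = preference_loss(model(chosen), model(rejected))
opt.zero_grad()
loss.backward()
opt.step()
```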

LoRAMoE

LoRAMoE is a plugin-style Mixture of Experts (MoE) architecture that adds several low-rank adapters (LoRA) as experts, combined by a trainable router, while keeping the backbone model frozen. This design helps prevent the forgetting of world knowledge in large language models (LLMs) during supervised fine-tuning.
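
As a rough illustration, the PyTorch sketch below shows the basic shape of such a layer under stated assumptions: a frozen base linear transform, a softmax router, and a bank of low-rank expert adapters. The class and parameter names are hypothetical, and refinements from the paper (such as expert balancing) are omitted.

```python
import torch
import torch.nn as nn

class LoRAMoELayer(nn.Module):
    """Hypothetical LoRAMoE-style layer: frozen base weights plus a
    router that mixes several low-rank (LoRA) expert adapters."""
    def __init__(self, d_in, d_out, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():   # backbone stays frozen
            p.requires_grad_(False)
        self.router = nn.Linear(d_in, num_experts)
        # Each expert is a LoRA pair: down-projection A, up-projection B.
        # B starts at zero so training begins from the base model's behavior.
        self.A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))

    def forward(self, x):                             # x: (batch, d_in)
        gate = torch.softmax(self.router(x), dim=-1)  # (batch, experts)
        expert_out = torch.einsum('bi,eir,ero->beo', x, self.A, self.B)
        lora_out = torch.einsum('be,beo->bo', gate, expert_out)
        return self.base(x) + lora_out

layer = LoRAMoELayer(d_in=32, d_out=32)
y = layer(torch.randn(8, 32))  # only the router and LoRA experts get gradients
```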

Self-Play Fine-tuning (SPIN)

Self-Play Fine-tuning (SPIN) is a fine-tuning method for large language models (LLMs) that improves performance without additional human-annotated data: the model plays against earlier versions of itself, learning to distinguish the human-written responses in its existing supervised fine-tuning data from responses generated by its previous iteration.
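
Concretely, SPIN's training signal can be written as a logistic (DPO-style) loss over log-probability ratios, with the previous self-play iterate serving as the frozen reference model. The sketch below assumes per-sequence log-probabilities have already been computed; the function name, toy tensors, and beta scale are illustrative, not the paper's official code.

```python
import torch
import torch.nn.functional as F

def spin_loss(policy_logp_real, policy_logp_synth,
              ref_logp_real, ref_logp_synth, beta=0.1):
    """Logistic loss over log-probability ratios: the current model
    should prefer human-written responses (real) over responses
    sampled from its own previous iterate (synth), which acts as
    the frozen reference model."""
    margin = beta * ((policy_logp_real - ref_logp_real)
                     - (policy_logp_synth - ref_logp_synth))
    return -F.logsigmoid(margin).mean()

# Toy per-sequence log-probabilities standing in for model outputs.
policy_real, policy_synth = torch.randn(16), torch.randn(16)
ref_real, ref_synth = torch.randn(16), torch.randn(16)
loss = spin_loss(policy_real, policy_synth, ref_real, ref_synth)
```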
