Glossary

RLHF

Reinforcement Learning from Human Feedback (RLHF) is a technique that trains reinforcement learning (RL) agents from human feedback instead of a hand-crafted reward: human preferences over model outputs are distilled into a learned reward model, which the agent then optimizes with RL. It is widely used to align large language models (LLMs) with human intent.
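
A central step is fitting the reward model to pairwise human preferences; that reward model then drives an RL algorithm such as PPO to update the policy. Below is a minimal PyTorch sketch of the reward-modeling step only. The class, the random features standing in for encoded (prompt, response) pairs, and the hyperparameters are illustrative placeholders, not a specific library's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy scalar reward head over pre-computed sequence features."""
    def __init__(self, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh())
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        return self.head(self.encoder(x)).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Bradley-Terry pairwise loss: the human-preferred response
    # should receive a higher reward than the rejected one.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random features stand in for encoded (prompt, response) pairs.
chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)

loss = preference_loss(model(chosen), model(rejected))
opt.zero_grad()
loss.backward()
opt.step()
```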

LoRAMoE

LoRAMoE is a plugin-style Mixture of Experts (MoE) architecture that adds several low-rank adapters (LoRA) as experts, combined by a trainable router, while keeping the backbone model frozen. This design helps prevent the forgetting of world knowledge in large language models (LLMs) during supervised fine-tuning.
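
As a rough illustration, the PyTorch sketch below shows the basic shape of such a layer under stated assumptions: a frozen base linear transform, a softmax router, and a bank of low-rank expert adapters. The class and parameter names are hypothetical, and refinements from the paper (such as expert balancing) are omitted.

```python
import torch
import torch.nn as nn

class LoRAMoELayer(nn.Module):
    """Hypothetical LoRAMoE-style layer: frozen base weights plus a
    router that mixes several low-rank (LoRA) expert adapters."""
    def __init__(self, d_in, d_out, num_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        for p in self.base.parameters():   # backbone stays frozen
            p.requires_grad_(False)
        self.router = nn.Linear(d_in, num_experts)
        # Each expert is a LoRA pair: down-projection A, up-projection B.
        # B starts at zero so training begins from the base model's behavior.
        self.A = nn.Parameter(torch.randn(num_experts, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_experts, rank, d_out))

    def forward(self, x):                             # x: (batch, d_in)
        gate = torch.softmax(self.router(x), dim=-1)  # (batch, experts)
        expert_out = torch.einsum('bi,eir,ero->beo', x, self.A, self.B)
        lora_out = torch.einsum('be,beo->bo', gate, expert_out)
        return self.base(x) + lora_out

layer = LoRAMoELayer(d_in=32, d_out=32)
y = layer(torch.randn(8, 32))  # only the router and LoRA experts get gradients
```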

Self-Play Fine-tuning (SPIN)

Self-Play Fine-tuning (SPIN) is a fine-tuning method for large language models (LLMs) that improves performance without additional human-annotated data: the model plays against earlier versions of itself, learning to distinguish the human-written responses in its existing supervised fine-tuning data from responses generated by its previous iteration.
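
Concretely, SPIN's training signal can be written as a logistic (DPO-style) loss over log-probability ratios, with the previous self-play iterate serving as the frozen reference model. The sketch below assumes per-sequence log-probabilities have already been computed; the function name, toy tensors, and beta scale are illustrative, not the paper's official code.

```python
import torch
import torch.nn.functional as F

def spin_loss(policy_logp_real, policy_logp_synth,
              ref_logp_real, ref_logp_synth, beta=0.1):
    """Logistic loss over log-probability ratios: the current model
    should prefer human-written responses (real) over responses
    sampled from its own previous iterate (synth), which acts as
    the frozen reference model."""
    margin = beta * ((policy_logp_real - ref_logp_real)
                     - (policy_logp_synth - ref_logp_synth))
    return -F.logsigmoid(margin).mean()

# Toy per-sequence log-probabilities standing in for model outputs.
policy_real, policy_synth = torch.randn(16), torch.randn(16)
ref_real, ref_synth = torch.randn(16), torch.randn(16)
loss = spin_loss(policy_real, policy_synth, ref_real, ref_synth)
```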
