The Phi-3-Mini-128K-Instruct is a cutting-edge language model designed for both commercial and research applications. Part of the Phi-3 family, it combines a lightweight 3.8 billion-parameter footprint with a 128K-token context window, making it a powerful tool for a wide range of AI-powered features. The model was trained on a combination of synthetic data and filtered, high-quality web data, with a focus on reasoning and comprehension.
After pretraining, the model underwent supervised fine-tuning and preference optimization to strengthen instruction following and safety. Its performance is state-of-the-art for its size, especially in common-sense reasoning, language understanding, and logical reasoning, where it competes well with models of up to 13 billion parameters.
Developers should note that while the model excels in English language tasks, it’s not tailored for all scenarios. Accuracy, safety, and fairness evaluations are crucial, particularly in high-risk situations. Compliance with relevant laws and regulations is also essential.
The model integrates with the Hugging Face transformers library and uses a vocabulary of 32,064 tokens. It is optimized for chat-format prompts and can be run on supported GPU hardware or, via ONNX, across a variety of platforms; a minimal usage sketch follows.
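The sketch below assumes the microsoft/Phi-3-mini-128k-instruct checkpoint ID (inferred from the model's name), a recent transformers release (or trust_remote_code=True on older ones), and accelerate installed for device placement. It is an illustrative example under those assumptions, not the model card's canonical snippet.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint ID assumed from the model's name; verify it against the Hub.
model_id = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # let transformers choose fp16/bf16 where supported
    device_map="auto",       # requires `accelerate`; places weights on the GPU
    trust_remote_code=True,  # needed on releases without built-in Phi-3 support
)

# Chat-format prompt: the tokenizer's chat template renders the
# <|user|> ... <|end|> <|assistant|> structure the model was tuned on.
messages = [
    {"role": "user", "content": "Summarize why long context windows matter."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Strip the prompt tokens and decode only the newly generated reply.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For CPU or cross-platform deployment, the ONNX export of the model can be served through ONNX Runtime instead of the PyTorch path shown here.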
In summary, the Phi-3-Mini-128K-Instruct is a versatile and robust model that pushes the boundaries of AI research and application, provided it’s used responsibly and within legal frameworks.
The Phi-3-Mini-128K-Instruct, part of the Phi-3 family, is a 3.8 billion-parameter model that has demonstrated strong performance across a range of benchmarks. Here's how it compares with other language models:
1) MMLU (5-Shot): Scored 68.1, close behind GPT-3.5's 71.4 despite having far fewer parameters.
2) HellaSwag (5-Shot): Achieved 74.5, competitive with much larger models such as GPT-3.5 at 78.8.
3) ANLI (7-Shot): Scored 52.8, below GPT-3.5's 58.1 but still robust adversarial-reasoning performance for a 3.8B model.
4) GSM-8K (0-Shot; CoT): Excelled with 83.6, surpassing GPT-3.5's 78.1 and indicating strong mathematical reasoning.
5) MedQA (2-Shot): Scored 55.3 on medical question answering, some distance behind GPT-3.5's 63.4.
6) AGIEval (0-Shot): Scored 36.9, well short of GPT-3.5's 48.4, with clear headroom on this general-ability evaluation.
Overall, the Phi-3-Mini-128K-Instruct stands out for its efficiency, matching or exceeding larger models such as GPT-3.5 on several benchmarks and delivering state-of-the-art performance among models with fewer than 13 billion parameters. This makes it a valuable option for both commercial and research applications.
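As a point of reference for the k-shot settings quoted above: a k-shot evaluation simply prepends k solved exemplars to the test question so the model can infer the expected answer format. The helper below is a hypothetical illustration of that prompt assembly (the Q/A layout and the build_k_shot_prompt name are invented for this sketch, not Microsoft's evaluation harness). The full comparison table follows.

```python
# Hypothetical sketch of k-shot prompt assembly; the Q/A layout and this
# helper are illustrative only, not the harness behind the scores below.
def build_k_shot_prompt(exemplars, question, k=5):
    """Prepend k solved (question, answer) pairs before the test question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in exemplars[:k]]
    parts.append(f"Q: {question}\nA:")  # the model completes this final answer
    return "\n\n".join(parts)

demo = [("2 + 2 = ?", "4"), ("What is the capital of France?", "Paris")]
print(build_k_shot_prompt(demo, "3 * 3 = ?", k=2))
```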
Benchmark | Phi-3-Mini-128K-In (3.8b) | Phi-3-Small (7b, preview) | Phi-3-Medium (14b, preview) | Phi-2 (2.7b) | Mistral (7b) | Gemma (7b) | Llama-3-In (8b) | Mixtral (8x7b) | GPT-3.5 (version 1106) |
---|---|---|---|---|---|---|---|---|---|
MMLU 5-Shot | 68.1 | 75.3 | 78.2 | 56.3 | 61.7 | 63.6 | 66.5 | 68.4 | 71.4 |
HellaSwag 5-Shot | 74.5 | 78.7 | 83.2 | 53.6 | 58.5 | 49.8 | 71.1 | 70.4 | 78.8 |
ANLI 7-Shot | 52.8 | 55 | 58.7 | 42.5 | 47.1 | 48.7 | 57.3 | 55.2 | 58.1 |
GSM-8K 0-Shot; CoT | 83.6 | 86.4 | 90.8 | 61.1 | 46.4 | 59.8 | 77.4 | 64.7 | 78.1 |
MedQA 2-Shot | 55.3 | 58.2 | 69.8 | 40.9 | 49.6 | 50 | 60.5 | 62.2 | 63.4 |
AGIEval 0-Shot | 36.9 | 45 | 49.7 | 29.8 | 35.1 | 42.1 | 42 | 45.2 | 48.4 |
TriviaQA 5-Shot | 57.1 | 59.1 | 73.3 | 45.2 | 72.3 | 75.2 | 67.7 | 82.2 | 85.8 |
Arc-C 10-Shot | 84 | 90.7 | 91.9 | 75.9 | 78.6 | 78.3 | 82.8 | 87.3 | 87.4 |
Arc-E 10-Shot | 95.2 | 97.1 | 98 | 88.5 | 90.6 | 91.4 | 93.4 | 95.6 | 96.3 |
PIQA 5-Shot | 83.6 | 87.8 | 88.2 | 60.2 | 77.7 | 78.1 | 75.7 | 86 | 86.6 |
SociQA 5-Shot | 76.1 | 79 | 79.4 | 68.3 | 74.6 | 65.5 | 73.9 | 75.9 | 68.3 |
BigBench-Hard 0-Shot | 71.5 | 75 | 82.5 | 59.4 | 57.3 | 59.6 | 51.5 | 69.7 | 68.32 |
WinoGrande 5-Shot | 72.5 | 82.5 | 81.2 | 54.7 | 54.2 | 55.6 | 65 | 62 | 68.8 |
OpenBookQA 10-Shot | 80.6 | 88.4 | 86.6 | 73.6 | 79.8 | 78.6 | 82.6 | 85.8 | 86 |
BoolQ 0-Shot | 78.7 | 82.9 | 86.5 | 72.2 | 66 | 80.9 | 77.6 | 79.1 | |
CommonSenseQA 10-Shot | 78 | 80.3 | 82.6 | 69.3 | 72.6 | 76.2 | 79 | 78.1 | 79.6 |
TruthfulQA 10-Shot | 63.2 | 68.1 | 74.8 | 52.1 | 53 | 63.2 | 60.1 | 85.8 | |
HumanEval 0-Shot | 57.9 | 59.1 | 54.7 | 47 | 28 | 34.1 | 60.4 | 37.8 | 62.2 |
MBPP 3-Shot | 62.5 | 71.4 | 73.7 | 60.6 | 50.8 | 51.5 | 67.7 | 60.2 | 77.8 |
The team behind this large language model (LLM) is from Microsoft, a verified organization with a strong presence in AI and ML research. The team, comprising 1405 members, has contributed to a wide range of projects, including state-of-the-art models and frameworks. One notable contribution is the SpeechT5 framework, which addresses multiple audio-related tasks through a unified seq2seq model complemented by modal-specific pre/post-nets. Another significant project is TAPEX, a pre-training approach for table-based question answering and fact verification that showcases their expertise in handling structured data. Their work reflects a sustained commitment to advancing machine learning, particularly natural language processing and speech synthesis, and their collection of models and datasets serves as a valuable resource for the broader AI community.