Phi-3-Mini-4K-Instruct is a cutting-edge language model designed for both commercial and research applications. The 3.8-billion-parameter member of the Phi-3 family, it is trained on the Phi-3 datasets, which combine synthetic data with high-quality, filtered web data. Its training emphasizes reasoning density and quality, making it adept at complex language tasks.
The model has undergone rigorous post-training, including supervised fine-tuning (SFT) and direct preference optimization (DPO), to ensure it follows instructions accurately and maintains safety standards. When benchmarked, Phi-3-Mini-4K-Instruct demonstrates state-of-the-art performance in common sense, language understanding, math, code, and logical reasoning among models with fewer than 13 billion parameters.
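For background, direct preference optimization trains a model directly on pairs of preferred and rejected responses instead of first fitting a separate reward model. Below is a minimal sketch of the DPO objective (Rafailov et al., 2023) in PyTorch; it illustrates the loss only and is not Microsoft’s actual training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss sketch. Each argument is the summed log-probability a model
    assigns to a full response; 'chosen' responses are human-preferred."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to widen the margin between chosen and rejected
    # responses relative to the frozen reference model.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
print(loss.item())
```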
Developers should note that while the model is versatile, it is not tailored for all possible use cases. It is crucial to evaluate the model for accuracy, safety, and fairness within the specific context of its application, particularly in high-risk scenarios. Additionally, developers must comply with relevant laws and regulations, including those related to privacy and trade compliance.
Phi-3-Mini-4K-Instruct is integrated into the development version of the transformers library and is also available on HuggingChat. It supports a vocabulary size of up to 32,064 tokens and is optimized for chat-format prompts. The model is licensed under the MIT license; the project may reference various Microsoft trademarks, whose use must align with Microsoft’s Trademark & Brand Guidelines.
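As a starting point, here is a minimal sketch of loading the model with transformers and prompting it through its chat template; the generation settings are illustrative, and `trust_remote_code=True` is only needed on transformers versions that predate native Phi-3 support.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half-precision weights to cut memory use
    device_map="auto",            # spreads layers across available GPUs
    trust_remote_code=True,       # only needed before native Phi-3 support
)

# The tokenizer ships a chat template that wraps messages in the
# <|user|> ... <|end|> <|assistant|> markers the model expects.
messages = [
    {"role": "user", "content": "Summarize what 'instruction tuning' means."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```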
For those looking to deploy the model, it is compatible with multi-GPU setups; note that it uses flash attention by default, which requires recent GPU hardware (e.g., NVIDIA A100, A6000, or H100). It also supports ONNX Runtime across various platforms and hardware, ensuring broad accessibility and optimization for different devices.
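For the ONNX route, the onnxruntime-genai package provides a token-by-token generation loop over an ONNX export of the model. A hedged sketch follows, based on the early (0.2-era) API, which has since evolved; the model directory path is a placeholder for wherever the ONNX export was downloaded.

```python
import onnxruntime_genai as og

# Placeholder path: point this at a downloaded ONNX export of Phi-3-mini.
model = og.Model("./phi3-mini-4k-instruct-onnx")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()

# Phi-3 chat-format prompt markers.
prompt = "<|user|>\nWhat is ONNX Runtime?<|end|>\n<|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=512)
params.input_ids = tokenizer.encode(prompt)

generator = og.Generator(model, params)
# Stream tokens to stdout as they are produced.
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
```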
The Phi-3-Mini-4K-Instruct model, despite having only 3.8 billion parameters, demonstrates remarkable performance across various benchmarks, often outperforming larger models. Here are the key highlights:
1) MMLU (5-Shot): Phi-3-Mini-4K-Instruct scores 68.8, within a few points of GPT-3.5’s 71.4 despite being a far smaller model.
2) HellaSwag (5-Shot): It achieves 76.7, competitive with GPT-3.5’s 78.8 and well ahead of Mistral 7b’s 58.5.
3) GSM-8K (0-Shot; CoT): The model excels with 82.5, far outstripping Mistral’s 46.4 and even surpassing GPT-3.5’s 78.1.
4) TriviaQA (5-Shot): At 64.0, the model trails knowledge-heavy competitors such as Gemma 7b (75.2) and Mixtral (82.2), reflecting the limited capacity of a 3.8-billion-parameter model to store factual knowledge.
Overall, Phi-3-Mini-4K-Instruct’s performance is impressive, especially considering its smaller size relative to other models. It showcases the efficiency of its design and training, making it a robust choice for various applications.
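For readers unfamiliar with the shot notation used above and in the table below: “k-shot” means k worked examples are prepended to the test question, and “CoT” means the prompt elicits step-by-step reasoning. A toy sketch of assembling a k-shot prompt (a hypothetical helper, not the actual evaluation harness):

```python
# Toy examples; real benchmarks draw shots from their own training splits.
EXAMPLES = [
    ("What is 2 + 2?", "4"),
    ("What is 7 * 6?", "42"),
    ("What is 15 / 3?", "5"),
]

def build_k_shot_prompt(question: str, k: int) -> str:
    """Prepend k worked Q/A examples before the real question."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES[:k])
    return f"{shots}\n\nQ: {question}\nA:"

print(build_k_shot_prompt("What is 9 - 3?", k=2))
```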
Benchmark | Phi-3-Mini-4K-In 3.8b | Phi-3-Small 7b (preview) | Phi-3-Medium 14b (preview) | Phi-2 2.7b | Mistral 7b | Gemma 7b | Llama-3-In 8b | Mixtral 8x7b | GPT-3.5 version 1106 |
---|---|---|---|---|---|---|---|---|---|
MMLU 5-Shot | 68.8 | 75.3 | 78.2 | 56.3 | 61.7 | 63.6 | 66.5 | 68.4 | 71.4 |
HellaSwag 5-Shot | 76.7 | 78.7 | 83.2 | 53.6 | 58.5 | 49.8 | 71.1 | 70.4 | 78.8 |
ANLI 7-Shot | 52.8 | 55 | 58.7 | 42.5 | 47.1 | 48.7 | 57.3 | 55.2 | 58.1 |
GSM-8K 0-Shot; CoT | 82.5 | 86.4 | 90.8 | 61.1 | 46.4 | 59.8 | 77.4 | 64.7 | 78.1 |
MedQA 2-Shot | 53.8 | 58.2 | 69.8 | 40.9 | 49.6 | 50 | 60.5 | 62.2 | 63.4 |
AGIEval 0-Shot | 37.5 | 45 | 49.7 | 29.8 | 35.1 | 42.1 | 42 | 45.2 | 48.4 |
TriviaQA 5-Shot | 64 | 59.1 | 73.3 | 45.2 | 72.3 | 75.2 | 67.7 | 82.2 | 85.8 |
Arc-C 10-Shot | 84.9 | 90.7 | 91.9 | 75.9 | 78.6 | 78.3 | 82.8 | 87.3 | 87.4 |
Arc-E 10-Shot | 94.6 | 97.1 | 98 | 88.5 | 90.6 | 91.4 | 93.4 | 95.6 | 96.3 |
PIQA 5-Shot | 84.2 | 87.8 | 88.2 | 60.2 | 77.7 | 78.1 | 75.7 | 86 | 86.6 |
SociQA 5-Shot | 76.6 | 79 | 79.4 | 68.3 | 74.6 | 65.5 | 73.9 | 75.9 | 68.3 |
BigBench-Hard 0-Shot | 71.7 | 75 | 82.5 | 59.4 | 57.3 | 59.6 | 51.5 | 69.7 | 68.32 |
WinoGrande 5-Shot | 70.8 | 82.5 | 81.2 | 54.7 | 54.2 | 55.6 | 65 | 62 | 68.8 |
OpenBookQA 10-Shot | 83.2 | 88.4 | 86.6 | 73.6 | 79.8 | 78.6 | 82.6 | 85.8 | 86 |
BoolQ 0-Shot | 77.6 | 82.9 | 86.5 | -- | 72.2 | 66 | 80.9 | 77.6 | 79.1 |
CommonSenseQA 10-Shot | 80.2 | 80.3 | 82.6 | 69.3 | 72.6 | 76.2 | 79 | 78.1 | 79.6 |
TruthfulQA 10-Shot | 65 | 68.1 | 74.8 | -- | 52.1 | 53 | 63.2 | 60.1 | 85.8 |
HumanEval 0-Shot | 59.1 | 59.1 | 54.7 | 47 | 28 | 34.1 | 60.4 | 37.8 | 62.2 |
MBPP 3-Shot | 53.8 | 71.4 | 73.7 | 60.6 | 50.8 | 51.5 | 67.7 | 60.2 | 77.8 |
The team behind this model is Microsoft, a verified organization with a strong presence in AI and ML research. The team, comprising 1405 members, has contributed to various projects, including state-of-the-art models and frameworks. One notable contribution is the SpeechT5 framework, which addresses multiple audio-related tasks through a unified seq2seq model complemented by modal-specific pre-nets and post-nets. Another significant project is TAPEX, a pre-training approach for table-based question answering and fact verification, showcasing expertise in handling structured data. The team’s work reflects a commitment to advancing machine learning, particularly natural language processing and speech synthesis, as evidenced by its extensive research and model updates. These collaborative efforts have produced a collection of models and datasets that serve as valuable resources for the broader AI community.