Introducing Qwen2, the latest advancement in AI technology, designed to cater to a wide range of computational needs. Qwen2 is a series of pretrained and instruction-tuned models, available in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. These models have been meticulously trained on data in 29 languages, ensuring state-of-the-art performance across various benchmarks.
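For readers who want to try one of the instruction-tuned checkpoints directly, here is a minimal sketch using the Hugging Face transformers library. The repository name and generation settings below are illustrative choices, not part of the announcement; any of the other published sizes can be swapped in.

```python
# Minimal sketch: chat with an instruction-tuned Qwen2 checkpoint via transformers.
# Assumes transformers and accelerate are installed and a GPU is available.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2-7B-Instruct"  # illustrative; other sizes work the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Give me a short introduction to large language models."},
]

# Build the chat-formatted prompt and generate a reply.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens before decoding the newly generated text.
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(response)
```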
Qwen2 models bring significant improvements in coding and mathematics, and certain models support context lengths of up to 128K tokens. Grouped-query attention (GQA) is used across all sizes, which speeds up inference and reduces memory usage. For the smaller models, the input and output embeddings are tied, since these large embedding matrices would otherwise account for a disproportionate share of the total parameters.
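These architectural choices are visible in the published model configurations. The sketch below (assuming the Hugging Face checkpoints named here are accessible) simply reads a checkpoint's configuration and prints the relevant fields.

```python
# Sketch: inspect where GQA, tied embeddings, and context length show up
# in a Qwen2 checkpoint's configuration.
from transformers import AutoConfig

for name in ["Qwen/Qwen2-0.5B", "Qwen/Qwen2-7B"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name)
    # With GQA, key/value heads are fewer than query heads, shrinking the KV cache.
    print("  attention heads:", cfg.num_attention_heads)
    print("  key/value heads:", cfg.num_key_value_heads)
    # Smaller checkpoints tie the input embedding and output projection weights.
    print("  tied embeddings:", cfg.tie_word_embeddings)
    # Maximum context length the checkpoint is configured for.
    print("  max positions:  ", cfg.max_position_embeddings)
```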
The Qwen2 series demonstrates strong multilingual capability, having been trained on 27 additional languages besides English and Chinese. The training also pays particular attention to code-switching scenarios, which are common in multilingual communication, improving how the models handle mixed-language input.
Safety and responsibility are paramount in the design of Qwen2. The models have been evaluated against harmful responses in multiple languages, showing comparable safety performance to other leading models. Qwen2’s licensing has also evolved, with most models adopting the Apache 2.0 license, promoting openness and accelerating global applications.
Qwen2 is not just a technological leap forward; it’s a commitment to responsible AI development, offering enhanced performance, safety, and accessibility to the AI community. With Qwen2, users can expect a reliable, efficient, and ethically aligned AI experience.
Overall performance: Qwen2-72B outperforms Llama3-70B, Mixtral-8x22B, and Qwen1.5-110B on most benchmarks, with a marked advance in coding and mathematics while maintaining robust multilingual ability.
| Benchmark | Qwen2-72B | Llama3-70B | Mixtral-8x22B | Qwen1.5-110B |
|---|---|---|---|---|
| MMLU | 84.2 | 79.5 | 77.8 | 80.4 |
| MMLU-Pro | 55.6 | 52.8 | 49.5 | 49.4 |
| GPQA | 37.9 | 36.3 | 34.3 | 35.9 |
| TheoremQA | 43.1 | 32.3 | 35.9 | 34.9 |
| BBH | 82.4 | 81.0 | 78.9 | 74.8 |
| HumanEval | 64.6 | 48.2 | 46.3 | 54.3 |
| MBPP | 76.9 | 70.4 | 71.7 | 70.9 |
| MultiPL-E | 59.6 | 46.3 | 46.7 | 52.7 |
| GSM8K | 89.5 | 83.0 | 83.7 | 85.4 |
| MATH | 51.1 | 42.5 | 41.7 | 49.6 |
| C-Eval | 91.0 | 65.2 | 54.6 | 89.1 |
| CMMLU | 90.1 | 67.2 | 53.4 | 88.3 |
| Multi-Exam | 76.6 | 70.0 | 63.5 | 75.6 |
| Multi-Understanding | 80.7 | 79.9 | 77.7 | 78.2 |
| Multi-Mathematics | 76.0 | 67.1 | 62.9 | 64.4 |
The Qwen team is dedicated to advancing artificial general intelligence, focusing on the development of generalist models such as large language models and multimodal models. They support open-source initiatives and have released a series of models including Qwen-7B, Qwen-14B, Qwen-72B, their chat counterparts, and the multimodal models Qwen-VL and Qwen-Audio. They also offer web services and an app to assist users in their daily work and life. The team brings together people with diverse talents and interests, and it welcomes contributions and new members.
Publications: https://qwenlm.github.io/publication/
The Qwen community actively contributes to the development and improvement of Qwen models on GitHub. The official repository, maintained by the Qwen team, covers the pretrained large language models developed by Alibaba Cloud. The original Qwen/Qwen repository is no longer actively maintained because of substantial codebase differences, but the community continues to engage with Qwen2 and share experiences. Qwen2, along with its chat variants, offers strong language capabilities, multilingual support, and impressive performance across a range of tasks. The team encourages collaboration and feedback from the community, making Qwen a dynamic and evolving project.