Introducing Claude 3.5 Sonnet: The latest model in the Claude 3.5 series, Claude 3.5 Sonnet, is now available. It’s a top-performing AI model that excels in intelligence, speed, and cost-effectiveness.
Accessibility and Pricing: You can access Claude 3.5 Sonnet for free on Claude.ai and its iOS app. Subscribers to Claude Pro and Team plans get higher rate limits. The model costs $3 per million input tokens and $15 per million output tokens.
Performance Benchmarks: Claude 3.5 Sonnet sets new standards in graduate-level reasoning, undergraduate knowledge, and coding proficiency. It’s twice as fast as its predecessor and offers a 200K token context window.
Coding and Vision Capabilities: The model showcases advanced coding abilities, solving 64% of agentic coding problems. It also surpasses previous models in vision benchmarks, particularly in tasks requiring visual reasoning.
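As a quick illustration of the pricing above, per-request cost is a linear function of input and output token counts. The sketch below uses the published $3 / $15 per-million-token rates; the token counts in the example are hypothetical, chosen only to show the arithmetic at the 200K-token context limit:

```python
# Published Claude 3.5 Sonnet rates (USD per token).
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # $3 per million input tokens
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # $15 per million output tokens

def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single API request in US dollars."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# Hypothetical request: a full 200K-token context plus a 1K-token reply.
print(f"${request_cost_usd(200_000, 1_000):.3f}")  # prints "$0.615"
```

Because output tokens cost five times as much as input tokens, long prompts with short answers stay comparatively cheap.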
Claude 3.5 Sonnet shows competitive performance across a variety of benchmarks designed to test different aspects of reasoning, knowledge, and problem-solving. In “Graduate level reasoning” on the GPQA (Diamond) benchmark with 0-shot chain-of-thought (CoT), it achieves a score of 59.4%, surpassing both Claude 3 Opus (50.4%) and GPT-4o (53.6%). In “Undergraduate level knowledge” tested by MMLU, Claude 3.5 Sonnet consistently shows strong performance, scoring 88.7% in 5-shot and 88.3% in 0-shot CoT, indicating robustness in both multi-shot and zero-shot settings.
When it comes to programming problems as assessed by “Code HumanEval”, Claude 3.5 Sonnet reaches a high score of 92.0% in 0-shot settings, clearly leading other models such as Claude 3 Opus and GPT-4o. In “Multilingual math” via the MGSM benchmark, it scores 91.6% in a 0-shot CoT context, maintaining high competence in multilingual mathematical problem solving. For the “Reasoning over text” DROP benchmark, Claude 3.5 Sonnet achieves an F1 score of 87.1 in 3-shot settings, showing solid reasoning abilities over text.
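The DROP figure above is an F1 score rather than plain accuracy: predictions are compared to gold answers by token overlap, balancing precision and recall. A minimal sketch of the standard token-level F1 used by reading-comprehension benchmarks (illustrative only, not Anthropic's exact evaluation harness, which also normalizes punctuation and articles):

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(round(token_f1("the cat sat", "the cat slept"), 4))  # prints 0.6667
```

A benchmark-level score like 87.1 is simply this per-example F1 (scaled to 0–100) averaged over the dataset.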
The model also excels in “Mixed evaluations” on BIG-Bench-Hard with 93.1% in 3-shot CoT, outperforming alternatives such as Gemini 1.5 Pro and Claude 3 Opus. In mathematical problem solving (MATH benchmark), it scores 71.1% in 0-shot CoT, solid mathematical reasoning, though it trails GPT-4o (76.6%) on this benchmark. Lastly, in “Grade school math” (GSM8K), it achieves an impressive 96.4% in 0-shot CoT, highlighting its strong foundational math skills. Overall, Claude 3.5 Sonnet demonstrates a well-rounded and robust performance across diverse cognitive domains.
Benchmark | Claude 3.5 Sonnet | Claude 3 Opus | GPT-4o | Gemini 1.5 Pro | Llama-400b (early snapshot) |
---|---|---|---|---|---|
Graduate level reasoning GPQA (Diamond) | 59.4% 0-shot CoT | 50.4% 0-shot CoT | 53.6% 0-shot CoT | - | - |
Undergraduate level knowledge MMLU | 88.7% 5-shot | 86.8% 5-shot | - | 85.9% 5-shot | 86.1% 5-shot |
Code HumanEval | 92.0% 0-shot | 84.9% 0-shot | 90.2% 0-shot | 84.1% 0-shot | 84.1% 0-shot |
Multilingual math MGSM | 91.6% 0-shot CoT | 90.7% 0-shot CoT | 90.5% 0-shot CoT | 87.5% 8-shot | - |
Reasoning over text DROP, F1 score | 87.1 3-shot | 83.1 3-shot | 83.4 3-shot | 74.9 Variable shots | 83.5 3-shot (pre-trained model) |
Mixed evaluations BIG-Bench-Hard | 93.1% 3-shot CoT | 86.8% 3-shot CoT | - | 89.2% 3-shot CoT | 85.3% 3-shot CoT (pre-trained model) |
Math problem-solving MATH | 71.1% 0-shot CoT | 60.1% 0-shot CoT | 76.6% 0-shot CoT | 67.7% 4-shot | 57.8% 4-shot CoT |
Grade school math GSM8K | 96.4% 0-shot CoT | 95.0% 0-shot CoT | - | 90.8% 11-shot | 94.1% 8-shot CoT |
Anthropic positions Claude as an AI assistant that enhances team productivity by tapping into shared expertise. Claude is designed to facilitate easy collaboration, serving as a virtual teammate that not only accelerates routine tasks like email and document writing but also aids in generating ideas, pulling insights from data, and producing high-quality work with less effort. By integrating Projects, Claude can access specific knowledge, enabling each team member to contribute expert-level results, making work more productive across domains such as engineering, support, marketing, and sales.
https://www.anthropic.com/team