The SWE-bench technical report presents a comparative analysis of language models’ performance in resolving real-world GitHub issues. The results show that Devin, an AI agent for software development built by Cognition, achieved a 13.86% success rate, a notable improvement over other models: Claude 2 and GPT-4 resolved 4.80% and 1.70% of instances, respectively, even with the aid of an oracle retriever. These results underscore Devin’s ability to execute multi-step plans and iterate based on feedback, traits essential for practical, intelligent software development.
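To put the quoted rates in perspective, a quick back-of-the-envelope comparison (the percentages are taken from the report; the helper function and names below are purely illustrative):

```python
# Resolution rates quoted in the report, as percentages of SWE-bench
# instances resolved. Claude 2 and GPT-4 figures are with an oracle retriever.
resolved = {"Devin": 13.86, "Claude 2": 4.80, "GPT-4": 1.70}

def relative_improvement(a: float, b: float) -> float:
    """How many times higher resolution rate `a` is than rate `b`."""
    return a / b

print(f"Devin vs Claude 2: {relative_improvement(resolved['Devin'], resolved['Claude 2']):.1f}x")
print(f"Devin vs GPT-4:    {relative_improvement(resolved['Devin'], resolved['GPT-4']):.1f}x")
```

By this simple ratio, Devin resolves roughly 2.9 times as many instances as Claude 2 and about 8.2 times as many as GPT-4 under the reported conditions.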

While Devin’s performance on SWE-bench is impressive, the benchmark’s design and the nature of its tasks may not provide a level playing field for all of the AI models being evaluated.

Cognition Labs, Scott Wu
Not Applicable
March 25, 2024
Cognition Labs Home Page