In this video, Matthew Berman tests the newly released Qwen 2 models by Alibaba Group, which reportedly outperform LLaMA 3 across various benchmarks. The video focuses on testing both the largest (72 billion parameters) and the smallest (0.5 billion parameters) versions of Qwen 2 to evaluate their performance and speed.
Matthew begins by explaining the different sizes of Qwen 2 models, which range from 0.5 billion to 72 billion parameters, and notes that the larger models support context lengths of up to 128k tokens. He then compares Qwen 2 against LLaMA 3 and other models, showing that Qwen 2 generally performs better across a range of tasks, including code and math.
The testing is conducted on a high-performance PC provided by Dell and Nvidia, featuring two A6000 GPUs with 48 GB of VRAM each. Matthew uses LM Studio to run the local tests and Hugging Face Spaces to test the larger models.
The first test involves writing a Python script to output numbers from 1 to 100, which both the smallest and largest Qwen 2 models pass. However, the smaller model struggles with more complex tasks like writing the Snake game in Python, while the largest model performs exceptionally well.
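For reference, the first test prompt has a straightforward expected answer; a minimal script along these lines would pass (this is one of many valid solutions, not the exact output either model produced in the video):

```python
def count_to_100():
    """Return the numbers 1 through 100 as a list."""
    return list(range(1, 101))

# Print one number per line, as the test prompt asks.
for n in count_to_100():
    print(n)
```

Because the task is so simple, it mainly serves as a sanity check that a model can produce syntactically valid Python at all, which is why even the 0.5 billion parameter model passes it.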
Matthew also tests the models on uncensored queries, logic and reasoning problems, math questions, and more. The smaller model often fails to provide correct or meaningful answers, while the largest model consistently performs well, showcasing its superior capabilities.
The video concludes with Matthew expressing satisfaction with the Qwen 2 models, particularly the larger ones, and highlighting their potential for various applications. He also mentions that Qwen 2 does not yet have vision capabilities, limiting its use in certain tasks.
Overall, the video provides a comprehensive evaluation of Qwen 2 models, demonstrating their strengths and weaknesses compared to other large language models.