In this video, Prompt Engineering explores Alibaba's latest Qwen-2 model, which it presents as the best open-weight model currently available. The Qwen-2 family ranges from 0.5 billion to 72 billion parameters and supports a context window of up to 128,000 tokens, a substantial advantage over Llama 3's 8,000-token limit. The models also cover multiple languages, with particular attention to Middle Eastern and Southeast Asian languages. The video focuses on function calling and agentic workflows built on Qwen-2.
The video starts by explaining the difference between function calling and agents. Function calling allows the LLM to interact with external tools, such as APIs, to fetch real-time information. The LLM itself cannot execute these functions: it selects a function and generates its inputs, and the caller must run the function and return the results to the LLM. Agents, by contrast, are more complex: they can plan, perform actions, and track their progress using short-term and long-term memory.
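To make that division of labor concrete, here is a minimal sketch of the function-calling loop in Python. It assumes an OpenAI-compatible endpoint that supports tool calling (Ollama exposes one at `http://localhost:11434/v1`); the `get_current_weather` function, its stubbed result, and the `qwen2` model tag are illustrative placeholders, not taken from the video.

```python
import json
from openai import OpenAI

# Ollama serves an OpenAI-compatible API; the api_key is required but ignored.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

# A placeholder local function the LLM can request but never execute itself.
def get_current_weather(location: str) -> dict:
    return {"location": location, "temperature_c": 21}  # stubbed result

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

# Step 1: the model picks a function and generates its JSON inputs.
response = client.chat.completions.create(model="qwen2", messages=messages, tools=tools)
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)

# Step 2: *we* execute the function; the model cannot run it on its own.
result = get_current_weather(**args)

# Step 3: feed the result back so the model can compose the final answer.
messages.append(response.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)})
final = client.chat.completions.create(model="qwen2", messages=messages)
print(final.choices[0].message.content)
```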
To demonstrate these concepts, the video shows how to set up and run Qwen-2 locally with Ollama. The setup involves creating a virtual environment, installing the Qwen-Agent package, and running the model on a machine with sufficient VRAM. The video then walks through a practical function-calling example in which the LLM determines which function to use based on the user's input and generates the arguments, and the function is executed to return its results to the model.
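The video's exact commands aren't reproduced here; the following is a sketch of that setup expressed through Qwen-Agent's `get_chat_model` interface, adapted from the package's own function-calling example. The model name and server URL are assumptions for a local Ollama deployment.

```python
# Assumed one-time setup (in a shell):
#   python -m venv qwen-env && source qwen-env/bin/activate
#   pip install -U qwen-agent
#   ollama run qwen2          # serves the model on localhost:11434

from qwen_agent.llm import get_chat_model

# Point Qwen-Agent at the local Ollama server via its OpenAI-compatible API.
llm = get_chat_model({
    'model': 'qwen2',
    'model_server': 'http://localhost:11434/v1',
    'api_key': 'EMPTY',
})

functions = [{
    'name': 'get_current_weather',
    'description': 'Get the current weather in a given location',
    'parameters': {
        'type': 'object',
        'properties': {'location': {'type': 'string'}},
        'required': ['location'],
    },
}]

messages = [{'role': 'user', 'content': "What's the weather like in San Francisco?"}]

# The model answers with a function_call message naming the function and its
# JSON arguments; actually executing the function is left to the caller.
for responses in llm.chat(messages=messages, functions=functions, stream=True):
    pass
print(responses)  # the last streamed chunk holds the complete function_call
```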
The video also shows how to build custom agents with Qwen-Agent. The example generates images from user input and downloads them to a specific folder, using a text-to-image API together with a code interpreter to execute the necessary commands. The video highlights the agent's ability to plan, execute tasks, and revise its plan when needed.
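The agent can be assembled from Qwen-Agent's `Assistant` class with a custom image-generation tool plus the built-in `code_interpreter`. Below is a condensed sketch closely following the package's README example; the pollinations.ai endpoint, model config, and system message are assumptions rather than the video's exact values.

```python
import urllib.parse
import json5
from qwen_agent.agents import Assistant
from qwen_agent.tools.base import BaseTool, register_tool

@register_tool('my_image_gen')
class MyImageGen(BaseTool):
    description = 'AI painting service: input a text prompt, return an image URL.'
    parameters = [{'name': 'prompt', 'type': 'string',
                   'description': 'Desired image content, in English', 'required': True}]

    def call(self, params: str, **kwargs) -> str:
        # The LLM passes the tool arguments as a JSON string.
        prompt = urllib.parse.quote(json5.loads(params)['prompt'])
        return json5.dumps(
            {'image_url': f'https://image.pollinations.ai/prompt/{prompt}'},
            ensure_ascii=False)

llm_cfg = {'model': 'qwen2', 'model_server': 'http://localhost:11434/v1', 'api_key': 'EMPTY'}

# code_interpreter lets the agent write and run code, e.g. to download the
# generated image into a local folder, as in the video's example.
bot = Assistant(llm=llm_cfg,
                function_list=['my_image_gen', 'code_interpreter'],
                system_message='You can draw images and run code to post-process them.')

messages = [{'role': 'user', 'content': 'Draw a dog and save the image to ./images'}]
for response in bot.run(messages=messages):
    pass
print(response[-1])  # final assistant message after planning and tool use
```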
In the final section, the video discusses the impact of quantization on model performance. It shows that quantization hits smaller models hardest, with a noticeable performance drop at 4-bit quantization. The video advises against 4-bit quantization in production and recommends at least 8-bit quantization, or 16-bit precision where possible.
Overall, the video provides a comprehensive overview of Qwen-2’s capabilities in function calling and agentic workflows, along with practical examples and insights into quantization effects.