In this video, Matthew Berman pushes the limits of a powerful AI workstation equipped with two Nvidia RTX 6000 Ada Generation GPUs, provided by Dell and Nvidia. The workstation, a Precision 5860 Tower, features an Intel Xeon w7-2475X processor, 128 GB of DDR5 RAM, and two 4 TB M.2 SSDs, making it a beast for running local inference, training models, and playing video games.

Matthew begins by showcasing the workstation’s impressive specs and then proceeds to test its capabilities by running multiple large language models (LLMs) simultaneously. He installs LM Studio and loads various models, including Llama 3 70B Instruct, Mistral 7B Instruct, and several others, quantized to different levels to fit within the 96 GB of VRAM available across the two GPUs.
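For readers who want to try a similar setup, here is a minimal sketch of querying a model loaded in LM Studio through its OpenAI-compatible local server. The default port (1234) and the model identifier below are assumptions; the video itself does not show any code.

```python
# Sketch: query a model served by LM Studio via its OpenAI-compatible local API.
# Assumes LM Studio's server is running on the default port and that a model
# such as Meta Llama 3 8B Instruct is already loaded in the app.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="meta-llama-3-8b-instruct",  # hypothetical identifier; use the name LM Studio displays
    messages=[{"role": "user", "content": "Explain quantization in one paragraph."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```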

He starts with smaller models, achieving speeds of around 60 tokens per second, and gradually moves to larger models, observing the performance and GPU utilization. At one point, he runs ten instances of the Meta Llama 3 8B Instruct model simultaneously, reaching about 80% GPU utilization while temperatures remain low.
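One simple way to reproduce this kind of monitoring is to poll nvidia-smi, which ships with the NVIDIA driver. This is only a sketch of that approach, not the tooling used on camera; the polling interval is an arbitrary choice.

```python
# Sketch: poll both GPUs for utilization, memory use, and temperature while
# inference workloads are running.
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total,temperature.gpu",
    "--format=csv,noheader,nounits",
]

while True:
    for line in subprocess.check_output(QUERY, text=True).strip().splitlines():
        idx, util, used, total, temp = [field.strip() for field in line.split(",")]
        print(f"GPU {idx}: {util}% util, {used}/{total} MiB, {temp} C")
    time.sleep(2)
```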

Matthew also tests the workstation’s text-to-image capabilities using Stable Diffusion models. He demonstrates generating images quickly and efficiently, highlighting the workstation’s ability to handle high computational loads with ease.
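A rough sketch of a comparable text-to-image run using Hugging Face's diffusers library is shown below. The checkpoint name and fp16 settings are assumptions, since the video does not specify which frontend or model he used.

```python
# Sketch: text-to-image generation with Stable Diffusion via diffusers.
# The model id and dtype here are illustrative choices, not taken from the video.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")  # a single RTX 6000 Ada has ample VRAM for this pipeline

image = pipe(
    "a workstation tower rendered as a glowing sci-fi server room",
    num_inference_steps=30,
).images[0]
image.save("output.png")
```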

Throughout the video, Matthew emphasizes the workstation’s cooling efficiency, noting that it remains quiet and cool even under heavy load. He concludes by inviting viewers to suggest further tests and offering a chance to win a Dell monitor by subscribing to his newsletter.

Overall, the video provides an in-depth look at the capabilities of a high-end AI workstation, showcasing its potential for running large models and performing intensive tasks with impressive speed and efficiency.

Author: Matthew Berman
Published: July 7, 2024
Related: Dell Precision AI-ready workstations
Duration: 13:18