The article introduces LLaVA-NeXT, the successor to LLaVA-1.5, with stronger reasoning, OCR, and world knowledge. It reports better benchmark performance, improves visual reasoning and OCR through a new data mixture, and supports efficient deployment with SGLang. The largest variant, at 34B parameters, is trained with fewer than 1M visual instruction tuning samples and keeps the minimalist design of its predecessor. The open-source release covers code, data, and models, some of which are to be made available soon.