These Python bindings for llama.cpp provide a simple interface for integrating the llama.cpp library into Python projects. The package includes documentation, requirements, and installation instructions, and supports several hardware acceleration backends (such as CUDA and Metal) for faster inference.

The high-level API offers a managed interface through the `Llama` class, supporting basic text completion and chat completion with pre-registered chat formats or custom chat handlers. It also supports OpenAI-compatible function and tool calling, multimodal models that process both text and images, and speculative decoding for faster generation.

The package ships with a web server that acts as a drop-in replacement for the OpenAI API, with Docker support for easy deployment. For advanced users, the low-level API provides direct bindings to the C API.

The package is actively developed, open to contributions, and licensed under the MIT license.
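A minimal sketch of the high-level API described above, assuming `llama-cpp-python` is installed and a GGUF model file exists at the (placeholder) path below; actually running it requires downloading model weights, so the output shown here is illustrative only:

```python
from llama_cpp import Llama

# Load a local GGUF model (the path is a placeholder, not a bundled file).
llm = Llama(model_path="./models/llama-model.gguf")

# Basic text completion: returns an OpenAI-style completion dict.
out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(out["choices"][0]["text"])

# Chat completion using the model's pre-registered chat format.
resp = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize llama.cpp in one sentence."},
    ]
)
print(resp["choices"][0]["message"]["content"])
```

Both calls return dictionaries shaped like OpenAI API responses, which is what makes the bindings easy to slot into existing OpenAI-based code.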
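The OpenAI-compatible web server can be installed and launched from the command line; a sketch follows (the model path is a placeholder, and the exact CMake flag for enabling a GPU backend varies by version and platform):

```shell
# Install with the server extra; the CMAKE_ARGS backend flag is optional
# and version-dependent (shown here for CUDA as an example).
CMAKE_ARGS="-DGGML_CUDA=on" pip install 'llama-cpp-python[server]'

# Launch the drop-in OpenAI-compatible server on localhost.
python -m llama_cpp.server --model ./models/llama-model.gguf
```

Once running, existing OpenAI client libraries can be pointed at the local server's base URL instead of the OpenAI endpoint.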