Self-Extend LLM (GitHub) is a technique for exploiting the inherent ability of Large Language Models (LLMs) to handle contexts longer than their training window, without any fine-tuning. It introduces a bi-level attention mechanism that combines grouped attention for distant tokens with standard neighbor attention for nearby ones, effectively extending the model's context window. An implementation is available in llama.cpp, where Self-Extend can be applied to a range of models, including older ones. The approach is especially useful for tasks that require processing long input sequences, making it a practical option for researchers and practitioners in natural language processing. The project does not yet point to a tagged software release, but the method stands to improve the efficiency and applicability of long-context LLMs considerably.
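
To make the bi-level mechanism concrete, here is a minimal NumPy sketch of how the merged relative-position matrix can be computed: queries attend to nearby keys with exact relative positions (neighbor level) and to distant keys with floor-divided, grouped positions (group level). This is an illustration of the idea from the Self-Extend paper, not the llama.cpp code; the function name and the `group_size` and `window` parameters are chosen here for clarity.

```python
import numpy as np

def self_extend_positions(seq_len: int, group_size: int, window: int) -> np.ndarray:
    """Merged relative-position matrix for Self-Extend's bi-level attention.

    Keys within `window` of the query keep their exact relative position
    (neighbor attention); more distant keys get a coarse, floor-divided
    position (grouped attention), shifted so the two regions join seamlessly.
    """
    q = np.arange(seq_len)[:, None]   # query positions, shape (L, 1)
    k = np.arange(seq_len)[None, :]   # key positions,   shape (1, L)
    rel = q - k                       # standard relative positions

    # Group level: floor-divide absolute indices by the group size, then
    # shift so position `window` follows position `window - 1` exactly.
    grouped = q // group_size - k // group_size + (window - window // group_size)

    # Neighbor level inside the window, group level outside it.
    # (The usual causal mask would still be applied separately.)
    return np.where(rel < window, rel, grouped)

# Toy example: 16-token sequence, group size 4, neighbor window 8.
print(self_extend_positions(16, group_size=4, window=8))
```

Because only the position indices fed to the attention computation change, no weights are modified and no fine-tuning is required. In the llama.cpp implementation the corresponding knobs are exposed as command-line parameters (the group-attention factor and width, `--grp-attn-n` and `--grp-attn-w`, at the time of writing), so users can enable Self-Extend on existing GGUF models directly.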