Does Context Length Matter?
The stateof.ai 2023 report (pages 23-24) highlights the importance of context length in language models (LMs) and its impact on their performance. Parameter count has traditionally served as a proxy for a model's capabilities, but recent research indicates that the amount of input text a model can attend to can also constrain those capabilities. This makes context length an important factor in the effectiveness of language models.

One of the main benefits of large language models (LLMs) is their few-shot capability: they can respond to new tasks from examples given in the prompt, without additional training. This ability is often limited by the context length, since attention over long inputs creates a computational and memory bottleneck. Various innovations have been developed to increase the context length of LLMs, such as FlashAttention, which reduces the memory footprint of attention, and ALiBi, which allows models to be trained on short contexts but run inference on longer ones.
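
To make the ALiBi idea concrete, here is a minimal PyTorch sketch of how its linear attention biases can be computed and added to raw attention scores before the softmax. It assumes a power-of-two number of heads (for which the geometric slope schedule from the ALiBi paper is simplest) and a causal attention setting; function names like `alibi_slopes` and `alibi_bias` are illustrative, not a reference implementation.

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    # Head-specific slopes form a geometric sequence starting at 2^(-8/num_heads),
    # as described in the ALiBi paper (assuming a power-of-two head count).
    start = 2 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Causal variant: penalty -m * (i - j) for keys at position j <= query position i,
    # so attention to far-away tokens is increasingly discouraged.
    positions = torch.arange(seq_len)
    distances = (positions[:, None] - positions[None, :]).clamp(min=0)
    bias = -alibi_slopes(num_heads)[:, None, None] * distances
    # Mask out future positions for causal attention.
    causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    return bias.masked_fill(causal_mask, float("-inf"))

# Toy usage: add the bias to attention scores before the softmax.
num_heads, seq_len, head_dim = 8, 16, 64
q = torch.randn(num_heads, seq_len, head_dim)
k = torch.randn(num_heads, seq_len, head_dim)
scores = q @ k.transpose(-1, -2) / math.sqrt(head_dim)
weights = torch.softmax(scores + alibi_bias(num_heads, seq_len), dim=-1)
```

Because the bias depends only on relative distance, the same formula extrapolates to sequence lengths longer than those seen in training, which is what lets ALiBi-trained models run inference on larger contexts. FlashAttention is orthogonal to this: on recent PyTorch versions, `torch.nn.functional.scaled_dot_product_attention` can dispatch to a FlashAttention-style fused kernel on supported hardware.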
The report lists several long-context LLMs: Anthropic's Claude with a 100K-token context window, OpenAI's GPT-4 with 32K, MosaicML's MPT-7B with over 65K, and LMSYS's LongChat with 16K. However, it raises the question of whether context length is the only factor that matters.
Further research from Samaya.ai, UC Berkeley, Stanford, and LMSYS.org challenges the notion that longer context lengths always lead to better performance. The study found that models perform best when relevant information appears at the beginning or end of the input, with accuracy dropping when it sits in the middle, and that performance degrades further as the input grows longer. This held for both multi-document question answering and key-value retrieval tasks. The authors also observed that proprietary models struggled less with longer contexts than open models.
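
To illustrate the kind of probe used in that line of work, the sketch below builds a synthetic key-value retrieval prompt: a JSON object of random UUID pairs, with the queried key placed at a chosen position in the context. It is an assumption-laden approximation of the task rather than the study's exact setup, and `query_model` is a hypothetical placeholder for whatever LLM API you would call.

```python
import json
import random
import uuid

def make_kv_retrieval_prompt(num_pairs: int, target_index: int):
    """Build a synthetic key-value retrieval prompt: a JSON object of random
    UUID pairs, with the queried key at a chosen position in the context."""
    pairs = [(str(uuid.uuid4()), str(uuid.uuid4())) for _ in range(num_pairs)]
    target_key, target_value = pairs[target_index]
    context = json.dumps(dict(pairs), indent=0)
    prompt = (
        "Extract the value for the given key from the JSON object below.\n"
        f"{context}\n"
        f"Key: {target_key}\nCorresponding value:"
    )
    return prompt, target_value

# Probe the start, middle, and end of the context; middle positions and
# longer contexts are where the study reports the largest accuracy drops.
for position in (0, 50, 99):
    prompt, expected = make_kv_retrieval_prompt(num_pairs=100, target_index=position)
    # response = query_model(prompt)        # hypothetical LLM API call
    # correct = expected in response        # score by exact-match retrieval
```

Varying `num_pairs` scales the context length, and varying `target_index` moves the relevant information through the input, which is enough to reproduce the shape of the position-sensitivity experiments described above.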