Context Rot: How increasing input tokens impacts LLM performance

Source: https://news.ycombinator.com/rss

Summary

Recent developments in LLMs show a trend toward longer context windows, with the input token count of the latest models reaching the millions. Because these models achieve near-perfect scores on widely adopted benchmarks like Needle in a Haystack (NIAH) [1], it’s often assumed that their performance is uniform across long-context tasks.

However, NIAH is fundamentally a simple retrieval task, in which a known sentence (the “needle”) is placed in a long document of unrelated text (the “haystack”), and the model is prompted to retrieve it. While scalable, this benchmark typically assesses direct lexical matching, which may not be representative of flexible, semantically oriented tasks.

[Figure: Example Needle in a Haystack (NIAH) setup]

We extend the standard NIAH task to investigate model behavior in previously underexplored settings. We examine the effects of needles with semantic, rather than direct lexical, matches, as well as the effects of introducing variations to the haystack content.

Additionally, we include a conversational question-answer evaluation using LongMemEval [2], as well as a synthetic task in which models replicate a series of repeated words. Each task remains intentionally simple and is deliberately controlled to isolate the impact of context length alone.

We demonstrate that even under these minimal conditions, model performance degrades as input length increases, often in surprising and non-uniform ways. Real-world applications typically involve much greater complexity, implying that the influence of input length may be even more pronounced in practice.

Our in-depth technical report continues below.

Introduction

It is common for modern LLMs to have input context lengths in the millions of tokens. Gemini 1.5 Pro [3] first introduced its 1M context window in early 2024, followed by the recent GPT-4.1’s 1M context window [4] an...
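
To make the NIAH setup described in the summary concrete, here is a minimal sketch of how such a prompt might be assembled. It is an illustration under stated assumptions, not the report's actual harness: the function name `build_niah_prompt`, the `depth` parameter, the filler sentences, and the prompt wording are all hypothetical.

```python
def build_niah_prompt(filler_sentences, needle, question, depth=0.5):
    """Insert `needle` into a haystack of unrelated sentences.

    `depth` is the relative insertion point: 0.0 places the needle
    at the start of the haystack, 1.0 at the end. All names here are
    hypothetical and chosen only for illustration.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    # Convert the relative depth into a sentence index.
    position = round(depth * len(filler_sentences))
    sentences = (
        filler_sentences[:position] + [needle] + filler_sentences[position:]
    )
    haystack = " ".join(sentences)
    # The model sees the full haystack followed by the retrieval question.
    return f"{haystack}\n\nQuestion: {question}\nAnswer:"


# Toy example: ten filler sentences with the needle placed in the middle.
filler = [f"Filler sentence number {i}." for i in range(10)]
needle = "The secret code is 7421."
prompt = build_niah_prompt(filler, needle, "What is the secret code?", depth=0.5)
```

Growing `filler_sentences` while holding the needle and question fixed is one way to vary input length alone. Note that the question in this toy example shares the words "secret code" with the needle, so it is a direct lexical match; a semantic variant would phrase the question without reusing the needle's wording.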

First seen: 2025-07-14 22:01

Last seen: 2025-07-15 02:01
