I’ve been working in AI and search for a decade: first building Doctrine, the largest European legal search engine, and now building Fintool, an AI-powered financial research platform that helps institutional investors analyze companies, screen stocks, and make investment decisions. After three years of building, optimizing, and scaling LLMs with retrieval-augmented generation (RAG) systems, I believe we’re witnessing the twilight of RAG-based architectures. As context windows explode and agent-based architectures mature, my controversial opinion is that the RAG infrastructure we spent so much time building and optimizing is on the decline.

In late 2022, ChatGPT took the world by storm. People started endless conversations and delegated crucial work, only to realize that the underlying model, GPT-3.5, could only handle 4,096 tokens... roughly six pages of text!

The AI world faced a fundamental problem: how do you make an intelligent system work with knowledge bases that are orders of magnitude larger than what it can read at once? The answer became Retrieval-Augmented Generation (RAG), an architectural pattern that would dominate AI for the next three years.

GPT-3.5 could handle 4,096 tokens, and the next model, GPT-4, doubled that to 8,192 tokens, about twelve pages. This wasn’t just inconvenient; it was architecturally devastating. Consider the numbers: a single SEC 10-K filing contains approximately 51,000 tokens (130+ pages). With 8,192 tokens of context, you could see less than 16% of a 10-K filing. It’s like reading a financial report through a keyhole!

RAG emerged as an elegant solution borrowed directly from search engines. Just as Google displays 10 blue links with relevant snippets for your query, RAG retrieves the most pertinent document fragments and feeds them to the LLM for synthesis. The core idea is beautifully simple: if you can’t fit everything in context, find the most relevant pieces and use those. It turns LLMs into sophisticated search result summarizers.

Basically...
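To make the retrieve-then-synthesize pattern concrete, here is a minimal sketch of the RAG loop described above. It is not how Fintool (or any particular product) implements it: the chunk size, the keyword-overlap scoring (standing in for a real embedding-based vector search), and the `call_llm` stub are all illustrative assumptions.

```python
# Minimal retrieve-then-synthesize loop: split documents into chunks,
# score each chunk against the query, and stuff only the top matches
# into the limited context window before asking the model to answer.

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size word chunks (a stand-in for real chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> float:
    """Keyword-overlap score; a production system would use embedding similarity instead."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k most relevant chunks across all documents."""
    chunks = [c for doc in documents for c in chunk(doc)]
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call (OpenAI, Anthropic, etc.)."""
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(query: str, documents: list[str]) -> str:
    """Build a prompt from retrieved snippets only, so it fits a small context window."""
    context = "\n\n".join(retrieve(query, documents))
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```

The key constraint is that `top_k` times the chunk size must stay under the model’s context limit, which is exactly why an 8,192-token window forces the model to answer from a handful of fragments rather than the whole filing.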