RAG.pro

What's going to kill your RAG business?

"All I want to know is where I'm going to die so I'll never go there" - Charlie Munger

Since many forms of RAG depend on models plateauing a bit from here, I'm going to explore the very real possibility of a near future where these models continue on their current trajectory, to see which problems are likely to stay and which are likely to get innovated away.

This article is a thought experiment to help you look ahead and adjust your business plans given where things are likely headed.

How RAG on Open Web Data will die

Perplexity leads the 'RAG on the open web' market by far, but what it fails to accomplish is what I like to call "Vertical AI": a very specialized AI app, as opposed to "Horizontal AI" apps, which are mostly just base models such as ChatGPT that can answer a broad range of questions but fail to accurately go deep on a subject. Vertical AI apps began as ChatGPT wrappers with a verticalized system prompt, but have turned into RAG apps, fine-tuned models, GPTs, and combinations of other techniques that create a specialized application.

A good use-case for a more verticalized AI is a coding agent. You could narrow this down to a specific coding language, focused on specific app types (e.g. Chrome extensions) and using certain packages. Of course you would not use Perplexity to help you code this app; you'd likely use an app that uses an LLM in an agentic way to build Chrome extensions specifically. But what if the kind of Chrome extensions you want to build rely on frameworks such as LangChain (which is updated weekly)? If the coding app uses an OpenAI LLM, it would have little to no knowledge of what LangChain even is, since its training cutoff is so far in the past. To fix this, the coding app would need to set up a RAG pipeline to a vector database containing LangChain's up-to-date documentation, plus a web scraper running periodically to refresh the data. That could take weeks, maybe even months, of effort to set up from scratch. Nevertheless, this has become a very common use-case for RAG, since most web search APIs tend not to retrieve pages buried deep within specific sites (e.g. documentation sites).
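To make that concrete, here's a minimal sketch of what such a pipeline might look like. The scrape_docs() helper is hypothetical, the documentation URL is illustrative, and the LangChain component import paths (RecursiveCharacterTextSplitter, OpenAIEmbeddings, Chroma) vary by release, so treat this as a shape rather than a recipe; a real system would rerun the scrape-and-index step on a schedule.

```python
# Minimal sketch of a docs-to-vector-store RAG pipeline (illustrative, not production).
# scrape_docs() is a hypothetical helper; the URL is illustrative; import paths
# follow recent LangChain releases and may differ in yours.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

def scrape_docs(base_url: str) -> list[str]:
    """Hypothetical scraper: crawl the docs site and return raw page texts.
    A real pipeline would rerun this on a schedule to pick up weekly updates."""
    raise NotImplementedError

pages = scrape_docs("https://python.langchain.com/docs/")

# Split pages into overlapping chunks so retrieval can return focused passages.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.create_documents(pages)

# Embed the chunks and persist them in a local vector store.
store = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory="./docs_index")

# At query time: pull the most relevant chunks and hand them to the LLM as context.
retriever = store.as_retriever(search_kwargs={"k": 4})
relevant_chunks = retriever.invoke("How do I add memory to a retrieval chain?")
```

The hard part isn't these few lines; it's the crawler, the chunking choices, and keeping the index in sync with documentation that changes every week.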

Setting up a RAG pipeline over software documentation so you can build LLM apps that code with the most up-to-date libraries looks like a very valuable business right now. But how would this business die?

"When operating on the cutting edge of technology, people often get cut."

There are 2 main possible disruptions to this business:
  1. Increasing the Quality of Data Retrieved

  2. LLMs that have access and understanding of the entire Web

It's very important to distinguish between having access to a data source and having an understanding of it. Anyone who's tried to build a RAG app knows you can have a vector database full of relevant information, query it with a question you're confident the data can answer, and still get back documents that don't contain the answer. If this is your issue, check out this post.

For this post, we'll assume that you have a RAG app connected to a data source from the open web, and that you have a way to retrieve the relevant data an LLM needs to generate an accurate answer to the user's query.

Now for how this business can get disrupted:

Increasing the Quality of Data Retrieved

This can set one RAG app apart from another through different techniques. GraphRAG recently shook up the RAG community by showing a retrieval method that can answer broad queries like "what is this data about" as well as very specific questions whose answers are buried deep in, and spread across, the data. In theory, this can easily outperform a naive RAG approach of searching a vector database (even with hybrid search), because the graph can communicate the relationships between relevant nodes to the LLM rather than just returning the most relevant chunk(s) of data.

But GraphRAG has weaknesses of its own, such as being very difficult to scale to production because of the tedious data preprocessing required to build a quality graph. Not to mention that you'd still have to implement the vector database functionality on top of the graph, and querying the graph is inconsistent because LLMs are not well trained on Cypher, the traditional graph query language. Critics of GraphRAG claim that graphs are simply metadata filtering, only made for people to look at, while the "Graphers" defend graphs ruthlessly, claiming that "graphs are everywhere" and the rest of the world just doesn't know it yet. I'm not sure which side is right, and I don't think either side truly believes their method is that much superior to the other.
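To make the contrast concrete, here's a toy sketch of what a graph retriever hands the LLM compared to a vector retriever. The data and the graph_context() helper are invented for illustration; a real GraphRAG pipeline would extract triples with an LLM pass and store them in a graph database, but the point is the same: the model receives serialized relationships rather than just the top-k chunks.

```python
# Toy illustration of graph-style retrieval; all entities and relations are invented.
triples = [
    ("LangChain", "depends_on", "langchain-core"),
    ("LangChain", "integrates_with", "Chroma"),
    ("Chroma", "is_a", "vector database"),
    ("langchain-core", "defines", "the Runnable interface"),
]

def graph_context(query: str, triples: list[tuple[str, str, str]]) -> str:
    """Return every relationship touching an entity named in the query."""
    q = query.lower()
    seeds = {e for s, _, o in triples for e in (s, o) if e.lower() in q}
    hits = [t for t in triples if t[0] in seeds or t[2] in seeds]
    return "\n".join(f"{s} --{r}--> {o}" for s, r, o in hits)

# The serialized relationships go into the prompt instead of (or alongside) raw chunks.
print(graph_context("How does LangChain relate to Chroma?", triples))
```

Whether relationship context like this actually beats well-tuned hybrid search on your data is exactly what the two camps are arguing about.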

The truth of the matter is that no one knows the best way of doing RAG; all anyone can do is hack away at making it as good as possible for their use-case.

A common theme for both parties is the fear of base LLMs improving to the point where they have access to, and an understanding of, the entire up-to-date web.

LLMs that have access and understanding of the entire Web

Casuals think that ChatGPT has access to the entire web. It does and it doesn't: it has access to a SERP API, which returns the top webpage for a given search query. That is nowhere near enough context for the LLM to beat out the understanding that Vertical AI apps achieve with RAG on specific data sources from the open web.
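As a rough sketch of that flow (the search endpoint and response fields below are placeholders, not any particular provider's API): the question becomes a search query, the top-ranked page gets fetched and truncated, and that single page is all the "web knowledge" the LLM sees for that turn.

```python
import requests

# Placeholder SERP flow; the endpoint and response fields are hypothetical stand-ins,
# not any particular provider's API.
def top_page_context(question: str) -> str:
    SERP_ENDPOINT = "https://example.com/search"     # hypothetical search API
    results = requests.get(SERP_ENDPOINT, params={"q": question}).json()
    top_hit = results["results"][0]                  # only the top-ranked page
    page_text = requests.get(top_hit["url"]).text
    return page_text[:8000]                          # truncate to fit the context window

# Whatever survives this single fetch-and-truncate is the LLM's entire view of the
# open web for the turn -- far less coverage than a curated vertical index.
context = top_page_context("latest LangChain release notes")
```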

It's now clear that to get the deepest understanding of text, you would ideally train the model on the data rather than connect the data through an external database. But since retraining a model constantly (even ~once a month) isn't currently feasible, AI leaders are looking to build retrieval methods into the models themselves as part of the post-training process. These retrieval methods would likely emulate agentic behavior with retrieval over the entire open web. The agentic behavior would allow the base model to plan ahead given the current query and naturally "think through the problem" rather than requiring the user to spell this out.

The fears of this can be seen in videos like this, where people worry that OpenAI is going to release a new foundation model with exactly these capabilities. And those fears are validated, to me, by the fact that OpenAI is a very good company with very smart people. The reason this matters is that they haven't shipped a GPT-4.5 recently when they easily could have, and instead have been taking the punches from Sonnet 3.5 kicking their ass. Why would they allow this? One theory is that they know they have lost this battle but believe they will win the war: they are creating an entirely new model with a different foundation that they can continually build upon in the post-training stages, as briefly mentioned by Sam Altman on this podcast (exact minute not known, sorry!).
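Nobody outside those labs knows what that post-training will actually look like, but you can approximate the behavior described above with an agentic loop wrapped around a search tool. A toy sketch, where llm() and web_search() are hypothetical stand-ins for a model call and a retrieval tool:

```python
# Toy agentic retrieval loop; llm() and web_search() are hypothetical stand-ins
# for a model call and a search tool. The loop plans what to look up, retrieves,
# and decides whether it knows enough -- the "think through the problem" behavior.

def llm(prompt: str) -> str:
    """Stand-in for a model call (e.g. a chat completion request)."""
    raise NotImplementedError

def web_search(query: str) -> str:
    """Stand-in for a search tool that returns page text for a query."""
    raise NotImplementedError

def agentic_answer(question: str, max_rounds: int = 3) -> str:
    notes = ""
    for _ in range(max_rounds):
        # 1. Plan: decide what to look up next given the question and notes so far.
        query = llm(f"Question: {question}\nNotes so far: {notes}\n"
                    "What should we search for next? Reply with a search query only.")
        # 2. Retrieve: pull fresh material from the open web for that query.
        notes += "\n" + web_search(query)
        # 3. Reflect: stop once the notes look sufficient to answer.
        done = llm(f"Question: {question}\nNotes: {notes}\n"
                   "Can the question be answered from these notes? Reply YES or NO.")
        if done.strip().upper().startswith("YES"):
            break
    return llm(f"Answer the question using only these notes.\n"
               f"Question: {question}\nNotes: {notes}")
```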

A base model that can reason through the query and think ahead to realize which data it should retrieve would be quite a powerful one. If it had a tool that let it consistently retrieve the many up-to-date sources needed to answer a given question, it would be even more powerful. Now people can place bets on either side of when, and whether, this will happen. In the end, one side will be right and the other will be dead.

One thing remains certain throughout all of this: Private Data will remain a key part of RAG systems, as businesses look to leverage their proprietary data for internal and external tools that give them an edge. But whether RAG apps that operate on publicly available data will survive, I'm not sure, and I'm not bullish on it.

Although Great Founders will undoubtedly force their way to success, it is a scary time to be operating on the edge in this space and I salute all of you!