I built my first RAG system in 2023. It was elegant. Vector embeddings, semantic search, chunking strategies. I felt like a real ML engineer.
Then I started using it in production and realized: most of the time, I didn't need it.
The RAG Obsession
RAG (Retrieval-Augmented Generation) has become the default answer to "how do I get my data into the model?"
Every AI startup is building RAG. Every blog post about AI architecture mentions RAG. It's become synonymous with "production AI."
But here's the thing: RAG is complex. It adds latency, costs money, introduces failure modes, and requires ongoing maintenance. And most of the time, it's overkill.
When RAG Makes Sense
RAG is valuable when:
- Your data exceeds context limits: If you have more relevant data than fits in the context window, you need to retrieve the relevant bits.
- Data freshness matters: If the model needs current information that wasn't in its training data, retrieval can supply it at request time.
- Specific attribution is required: If users need to know exactly which document informed the answer, retrieval gives you a source to point to.
These are real use cases. I'm not dismissing them.
When Simple Prompts Win
But there's a simpler answer that most people skip:
Put the information in the prompt.
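Modern models have large context windows and can take in a lot of material in a single request. And a prompt needs no extra infrastructure: no vector database, no embedding pipeline, no retrieval latency. You still pay for the tokens you send, but there's nothing else to build or maintain.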
For many applications, you can just include the relevant information directly in the prompt. No retrieval needed.
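Here's a minimal sketch of what "put the information in the prompt" can look like. The function name, the document labels, and the sample text are all made up for illustration, and the resulting string would go to whatever model client you already use.

```python
def build_prompt(question: str, documents: dict[str, str]) -> str:
    """Inline every document into one prompt, each labeled by its name."""
    sections = [f"[{name}]\n{text.strip()}" for name, text in documents.items()]
    context = "\n\n".join(sections)
    return (
        "Answer the question using only the reference material below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical reference material; in practice this is your own static data.
docs = {
    "refund_policy": "Refunds are available within 30 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}

prompt = build_prompt("How long do I have to request a refund?", docs)
print(prompt)
```

Labeling each section by name keeps a rough form of attribution: the model can say which section it relied on, without any retrieval step.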
A Practical Framework
Before building RAG, ask:
- How much data do I actually need? If it's less than 100k tokens, just put it in the prompt.
- How often does data change? If it's static or slowly changing, embedding and retrieval add unnecessary complexity.
- What's the latency requirement? RAG typically adds 100-500ms. Is that acceptable?
- Can the model answer without my data? Sometimes the model already knows enough. Test first.
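The four questions above can be turned into a rough checklist in code. This is only a sketch: the `Requirements` fields and the function name are hypothetical, the 100k-token and 500ms numbers are taken from the questions themselves, and treating the upper end of the latency range as the cutoff is my assumption.

```python
from dataclasses import dataclass

PROMPT_TOKEN_LIMIT = 100_000  # "less than 100k tokens" from the checklist
RAG_WORST_CASE_MS = 500       # upper end of the 100-500ms range (assumed cutoff)


@dataclass
class Requirements:
    data_tokens: int              # how much data the feature actually needs
    data_changes_often: bool      # False for static or slowly changing data
    extra_latency_budget_ms: int  # added latency the feature can tolerate
    model_knows_enough: bool      # result of testing the model without your data


def reasons_to_skip_rag(req: Requirements) -> list[str]:
    """Collect every checklist answer that argues against building RAG."""
    reasons = []
    if req.data_tokens < PROMPT_TOKEN_LIMIT:
        reasons.append("data fits in the prompt")
    if not req.data_changes_often:
        reasons.append("data is static or slowly changing")
    if req.extra_latency_budget_ms < RAG_WORST_CASE_MS:
        reasons.append("latency budget is tighter than typical RAG overhead")
    if req.model_knows_enough:
        reasons.append("model already answers without the data")
    return reasons


small_feature = Requirements(
    data_tokens=20_000,
    data_changes_often=False,
    extra_latency_budget_ms=200,
    model_knows_enough=False,
)
large_corpus = Requirements(
    data_tokens=5_000_000,
    data_changes_often=True,
    extra_latency_budget_ms=1_000,
    model_knows_enough=False,
)

print(reasons_to_skip_rag(small_feature))
print(reasons_to_skip_rag(large_corpus))
```

An empty list means none of the questions argues against RAG. The more reasons come back, the stronger the case for trying a plain prompt first.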
What We Do at Orochi
We use RAG selectively. For our knowledge base, it makes sense — too much information to include in prompts.
But for many features, we just engineer good prompts. The model is smart enough. The extra complexity isn't worth it.
The Bottom Line
RAG is a tool. It's not a requirement. It's not even always the best tool.
Before you reach for the vector database, try the simple solution first. You might be surprised how far prompts alone can go.
Simplicity wins. Add complexity only when necessary.