
RAG Architecture

3/14/2025 · Dishant Miyani · 2 min read

Want to give the AI the context it needs so it returns exactly the response you're looking for? That is now possible with RAG (Retrieval-Augmented Generation): connect your own data source to the AI and get better responses every time.


What is RAG? #

In simple terms, RAG enhances large language models (LLMs) by integrating external knowledge sources, such as databases, documents, or even real-time web data, into their responses.

While traditional LLMs rely only on static training data, RAG retrieves specific, relevant information before generating a response. This makes outputs more accurate, context-aware, and aligned with user intent.


Architecture of RAG #

Let’s walk through three simple steps and understand the underlying principle:

1. Ingestion #

  • Data Source: Upload a document or point the AI at the data source you want it to answer from
  • Chunks: The data source broken down into small pieces
  • Embeddings: Vector representations of the chunks from the previous step
  • Vector DB: A database that stores the embedding vectors for later retrieval
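The ingestion steps above can be sketched in a few lines. This is a minimal, dependency-free illustration: the letter-frequency `embed` function is a toy stand-in for a real embedding model, and the in-memory list stands in for a real vector DB; all names here are illustrative.

```python
# Minimal ingestion sketch: chunk a document, embed each chunk,
# and store the vectors in an in-memory "vector DB".

def chunk(text, size=40):
    """Split text into fixed-size character chunks. Real systems
    usually chunk by tokens or sentences, often with overlap."""
    return [text[i:i + size] for i in range(0, len(text), size)]

VOCAB = "abcdefghijklmnopqrstuvwxyz"

def embed(text):
    """Toy embedding: a letter-frequency vector. A real pipeline
    would call an embedding model here instead."""
    lower = text.lower()
    return [lower.count(ch) for ch in VOCAB]

vector_db = []  # list of (embedding, chunk) pairs

def ingest(text):
    for piece in chunk(text):
        vector_db.append((embed(piece), piece))

ingest("RAG enhances large language models by integrating "
       "external knowledge sources into their responses.")
print(len(vector_db))  # one entry per chunk
```

A real deployment would swap `embed` for a model API call and `vector_db` for a dedicated vector store, but the flow (split, embed, store) stays the same.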

2. Retrieval #

  • Query Embedding: When a user asks a question, the query is converted into an embedding (vector representation)
  • Search in Vector DB: The query embedding is compared with stored embeddings in the Vector DB using similarity measures
  • Retrieve Relevant Chunks: Based on the similarity score, the most relevant document chunks are retrieved from the database
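The retrieval steps can be sketched the same way, using cosine similarity as the similarity measure. Again, the letter-frequency `embed` is a toy stand-in for a real embedding model, and the linear scan stands in for a vector DB's approximate nearest-neighbor index; the example chunks are made up for illustration.

```python
import math

# Minimal retrieval sketch: embed the query with the same toy
# embedding used at ingestion time, then rank stored chunks by
# cosine similarity and return the top-k.

VOCAB = "abcdefghijklmnopqrstuvwxyz"

def embed(text):
    """Toy letter-frequency embedding (placeholder for a real model)."""
    lower = text.lower()
    return [lower.count(ch) for ch in VOCAB]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy vector DB: (embedding, chunk) pairs from a prior ingestion step.
chunks = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
    "Python is a popular programming language.",
]
vector_db = [(embed(c), c) for c in chunks]

def retrieve(query, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(vector_db, key=lambda item: cosine(q, item[0]),
                    reverse=True)
    return [c for _, c in ranked[:k]]

print(retrieve("What is the capital of France?"))
```

Production systems replace the linear scan with an indexed search (e.g. in a dedicated vector database), but the principle is identical: nearest vectors win.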

3. Augmentation and Generation #

The retrieved chunks are combined with the user query to create an augmented input. This augmented input is passed to an LLM, which generates a contextually rich and accurate response using both its pre-trained knowledge and the retrieved information.
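The augmentation step is mostly prompt assembly, and can be sketched like this. The prompt template is one common pattern, not a fixed standard, and the final call to an LLM is left as a placeholder since it depends on your model provider.

```python
# Minimal augmentation sketch: combine retrieved chunks with the
# user query into a single augmented prompt for the LLM.

def augment(query, retrieved_chunks):
    """Build an augmented prompt from the query and retrieved context."""
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = augment(
    "What is the capital of France?",
    ["Paris is the capital of France."],
)
print(prompt)
# In a real pipeline, this prompt would now be sent to an LLM API
# (e.g. response = call_llm(prompt) -- placeholder, not a real function).
```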


Why RAG Matters #

RAG gives the AI proper context, unlocking accurate and relevant answers without requiring long explanations in every prompt. It bridges the gap between static AI knowledge and dynamic, real-world information needs.