You have a Django app. You want to add AI-powered search or a chatbot that actually knows about your data — not just a generic LLM that hallucinates answers. The solution is RAG: Retrieval-Augmented Generation.
This guide walks through exactly how we add RAG to existing Django projects. No rebuilding from scratch. No switching stacks. Just a clean integration on top of what you already have.
## What RAG actually does
A plain LLM call to GPT-4, Claude, or any other model draws only on training data. Ask it about your product catalogue, your customers, or your internal docs, and it will either make something up or tell you it doesn't know.
RAG fixes this by retrieving relevant chunks of your actual data and injecting them into the prompt as context. The model then answers based on that context, not just training data.
The flow looks like this:
- User asks a question
- Your app converts the question to a vector embedding
- You search your vector store for the most similar chunks of your data
- Those chunks get injected into the LLM prompt as context
- The LLM answers based on your data
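The retrieval step boils down to nearest-neighbour search over embedding vectors. A toy sketch with hand-made 3-dimensional vectors (real embeddings have ~1536 dimensions, and the documents and query vectors here are invented for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity. Lower means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy "embeddings" -- in practice these come from an embedding model.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api reference": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in embedding of "how do I get my money back?"

# Rank documents by distance to the query; the closest chunks become context.
ranked = sorted(docs, key=lambda title: cosine_distance(query, docs[title]))
print(ranked[0])
```

pgvector does exactly this ranking inside Postgres, with an index, so you never pull every vector into Python.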
## What you need
- A Django app with a PostgreSQL database (we'll use pgvector)
- An OpenAI API key (for embeddings and completions)
- The `pgvector` Postgres extension installed
- Python packages: `openai`, `pgvector`, `django`
We use pgvector here because it runs inside your existing Postgres instance — no separate vector database to manage or pay for. For larger scale, Pinecone or Chroma are worth evaluating, but for most Django projects pgvector is the right call.
## Step 1: Install pgvector
```bash
pip install pgvector openai
```
Enable the extension in Postgres:
```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
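If you'd rather keep this in version control than run SQL by hand, the pgvector Python package ships a Django migration operation for enabling the extension (check that your installed pgvector version provides it):

```python
# migrations/0001_enable_pgvector.py
from django.db import migrations
from pgvector.django import VectorExtension

class Migration(migrations.Migration):
    # Runs CREATE EXTENSION IF NOT EXISTS vector; requires superuser
    # or a role with permission to create extensions.
    operations = [VectorExtension()]
```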
## Step 2: Add a vector field to your model
```python
# models.py
from django.db import models
from pgvector.django import VectorField


class Document(models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()
    embedding = VectorField(dimensions=1536, null=True, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.title
```
Run your migration:
```bash
python manage.py makemigrations
python manage.py migrate
```
## Step 3: Generate and store embeddings
```python
# utils/embeddings.py
from openai import OpenAI

client = OpenAI()


def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding
```
Hook this into a Django signal:
```python
# signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Document
from .utils.embeddings import get_embedding


@receiver(post_save, sender=Document)
def generate_embedding(sender, instance, created, **kwargs):
    # Compare against None explicitly: pgvector returns the embedding
    # as a numpy array, and `not array` raises a ValueError.
    if created or instance.embedding is None:
        embedding = get_embedding(instance.content)
        # .update() writes directly to the DB without firing post_save again
        Document.objects.filter(pk=instance.pk).update(embedding=embedding)
```
## Step 4: Semantic search with pgvector
```python
# utils/search.py
from pgvector.django import CosineDistance

from ..models import Document
from .embeddings import get_embedding


def semantic_search(query: str, top_k: int = 5) -> list[Document]:
    query_embedding = get_embedding(query)
    results = (
        Document.objects
        .annotate(distance=CosineDistance('embedding', query_embedding))
        .order_by('distance')[:top_k]
    )
    return list(results)
```

Note the imports: since this module lives in `utils/`, `get_embedding` comes from the sibling `embeddings` module and `Document` from the parent app's `models`.
## Step 5: Build the RAG response
```python
# utils/rag.py
from openai import OpenAI

from .search import semantic_search

client = OpenAI()


def rag_response(user_query: str) -> dict:
    relevant_docs = semantic_search(user_query, top_k=4)
    context = "\n\n---\n\n".join(
        f"Title: {doc.title}\n{doc.content}" for doc in relevant_docs
    )

    system_prompt = """You are a helpful assistant. Answer the user's question
using ONLY the context provided below. If the answer is not in the context,
say so clearly — do not make up information.

Context:
{context}""".format(context=context)

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
        temperature=0.2,
    )
    return {
        "answer": response.choices[0].message.content,
        "sources": [{"id": doc.id, "title": doc.title} for doc in relevant_docs],
    }
```
## Step 6: Wire it up as a Django view
```python
# views.py
import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from django.views.decorators.http import require_POST

from .utils.rag import rag_response


@csrf_exempt
@require_POST
def ask(request):
    try:
        data = json.loads(request.body)
    except json.JSONDecodeError:
        return JsonResponse({"error": "invalid JSON"}, status=400)

    query = data.get("query", "").strip()
    if not query:
        return JsonResponse({"error": "query is required"}, status=400)

    result = rag_response(query)
    return JsonResponse(result)
```
Test it:
```bash
curl -X POST http://localhost:8000/api/ask/ \
  -H "Content-Type: application/json" \
  -d '{"query": "What is your refund policy?"}'
```
## Things to watch in production
**Chunking strategy matters.** Split long documents into 300–500 token chunks with 50–100 tokens of overlap before embedding. Add a `DocumentChunk` model alongside `Document` for this.
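A minimal chunker along those lines, approximating tokens with whitespace-split words (a real implementation would count tokens with a tokenizer such as tiktoken):

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 75) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    Words are a rough stand-in for tokens; swap in a proper tokenizer
    for accurate sizing. Overlap keeps sentences that straddle a chunk
    boundary retrievable from either side.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```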
**Cache embeddings aggressively.** Never regenerate an embedding for content that hasn't changed.
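One simple way to do that is to key a cache on a hash of the content, so identical text never hits the API twice. A sketch (the in-process dict is illustrative; in production you'd use Django's cache framework or a `content_hash` column on the model):

```python
import hashlib

_embedding_cache: dict[str, list[float]] = {}

def cached_embedding(text: str, embed_fn) -> list[float]:
    """Return a cached embedding, calling embed_fn only on a cache miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _embedding_cache:
        _embedding_cache[key] = embed_fn(text)
    return _embedding_cache[key]
```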
**Monitor retrieval quality.** Log cosine distance scores. If the top result has a distance above ~0.4, handle it gracefully rather than feeding irrelevant context to the LLM.
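That guard can be a one-liner between retrieval and prompting. A sketch (0.4 is an assumed cutoff; tune it against your own data):

```python
DISTANCE_THRESHOLD = 0.4  # assumed cutoff -- tune on your own data

def filter_relevant(results: list[tuple[str, float]]) -> list[str]:
    """Keep only results whose cosine distance is below the threshold.

    Each result is a (content, distance) pair, as annotated by the
    semantic search query. Returns [] when nothing is close enough,
    so the caller can answer "I don't know" instead of guessing.
    """
    return [content for content, distance in results
            if distance < DISTANCE_THRESHOLD]
```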
**Rate limit your LLM calls.** Use the `tenacity` library for retry logic with exponential backoff.
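tenacity gives you this declaratively with decorators; the stdlib equivalent is only a few lines. A sketch (the function name and parameters are illustrative):

```python
import random
import time

def with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Call fn(), retrying on any exception with exponential backoff.

    Waits base_delay * 2^attempt seconds plus random jitter between
    attempts, and re-raises the last exception once retries run out.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```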
## What this gets you
A working RAG pipeline on your existing Django app, using your existing Postgres database, with no new infrastructure. The whole integration is around 100 lines of code.
At Lycore we build AI integrations on top of existing Django, React, and .NET applications. If you're looking to add RAG, agents, or other AI capabilities to your stack, get in touch.