LangChain RAG Chains: Document Loaders, Text Splitters, and Retrieval QA
Build end-to-end Retrieval Augmented Generation pipelines with LangChain — covering document loaders, text splitting strategies, vector stores, retrievers, and RAG chain composition.
What Is RAG and Why LangChain for It
Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation. Instead of relying solely on the model's training data, RAG fetches relevant documents from your own data source and includes them as context in the prompt. This lets the model answer questions about your specific documents, databases, or knowledge bases.
LangChain provides the full RAG pipeline as composable components: document loaders to ingest data, text splitters to chunk it, embedding models and vector stores to index it, retrievers to search it, and chain composition to wire it all together.
Step 1: Loading Documents
LangChain ships with loaders for dozens of formats — PDF, HTML, CSV, Markdown, databases, APIs, and more.
from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    CSVLoader,
    WebBaseLoader,
)
# Load a text file
text_docs = TextLoader("knowledge_base.txt").load()
# Load a PDF (one document per page)
pdf_docs = PyPDFLoader("annual_report.pdf").load()
# Load from a web page
web_docs = WebBaseLoader("https://docs.example.com/guide").load()
# Each returns a list of Document objects
print(text_docs[0].page_content[:200])
print(text_docs[0].metadata) # {"source": "knowledge_base.txt"}
Every loader returns Document objects with page_content (the text) and metadata (source, page number, etc.). Metadata flows through the entire pipeline, so your final answers can cite sources.
Step 2: Splitting Text into Chunks
Documents are often too large to fit in a single prompt. Text splitters break them into manageable chunks while preserving semantic coherence.
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Max characters per chunk
    chunk_overlap=200,  # Overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(pdf_docs)
print(f"Split {len(pdf_docs)} pages into {len(chunks)} chunks")
RecursiveCharacterTextSplitter is the recommended default. It tries to split on paragraph boundaries first, then sentences, then words, ensuring chunks are as semantically coherent as possible. The overlap ensures that information spanning a boundary appears in at least one chunk.
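To make the fallback behavior concrete, here is a simplified pure-Python sketch of the recursive idea — not the library's actual implementation, and it omits chunk overlap for brevity: try the coarsest separator first, and only recurse to finer separators for pieces that are still too large.

```python
def recursive_split(text, chunk_size, separators):
    """Split text on the coarsest separator that keeps chunks under chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separators left: hard-cut at chunk_size boundaries
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p]
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Piece is itself too large: flush and recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, rest))
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            # Piece still fits: merge it into the current chunk
            current = current + sep + piece if current else piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

The real splitter additionally re-merges small pieces with overlap and keeps the separator configurable per chunk, but the separator hierarchy above is the core of why paragraph boundaries are preferred over word boundaries.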
For code, use RecursiveCharacterTextSplitter.from_language():
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)
Step 3: Embedding and Indexing
Chunks are converted to vectors using an embedding model and stored in a vector store for similarity search.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create vector store from chunks
vectorstore = FAISS.from_documents(chunks, embeddings)
# Or use Chroma for persistence
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory="./chroma_db",
)
FAISS is fast and in-memory. Chroma persists to disk. For production, consider Pinecone, Weaviate, or pgvector for PostgreSQL.
Step 4: Building the Retriever
A retriever wraps the vector store and returns the most relevant documents for a query.
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # Return top 4 chunks
)
# Test the retriever
docs = retriever.invoke("What were Q3 revenue numbers?")
for doc in docs:
    print(doc.page_content[:100])
    print(doc.metadata)
    print("---")
You can also use search_type="mmr" (Maximal Marginal Relevance) to get diverse results rather than just the closest matches.
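MMR re-ranks candidates by trading relevance to the query against similarity to documents already selected. Here is a simplified pure-Python sketch of the greedy selection — not LangChain's internal code, but `lambda_mult` mirrors the parameter of the same name you can pass in `search_kwargs`:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedy MMR: higher lambda_mult favors relevance,
    lower lambda_mult favors diversity."""
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalty: similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` this degenerates to plain similarity ranking; lowering it makes near-duplicate chunks lose out to less similar but more diverse ones.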
Step 5: Composing the RAG Chain
Now connect everything into an LCEL chain that retrieves context and generates answers.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the following context.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
)
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
answer = rag_chain.invoke("What were the key highlights from Q3?")
print(answer)
The dictionary step runs the retriever and passthrough in parallel. Retrieved documents are formatted into a string, while the original question is forwarded. Both feed into the prompt template.
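Conceptually, the dict step behaves like this plain-Python sketch. LCEL actually runs the branches concurrently, and `fake_retriever` here is a hypothetical stand-in for the real retriever:

```python
# Hypothetical stand-ins for the real components
def fake_retriever(question):
    return ["Q3 revenue grew 12% year over year.", "Guidance was raised for Q4."]

def format_docs(docs):
    return "\n\n".join(docs)

def run_parallel(steps, value):
    # Apply every branch to the same input; LCEL does this concurrently,
    # a dict comprehension is enough to show the data flow
    return {name: fn(value) for name, fn in steps.items()}

inputs = run_parallel(
    {
        "context": lambda q: format_docs(fake_retriever(q)),
        "question": lambda q: q,  # what RunnablePassthrough does
    },
    "What were Q3 revenue numbers?",
)
prompt_text = "Context:\n{context}\n\nQuestion: {question}".format(**inputs)
```

The resulting dict has exactly the keys the prompt template expects, which is why the pipe into `prompt` works.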
Adding Source Citations
To return sources alongside the answer, run the answer chain and the retriever in parallel and return both results.
from langchain_core.runnables import RunnableParallel
rag_with_sources = RunnableParallel(
    answer=rag_chain,
    sources=retriever | (lambda docs: [d.metadata["source"] for d in docs]),
)
result = rag_with_sources.invoke("What were Q3 revenue numbers?")
print(result["answer"])
print("Sources:", result["sources"])
FAQ
How do I choose the right chunk size?
Start with 1000 characters and 200 overlap. Smaller chunks (500 characters) improve retrieval precision but may lose context. Larger chunks (2000 characters) retain more context but may dilute relevance. Test with your actual queries and documents, measuring retrieval quality.
Can I use RAG with local models instead of OpenAI?
Yes. Replace ChatOpenAI with any LangChain model wrapper — ChatOllama for local Ollama models, for example. For embeddings, use HuggingFaceEmbeddings or OllamaEmbeddings to keep everything local.
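As a sketch, going fully local is a small swap in the Step 3 and Step 5 code. This assumes the langchain-ollama package is installed and an Ollama server is running with these models pulled; the model names are examples:

```python
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Drop-in replacements for OpenAIEmbeddings and ChatOpenAI
embeddings = OllamaEmbeddings(model="nomic-embed-text")
llm = ChatOllama(model="llama3.1", temperature=0)
```

The rest of the pipeline — splitting, indexing, retrieval, and chain composition — is unchanged.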
How do I update the vector store when my documents change?
Most vector stores support add_documents() to add new content. For updates, delete the old documents by ID and add the new versions. Chroma and Pinecone support upsert operations. For bulk reindexing, rebuild the vector store from scratch.
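A hedged sketch of the delete-then-add pattern against the Chroma store from Step 3. It assumes you assign stable string IDs at ingestion time; `new_chunks` and `revised_chunk` are hypothetical:

```python
# Add new content with explicit IDs so it can be updated later
vectorstore.add_documents(
    new_chunks,
    ids=[f"chunk-{i}" for i in range(len(new_chunks))],
)

# Update chunk-3: remove the stale version, then re-add the revised one
vectorstore.delete(ids=["chunk-3"])
vectorstore.add_documents([revised_chunk], ids=["chunk-3"])
```

Without stable IDs you cannot target stale entries, which is why a full rebuild is often the simpler option for bulk changes.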
CallSphere Team