LangChain RAG Chains: Document Loaders, Text Splitters, and Retrieval QA
Build end-to-end Retrieval Augmented Generation pipelines with LangChain — covering document loaders, text splitting strategies, vector stores, retrievers, and RAG chain composition.
What Is RAG and Why LangChain for It
Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation. Instead of relying solely on the model's training data, RAG fetches relevant documents from your own data source and includes them as context in the prompt. This lets the model answer questions about your specific documents, databases, or knowledge bases.
LangChain provides the full RAG pipeline as composable components: document loaders to ingest data, text splitters to chunk it, embedding models and vector stores to index it, retrievers to search it, and chain composition to wire it all together.
Step 1: Loading Documents
LangChain ships with loaders for dozens of formats — PDF, HTML, CSV, Markdown, databases, APIs, and more.
from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    CSVLoader,
    WebBaseLoader,
)
# Load a text file
text_docs = TextLoader("knowledge_base.txt").load()
# Load a PDF (one document per page)
pdf_docs = PyPDFLoader("annual_report.pdf").load()
# Load from a web page
web_docs = WebBaseLoader("https://docs.example.com/guide").load()
# Each returns a list of Document objects
print(text_docs[0].page_content[:200])
print(text_docs[0].metadata) # {"source": "knowledge_base.txt"}
Every loader returns Document objects with page_content (the text) and metadata (source, page number, etc.). Metadata flows through the entire pipeline, so your final answers can cite sources.
Step 2: Splitting Text into Chunks
Documents are often too large to fit in a single prompt. Text splitters break them into manageable chunks while preserving semantic coherence.
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # Max characters per chunk
    chunk_overlap=200,  # Overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " ", ""],
)
chunks = splitter.split_documents(pdf_docs)
print(f"Split {len(pdf_docs)} pages into {len(chunks)} chunks")
RecursiveCharacterTextSplitter is the recommended default. It tries to split on paragraph boundaries first, then sentences, then words, ensuring chunks are as semantically coherent as possible. The overlap ensures that information spanning a boundary appears in at least one chunk.
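To make the fallback behavior concrete, here is a simplified pure-Python sketch of the recursive idea — not the library's actual implementation, and it omits chunk overlap for brevity: try the coarsest separator first, and only recurse to finer separators for pieces that are still too large.

```python
def recursive_split(text, chunk_size, separators):
    """Split text on the coarsest separator that keeps chunks under chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separators left: hard-cut at chunk_size boundaries
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = [p for p in text.split(sep) if p]
    chunks, current = [], ""
    for piece in pieces:
        if len(piece) > chunk_size:
            # Piece is itself too large: flush and recurse with finer separators
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, rest))
        elif len(current) + len(sep) + len(piece) <= chunk_size:
            # Piece still fits: merge it into the current chunk
            current = current + sep + piece if current else piece
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

The real splitter additionally re-merges small pieces with overlap and keeps the separator configurable per chunk, but the separator hierarchy above is the core of why paragraph boundaries are preferred over word boundaries.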
For code, use RecursiveCharacterTextSplitter.from_language():
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language
python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)
Step 3: Embedding and Indexing
Chunks are converted to vectors using an embedding model and stored in a vector store for similarity search.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create vector store from chunks
vectorstore = FAISS.from_documents(chunks, embeddings)
# Or use Chroma for persistence
from langchain_chroma import Chroma
vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory="./chroma_db",
)
FAISS is fast and in-memory. Chroma persists to disk. For production, consider Pinecone, Weaviate, or pgvector for PostgreSQL.
Step 4: Building the Retriever
A retriever wraps the vector store and returns the most relevant documents for a query.
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # Return top 4 chunks
)
# Test the retriever
docs = retriever.invoke("What were Q3 revenue numbers?")
for doc in docs:
    print(doc.page_content[:100])
    print(doc.metadata)
    print("---")
You can also use search_type="mmr" (Maximal Marginal Relevance) to get diverse results rather than just the closest matches.
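MMR re-ranks candidates by trading relevance to the query against similarity to documents already selected. Here is a simplified pure-Python sketch of the greedy selection — not LangChain's internal code, but `lambda_mult` mirrors the parameter of the same name you can pass in `search_kwargs`:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedy MMR: higher lambda_mult favors relevance,
    lower lambda_mult favors diversity."""
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalty: similarity to the closest already-selected document
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lambda_mult=1.0` this degenerates to plain similarity ranking; lowering it makes near-duplicate chunks lose out to less similar but more diverse ones.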
Step 5: Composing the RAG Chain
Now connect everything into an LCEL chain that retrieves context and generates answers.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the following context.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
)
rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)
answer = rag_chain.invoke("What were the key highlights from Q3?")
print(answer)
The dictionary step runs the retriever and passthrough in parallel. Retrieved documents are formatted into a string, while the original question is forwarded. Both feed into the prompt template.
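Conceptually, the dict step behaves like this plain-Python sketch. LCEL actually runs the branches concurrently, and `fake_retriever` here is a hypothetical stand-in for the real retriever:

```python
# Hypothetical stand-ins for the real components
def fake_retriever(question):
    return ["Q3 revenue grew 12% year over year.", "Guidance was raised for Q4."]

def format_docs(docs):
    return "\n\n".join(docs)

def run_parallel(steps, value):
    # Apply every branch to the same input; LCEL does this concurrently,
    # a dict comprehension is enough to show the data flow
    return {name: fn(value) for name, fn in steps.items()}

inputs = run_parallel(
    {
        "context": lambda q: format_docs(fake_retriever(q)),
        "question": lambda q: q,  # what RunnablePassthrough does
    },
    "What were Q3 revenue numbers?",
)
prompt_text = "Context:\n{context}\n\nQuestion: {question}".format(**inputs)
```

The resulting dict has exactly the keys the prompt template expects, which is why the pipe into `prompt` works.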
Adding Source Citations
To return sources alongside the answer, run the answer chain and the retriever in parallel and return both results.
from langchain_core.runnables import RunnableParallel
rag_with_sources = RunnableParallel(
    answer=rag_chain,
    sources=retriever | (lambda docs: [d.metadata["source"] for d in docs]),
)
result = rag_with_sources.invoke("What were Q3 revenue numbers?")
print(result["answer"])
print("Sources:", result["sources"])
FAQ
How do I choose the right chunk size?
Start with 1000 characters and 200 overlap. Smaller chunks (500 characters) improve retrieval precision but may lose context. Larger chunks (2000 characters) retain more context but may dilute relevance. Test with your actual queries and documents, measuring retrieval quality.
Can I use RAG with local models instead of OpenAI?
Yes. Replace ChatOpenAI with any LangChain model wrapper — ChatOllama for local Ollama models, for example. For embeddings, use HuggingFaceEmbeddings or OllamaEmbeddings to keep everything local.
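As a sketch, going fully local is a small swap in the Step 3 and Step 5 code. This assumes the langchain-ollama package is installed and an Ollama server is running with these models pulled; the model names are examples:

```python
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Drop-in replacements for OpenAIEmbeddings and ChatOpenAI
embeddings = OllamaEmbeddings(model="nomic-embed-text")
llm = ChatOllama(model="llama3.1", temperature=0)
```

The rest of the pipeline — splitting, indexing, retrieval, and chain composition — is unchanged.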
How do I update the vector store when my documents change?
Most vector stores support add_documents() to add new content. For updates, delete the old documents by ID and add the new versions. Chroma and Pinecone support upsert operations. For bulk reindexing, rebuild the vector store from scratch.
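A hedged sketch of the delete-then-add pattern against the Chroma store from Step 3. It assumes you assign stable string IDs at ingestion time; `new_chunks` and `revised_chunk` are hypothetical:

```python
# Add new content with explicit IDs so it can be updated later
vectorstore.add_documents(
    new_chunks,
    ids=[f"chunk-{i}" for i in range(len(new_chunks))],
)

# Update chunk-3: remove the stale version, then re-add the revised one
vectorstore.delete(ids=["chunk-3"])
vectorstore.add_documents([revised_chunk], ids=["chunk-3"])
```

Without stable IDs you cannot target stale entries, which is why a full rebuild is often the simpler option for bulk changes.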
CallSphere Team