LlamaIndex Agents: RAG-First Agent Architecture for Knowledge-Intensive Tasks
Discover how LlamaIndex agents combine retrieval-augmented generation with agentic reasoning, using query engines as tools and data agents to build knowledge-intensive AI applications.
Why RAG and Agents Belong Together
Retrieval-augmented generation (RAG) gives LLMs access to external knowledge. Agents give LLMs the ability to take actions and reason over multiple steps. Separately, each has limitations: RAG pipelines cannot reason about which documents to retrieve or how to combine information from multiple sources, and agents without retrieval hallucinate when asked knowledge-intensive questions.
LlamaIndex was built from the ground up for RAG. Its agent layer extends this foundation by letting agents treat query engines, indexes, and retrieval pipelines as tools. The result is agents that are genuinely good at knowledge-intensive tasks — not just tool-calling agents with a vector store bolted on.
Query Engines as Agent Tools
The core pattern in LlamaIndex agents is wrapping query engines as tools. A query engine encapsulates an index, a retriever, and a response synthesizer. When you give this to an agent as a tool, the agent can decide when and how to query your data:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI
# Load and index documents
documents = SimpleDirectoryReader("./data/financial_reports").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=5)
# Wrap the query engine as an agent tool
finance_tool = QueryEngineTool(
    query_engine=query_engine,
    metadata=ToolMetadata(
        name="financial_reports",
        description="""Useful for answering questions about quarterly
        financial reports, revenue figures, and earnings data.
        Input should be a specific financial question.""",
    ),
)
# Create an agent with the query engine tool
llm = OpenAI(model="gpt-4o")
agent = ReActAgent.from_tools(
    tools=[finance_tool],
    llm=llm,
    verbose=True,
)
response = agent.chat("What was the Q3 2025 revenue growth rate?")
The agent receives a question, decides whether to use the financial reports tool, formulates a retrieval query, gets the relevant chunks, and synthesizes an answer. The ReAct pattern lets it do this iteratively — if the first retrieval does not answer the question, the agent can reformulate and try again.
Multi-Index Agents
Real applications often have multiple data sources. LlamaIndex agents handle this by accepting multiple query engine tools:
# Second index for a different data source
policy_docs = SimpleDirectoryReader("./data/company_policies").load_data()
policy_index = VectorStoreIndex.from_documents(policy_docs)
policy_engine = policy_index.as_query_engine(similarity_top_k=3)
policy_tool = QueryEngineTool(
    query_engine=policy_engine,
    metadata=ToolMetadata(
        name="company_policies",
        description="""Useful for questions about company policies,
        HR guidelines, compliance requirements, and internal procedures.""",
    ),
)
# Agent with multiple knowledge sources
agent = ReActAgent.from_tools(
    tools=[finance_tool, policy_tool],
    llm=llm,
    verbose=True,
)
# The agent decides which tool to query
response = agent.chat(
    "Does our remote work policy affect the Q3 headcount projections?"
)
For this question, the agent might first query the company policies tool to understand the remote work policy, then query the financial reports tool for headcount projections, and finally synthesize both results into a coherent answer. This multi-step reasoning across data sources is where LlamaIndex agents excel.
Sub-Question Query Engine
For complex questions that span multiple sources, LlamaIndex offers a specialized pattern — the sub-question query engine. Instead of an agent loop, it decomposes a question into sub-questions and routes each to the appropriate query engine:
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import SubQuestionQueryEngine
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[finance_tool, policy_tool],
    llm=llm,
)
response = sub_question_engine.query(
    "Compare our Q3 hiring costs against the approved budget in the HR policy."
)
This approach is more deterministic than the agent loop and works well when you know the question will require information from multiple sources.
Data Agents for Structured Data
LlamaIndex also supports agents that work with structured data through SQL and pandas integrations:
from llama_index.core import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from sqlalchemy import create_engine
engine = create_engine("postgresql://user:pass@localhost/analytics")
sql_database = SQLDatabase(engine, include_tables=["revenue", "expenses"])
sql_tool = QueryEngineTool(
    query_engine=NLSQLTableQueryEngine(
        sql_database=sql_database, llm=llm
    ),
    metadata=ToolMetadata(
        name="analytics_db",
        description="Query the analytics database for revenue and expense data.",
    ),
)
This lets agents write and execute SQL queries against your database, combining structured data retrieval with unstructured document retrieval in a single agent.
When to Choose LlamaIndex Agents
Choose LlamaIndex when your primary use case is knowledge-intensive — answering questions over documents, databases, or a combination. If your agents spend most of their time retrieving and synthesizing information rather than taking external actions, LlamaIndex's RAG-first design gives you better retrieval quality with less work.
For agents focused on external API calls, multi-agent orchestration, or code execution, other frameworks may be a better fit.
FAQ
How does LlamaIndex handle large document collections at scale?
LlamaIndex integrates with production vector stores like Pinecone, Weaviate, Qdrant, and ChromaDB. For large collections, you build the index once, persist it to the vector store, and load it on startup. The agent queries the vector store directly, so retrieval scales with the vector database.
Can LlamaIndex agents use non-RAG tools?
Yes. You can add any callable function as a FunctionTool alongside query engine tools. Agents can mix RAG tools with API calls, calculations, or any custom logic.
What is the difference between ReActAgent and the OpenAI agent in LlamaIndex?
ReActAgent uses the ReAct prompting pattern (Reason + Act) and works with any LLM. The OpenAI agent uses OpenAI's native function-calling API, which is more reliable for tool selection but only works with OpenAI models.
#LlamaIndex #RAG #AgentFrameworks #KnowledgeAgents #Python #AgenticAI #LearnAI #AIEngineering