
GraphRAG: How Knowledge Graphs Beat Naive RAG for Complex Queries

Learn how GraphRAG combines knowledge graphs with retrieval-augmented generation to handle multi-hop reasoning, relationship-based queries, and global summarization tasks that naive vector-based RAG cannot solve.

Where Naive RAG Fails

Standard RAG works by embedding document chunks into vectors, retrieving the most similar chunks to a query, and feeding them to an LLM for generation. This works well for factoid questions where the answer exists in a single chunk. But it fails systematically on three types of queries:

  1. Multi-hop reasoning: "Which suppliers of our top-selling product also supply our competitors?" -- requires connecting information across multiple documents.
  2. Global summarization: "What are the main themes discussed across all board meeting transcripts?" -- requires aggregating information from the entire corpus.
  3. Relationship queries: "How are the characters in this novel connected to each other?" -- requires understanding entity relationships, not just text similarity.

GraphRAG addresses these failures by building a knowledge graph from your documents and using graph traversal alongside vector search for retrieval.

How GraphRAG Works

The GraphRAG pipeline has two phases: indexing (building the knowledge graph) and querying (using the graph for retrieval).

Indexing Phase

  1. Entity extraction: An LLM reads each document chunk and extracts entities (people, organizations, concepts, products) and relationships between them.
  2. Graph construction: Extracted entities become nodes; relationships become edges. Duplicate entities are merged.
  3. Community detection: Graph clustering algorithms (such as Leiden or Louvain) identify communities -- groups of densely connected entities.
  4. Community summarization: An LLM generates a summary description for each community, capturing the key themes and relationships.
import json

import networkx as nx
from anthropic import AsyncAnthropic

client = AsyncAnthropic()

async def extract_entities_and_relations(chunk: str) -> dict:
    """Use an LLM to extract structured knowledge from a text chunk."""
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="""Extract entities and relationships from the text.
Return only JSON with this structure:
{
  "entities": [{"name": "...", "type": "person|org|concept|product", "description": "..."}],
  "relationships": [{"source": "...", "target": "...", "relation": "...", "description": "..."}]
}""",
        messages=[{"role": "user", "content": chunk}],
    )
    return json.loads(response.content[0].text)

def build_knowledge_graph(extractions: list[dict]) -> nx.Graph:
    """Build a NetworkX graph from extracted entities and relations"""
    G = nx.Graph()

    for extraction in extractions:
        for entity in extraction["entities"]:
            name = entity["name"].lower().strip()
            if G.has_node(name):
                # Merge descriptions for duplicate entities
                G.nodes[name]["descriptions"].append(entity["description"])
            else:
                G.add_node(name, type=entity["type"],
                          descriptions=[entity["description"]])

        for rel in extraction["relationships"]:
            source = rel["source"].lower().strip()
            target = rel["target"].lower().strip()
            G.add_edge(source, target,
                       relation=rel["relation"],
                       description=rel["description"])

    return G
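The merge step is easy to sanity-check without any API calls. The sketch below inlines build_knowledge_graph from above and feeds it two hand-written extraction dicts (the entity and relation names are invented for illustration):

```python
import networkx as nx

def build_knowledge_graph(extractions: list[dict]) -> nx.Graph:
    # Same logic as above: lowercase names, merge duplicate entities
    G = nx.Graph()
    for extraction in extractions:
        for entity in extraction["entities"]:
            name = entity["name"].lower().strip()
            if G.has_node(name):
                G.nodes[name]["descriptions"].append(entity["description"])
            else:
                G.add_node(name, type=entity["type"],
                           descriptions=[entity["description"]])
        for rel in extraction["relationships"]:
            G.add_edge(rel["source"].lower().strip(),
                       rel["target"].lower().strip(),
                       relation=rel["relation"],
                       description=rel["description"])
    return G

# Two chunks mention the same company with different casing/whitespace
extractions = [
    {"entities": [
        {"name": "Acme Corp", "type": "org", "description": "Widget maker"},
        {"name": "Jane Doe", "type": "person", "description": "CEO of Acme"}],
     "relationships": [
        {"source": "Acme Corp", "target": "Jane Doe",
         "relation": "led_by", "description": "Jane leads Acme"}]},
    {"entities": [
        {"name": "acme corp ", "type": "org", "description": "Supplies widgets"}],
     "relationships": []},
]

G = build_knowledge_graph(extractions)
print(G.number_of_nodes())                   # 2 -- duplicates merged
print(G.nodes["acme corp"]["descriptions"])  # both descriptions retained
```

Lowercasing plus strip is the crudest possible entity resolution; real pipelines also need fuzzy or embedding-based matching to catch "Acme" vs "Acme Corp".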

Community Detection

import community as community_louvain  # python-louvain

def detect_communities(G: nx.Graph) -> dict:
    """Detect communities using Louvain algorithm"""
    partition = community_louvain.best_partition(G)

    # Group nodes by community
    communities = {}
    for node, comm_id in partition.items():
        if comm_id not in communities:
            communities[comm_id] = []
        communities[comm_id].append(node)

    return communities
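If you'd rather not add the python-louvain dependency, newer networkx releases ship a Louvain implementation directly; this variant is functionally equivalent to detect_communities above:

```python
import networkx as nx

def detect_communities_nx(G: nx.Graph) -> dict:
    """Louvain via networkx's built-in implementation (networkx >= 2.8)."""
    comms = nx.community.louvain_communities(G, seed=42)  # seed for reproducibility
    return {i: sorted(nodes) for i, nodes in enumerate(comms)}

# Toy graph: two triangles joined by a single bridge edge
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"),
              ("x", "y"), ("y", "z"), ("x", "z"),
              ("c", "x")])
communities = detect_communities_nx(G)
print(communities)  # the two triangles should separate into distinct communities
```

Louvain is randomized, so fix the seed if you need reproducible community IDs across indexing runs.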

async def summarize_community(G: nx.Graph, nodes: list[str]) -> str:
    """Generate a summary for a community of related entities"""
    # Collect all entity descriptions and relationships within the community
    context_parts = []
    for node in nodes:
        desc = "; ".join(G.nodes[node].get("descriptions", []))
        context_parts.append(f"Entity: {node} ({G.nodes[node].get('type', 'unknown')}): {desc}")

    # Only edges whose both endpoints are inside the community
    for u, v, data in G.subgraph(nodes).edges(data=True):
        context_parts.append(
            f"Relationship: {u} --[{data.get('relation', 'related')}]--> {v}: {data.get('description', '')}"
        )

    context = "\n".join(context_parts)

    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Summarize the key themes and relationships in this "
                       f"group of related entities:\n\n{context}"
        }],
    )
    return response.content[0].text

Query Strategies

GraphRAG supports two query modes that handle different question types:

Local Search (for specific questions)

Local search starts by finding relevant entities in the graph, then traverses their neighborhood to gather connected context:

async def local_search(query: str, G: nx.Graph, vector_store, top_k: int = 5):
    # Step 1: Extract entities from the query
    query_entities = await extract_query_entities(query)

    # Step 2: Find matching nodes in the graph
    matched_nodes = []
    for entity in query_entities:
        matches = find_similar_nodes(G, entity, threshold=0.8)
        matched_nodes.extend(matches)

    # Step 3: Traverse graph neighborhood (1-2 hops)
    context_nodes = set()
    for node in matched_nodes:
        context_nodes.add(node)
        for neighbor in G.neighbors(node):
            context_nodes.add(neighbor)
            # Second hop for deeper reasoning -- remove this inner
            # loop if the gathered context grows too large
            for neighbor2 in G.neighbors(neighbor):
                context_nodes.add(neighbor2)

    # Step 4: Gather context from graph
    graph_context = format_subgraph_context(G, context_nodes)

    # Step 5: Also retrieve from vector store for text chunks
    vector_results = await vector_store.search(query, top_k=top_k)

    # Step 6: Combine graph context + vector context for generation
    combined_context = f"Graph context:\n{graph_context}\n\nText context:\n{vector_results}"
    return combined_context
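local_search leans on helpers not defined in this post (extract_query_entities, find_similar_nodes, format_subgraph_context). As one illustration, here is a minimal, LLM-free find_similar_nodes using stdlib string similarity -- production systems would more likely compare entity embeddings:

```python
import difflib

import networkx as nx

def find_similar_nodes(G: nx.Graph, entity: str, threshold: float = 0.8) -> list[str]:
    """Fuzzy-match a query entity against graph node names.

    A simple string-similarity stand-in; swap in embedding similarity
    for anything beyond toy corpora.
    """
    entity = entity.lower().strip()
    matches = []
    for node in G.nodes:
        score = difflib.SequenceMatcher(None, entity, str(node)).ratio()
        if score >= threshold:
            matches.append(node)
    return matches

G = nx.Graph()
G.add_node("acme corporation")
G.add_node("jane doe")
print(find_similar_nodes(G, "Acme Corporation"))  # ['acme corporation']
print(find_similar_nodes(G, "zzzz"))              # []
```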

Global Search (for summarization questions)

Global search uses community summaries to answer questions that span the entire corpus:

async def global_search(query: str, community_summaries: list[str]):
    # Step 1: Score each community summary for relevance
    scored_summaries = []
    for summary in community_summaries:
        relevance = await score_relevance(query, summary)
        scored_summaries.append((summary, relevance))

    # Step 2: Select top community summaries
    scored_summaries.sort(key=lambda x: x[1], reverse=True)
    top_summaries = [s for s, _ in scored_summaries[:10]]

    # Step 3: Generate answer from community summaries
    context = "\n\n---\n\n".join(top_summaries)
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="Answer based on the provided community summaries. "
               "Cite specific communities when making claims.",
        messages=[{
            "role": "user",
            "content": f"Community summaries:\n{context}\n\nQuestion: {query}"
        }],
    )
    return response.content[0].text
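global_search awaits a score_relevance helper that isn't shown; in the full pipeline it would be an LLM or embedding-similarity call, which is why it is async. A crude term-overlap stand-in illustrates the contract (the sample query and summary strings are invented):

```python
import asyncio

async def score_relevance(query: str, summary: str) -> float:
    """Fraction of query terms found in the summary (0..1).

    Lexical stand-in for an LLM/embedding relevance scorer; kept async
    to match the call site in global_search.
    """
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    summary_terms = set(summary.lower().split())
    return len(query_terms & summary_terms) / len(query_terms)

score = asyncio.run(score_relevance(
    "supplier risk",
    "Key themes: supplier concentration and pricing risk across regions"))
print(score)  # 1.0 -- both query terms appear in the summary
```

Scoring summaries one at a time is slow; with an async scorer you can fan out with asyncio.gather over all community summaries.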

GraphRAG vs Naive RAG: Benchmark Results

Microsoft Research's evaluation of GraphRAG on multi-hop questions shows significant improvements:

Query type           | Naive RAG correct | GraphRAG correct | Improvement
Single-hop factoid   | 82%               | 85%              | +3 pts
Multi-hop reasoning  | 34%               | 72%              | +38 pts
Global summarization | 21%               | 68%              | +47 pts
Relationship queries | 29%               | 76%              | +47 pts
Temporal reasoning   | 41%               | 63%              | +22 pts

The improvement is most dramatic for the query types where naive RAG fundamentally cannot work: questions that require connecting information across multiple documents.

Implementation with Neo4j

For production GraphRAG, use a proper graph database like Neo4j:

from neo4j import AsyncGraphDatabase

class GraphRAGStore:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = AsyncGraphDatabase.driver(uri, auth=(user, password))

    async def store_entity(self, entity: dict):
        async with self.driver.session() as session:
            await session.run(
                """MERGE (e:Entity {name: $name})
                   SET e.type = $type, e.description = $description""",
                name=entity["name"],
                type=entity["type"],
                description=entity["description"],
            )

    async def store_relationship(self, rel: dict):
        async with self.driver.session() as session:
            await session.run(
                """MATCH (a:Entity {name: $source})
                   MATCH (b:Entity {name: $target})
                   MERGE (a)-[r:RELATED {type: $relation}]->(b)
                   SET r.description = $description""",
                source=rel["source"],
                target=rel["target"],
                relation=rel["relation"],
                description=rel["description"],
            )

    async def get_neighborhood(self, entity_name: str, hops: int = 2):
        async with self.driver.session() as session:
            # Variable-length patterns cannot be parameterized in Cypher,
            # so `hops` is interpolated directly -- pass trusted ints only
            result = await session.run(
                f"""MATCH path = (e:Entity {{name: $name}})-[*1..{hops}]-(related)
                    RETURN path""",
                name=entity_name,
            )
            return [record["path"] for record in await result.data()]

Cost and Complexity Tradeoffs

GraphRAG is significantly more expensive to build than naive RAG:

Aspect                  | Naive RAG           | GraphRAG
Indexing cost (1M docs) | $50-100 (embedding) | $500-2000 (LLM extraction + embedding)
Indexing time           | Hours               | Days
Query latency           | 200-500ms           | 500-2000ms
Infrastructure          | Vector DB           | Vector DB + Graph DB
Maintenance complexity  | Low                 | Medium-High
Update strategy         | Easy incremental    | Complex (entity resolution)
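The indexing-cost row is simple arithmetic: extraction calls x tokens x price. The sketch below makes the calculation explicit; the chunk counts, token counts, and per-million-token prices are all illustrative assumptions, not provider quotes. A range like $500-2000 implies a small, cheap extraction model -- a frontier model can cost 10-20x more:

```python
docs = 1_000_000
chunks_per_doc = 2           # assumption: ~2 chunks per document
input_tok_per_chunk = 500    # assumption: chunk text + extraction prompt
output_tok_per_chunk = 250   # assumption: returned entity/relation JSON

# Assumed small-model prices per million tokens -- check current pricing
price_in = 0.25
price_out = 1.25

calls = docs * chunks_per_doc
cost = (calls * input_tok_per_chunk / 1e6) * price_in \
     + (calls * output_tok_per_chunk / 1e6) * price_out
print(f"Estimated extraction cost: ${cost:,.0f}")  # $875 with these assumptions
```

Token counts dominate: doubling chunk size or switching to a larger model moves the estimate far more than any other knob.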

When to Use GraphRAG

  • Your queries frequently require connecting information across documents
  • Users ask global/summarization questions about large corpora
  • Relationship understanding is critical (legal, biomedical, intelligence analysis)
  • You can justify the higher indexing cost and infrastructure complexity

When Naive RAG Is Sufficient

  • Most queries are answered by a single document chunk
  • Your corpus is small enough that simple top-k retrieval works
  • Low latency is more important than multi-hop reasoning
  • Budget constraints prevent the additional LLM calls during indexing

Key Takeaways

GraphRAG represents a genuine advancement over naive RAG for complex queries. The knowledge graph structure enables multi-hop reasoning, relationship queries, and global summarization that vector-only retrieval cannot achieve. However, it comes with significantly higher indexing costs and infrastructure complexity. The right approach is to start with naive RAG, measure where it fails, and add GraphRAG capabilities specifically for the query types that need it.
