Knowledge Graph Agents: Combining Graph Databases with LLMs for Structured Reasoning
Build AI agents that leverage knowledge graphs for structured reasoning using Neo4j, entity extraction, relationship traversal, and graph-augmented generation techniques.
Why Knowledge Graphs Matter for AI Agents
LLMs are powerful pattern matchers but weak structured reasoners. Ask an LLM to trace the chain of ownership through five levels of subsidiaries, identify all products affected by a supply chain disruption, or find the shortest path between two researchers through co-authorship — and it will hallucinate or give up. These tasks require traversing explicit relationships across entities, which is exactly what knowledge graphs do.
A knowledge graph represents information as entities (nodes) connected by typed relationships (edges). Unlike vector databases that store chunks of unstructured text, knowledge graphs preserve the structure of information — who reports to whom, which component depends on which library, which drug interacts with which protein.
When you combine knowledge graphs with LLM-powered agents, you get systems that can reason over structured data with the flexibility of natural language. The agent translates user questions into graph queries, traverses relationships, and synthesizes answers that would be impossible with retrieval alone.
Knowledge Graph Fundamentals for Agent Developers
Before building the agent, you need a graph that encodes domain knowledge as triples: (subject, predicate, object). For example: (Tesla, manufactures, Model 3), (Model 3, has_battery, 4680 Cell), (4680 Cell, supplied_by, Panasonic).
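Before bringing in a database, the triple model is easy to see in plain Python. The sketch below builds an adjacency map from the example triples above and reads off an entity's outgoing edges; the data and helper are purely illustrative.

```python
from collections import defaultdict

# Illustrative triples in (subject, predicate, object) form.
triples = [
    ("Tesla", "manufactures", "Model 3"),
    ("Model 3", "has_battery", "4680 Cell"),
    ("4680 Cell", "supplied_by", "Panasonic"),
]

# Adjacency map: subject -> list of (predicate, object) edges.
graph = defaultdict(list)
for subj, pred, obj in triples:
    graph[subj].append((pred, obj))

def neighbors(entity: str) -> list[tuple[str, str]]:
    """Return the outgoing (predicate, object) edges of an entity."""
    return graph[entity]

print(neighbors("Model 3"))  # [('has_battery', '4680 Cell')]
```

A graph database adds persistence, indexing, and a query language on top of exactly this structure.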
Neo4j is the most mature graph database for production agent systems. It uses the Cypher query language and has native Python drivers with async support.
from neo4j import AsyncGraphDatabase
from dataclasses import dataclass

@dataclass
class Entity:
    id: str
    label: str
    properties: dict

@dataclass
class Relationship:
    source: str
    relation_type: str
    target: str
    properties: dict

class KnowledgeGraphClient:
    def __init__(self, uri: str, user: str, password: str):
        self.driver = AsyncGraphDatabase.driver(uri, auth=(user, password))

    async def query(self, cypher: str, params: dict | None = None) -> list[dict]:
        async with self.driver.session() as session:
            result = await session.run(cypher, params or {})
            return [record.data() async for record in result]

    async def get_entity_neighbors(
        self, entity_name: str, max_depth: int = 2
    ) -> list[dict]:
        # Variable-length bounds cannot be Cypher parameters, so the
        # depth is interpolated; it comes from trusted code, not users.
        cypher = """
        MATCH path = (e {name: $name})-[*1..""" + str(max_depth) + """]->(related)
        RETURN e.name AS source,
               [r IN relationships(path) | type(r)] AS relations,
               related.name AS target,
               labels(related) AS target_labels
        LIMIT 50
        """
        return await self.query(cypher, {"name": entity_name})

    async def find_path(
        self, source: str, target: str, max_hops: int = 5
    ) -> list[dict]:
        cypher = """
        MATCH path = shortestPath(
            (a {name: $source})-[*1..""" + str(max_hops) + """]->(b {name: $target})
        )
        RETURN [n IN nodes(path) | n.name] AS node_names,
               [r IN relationships(path) | type(r)] AS relationship_types
        """
        return await self.query(cypher, {"source": source, "target": target})

    async def close(self):
        await self.driver.close()
Entity Extraction: Populating the Graph
A knowledge graph is only as useful as the data it contains. For agent systems, you typically populate the graph from unstructured documents using LLM-based entity and relationship extraction.
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class ExtractedTriple(BaseModel):
    subject: str = Field(description="The source entity")
    subject_type: str = Field(description="Entity type (Person, Company, Product, etc.)")
    predicate: str = Field(description="The relationship type")
    object: str = Field(description="The target entity")
    object_type: str = Field(description="Entity type of the target")
    confidence: float = Field(description="Confidence score 0-1")

class ExtractionResult(BaseModel):
    triples: list[ExtractedTriple]

EXTRACTION_PROMPT = """Extract all entity-relationship triples from the text.
Focus on: people, organizations, products, technologies, locations.
Relationship types: works_at, founded, manufactures, competes_with,
partners_with, acquired, invested_in, located_in, uses_technology.
Only extract relationships explicitly stated or strongly implied.
Assign confidence scores: 1.0 for explicit, 0.7 for strongly implied.
Text: {text}"""

async def extract_triples(text: str, llm: ChatOpenAI) -> list[ExtractedTriple]:
    extractor = llm.with_structured_output(ExtractionResult)
    result = await extractor.ainvoke(EXTRACTION_PROMPT.format(text=text))
    return [t for t in result.triples if t.confidence >= 0.7]

async def ingest_triples(
    graph: KnowledgeGraphClient, triples: list[ExtractedTriple]
):
    for triple in triples:
        # Labels and relationship types cannot be passed as Cypher
        # parameters; validate them against your ontology before
        # interpolating to avoid query injection.
        cypher = """
        MERGE (s {name: $subject})
        ON CREATE SET s:""" + triple.subject_type + """
        MERGE (o {name: $object})
        ON CREATE SET o:""" + triple.object_type + """
        MERGE (s)-[r:""" + triple.predicate.upper() + """]->(o)
        SET r.confidence = $confidence
        """
        await graph.query(cypher, {
            "subject": triple.subject,
            "object": triple.object,
            "confidence": triple.confidence,
        })
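Extraction tends to produce the same entity under different surface forms ("Tesla" vs "Tesla, Inc."), which MERGE alone cannot reconcile. A minimal normalization pass before ingestion helps; the alias table below is illustrative, not a real resolution strategy.

```python
# Sketch: canonicalize entity names before ingestion so MERGE matches
# existing nodes. The alias table is illustrative only.
ALIASES = {
    "tesla, inc.": "Tesla",
    "tesla motors": "Tesla",
    "panasonic corporation": "Panasonic",
}

def normalize_name(name: str) -> str:
    """Collapse whitespace and map known aliases to a canonical name."""
    cleaned = " ".join(name.split())
    return ALIASES.get(cleaned.lower(), cleaned)
```

In production this step usually grows into full entity resolution (fuzzy matching, embedding similarity, or a lookup against existing graph nodes), but even a simple alias map catches the most common duplicates.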
Building the Knowledge Graph Agent
The agent needs tools that translate natural language into graph operations. The key tools are: entity lookup, neighbor exploration, path finding, and pattern matching.
from langchain.tools import tool

graph = KnowledgeGraphClient(
    "bolt://localhost:7687", "neo4j", "password"
)

@tool
async def lookup_entity(name: str) -> str:
    """Find an entity in the knowledge graph and return its properties
    and immediate connections."""
    neighbors = await graph.get_entity_neighbors(name, max_depth=1)
    if not neighbors:
        return f"No entity found for '{name}'"
    lines = [f"Entity: {name}"]
    for n in neighbors:
        lines.append(
            f"  --[{', '.join(n['relations'])}]--> {n['target']} ({', '.join(n['target_labels'])})"
        )
    return "\n".join(lines)

@tool
async def find_connection(source: str, target: str) -> str:
    """Find the shortest path between two entities in the
    knowledge graph."""
    paths = await graph.find_path(source, target)
    if not paths:
        return f"No connection found between '{source}' and '{target}'"
    path = paths[0]
    chain = []
    for i, node in enumerate(path["node_names"]):
        chain.append(node)
        if i < len(path["relationship_types"]):
            chain.append(f"--[{path['relationship_types'][i]}]-->")
    return " ".join(chain)

@tool
async def run_graph_query(cypher_query: str) -> str:
    """Execute a Cypher query against the knowledge graph.
    Use this for complex graph patterns that the other tools
    cannot handle."""
    try:
        results = await graph.query(cypher_query)
        return str(results[:10])
    except Exception as e:
        return f"Query error: {str(e)}"
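Handing the model a raw Cypher tool also hands it the ability to write to the graph. A keyword-based guard, run before execution, rejects obvious write clauses; this is a sketch, not a full Cypher parser, and a production deployment should also use a read-only database user.

```python
import re

# Clauses that mutate the graph. A denylist sketch, not a parser.
WRITE_CLAUSES = ("create", "merge", "delete", "detach", "set", "remove", "drop")

def is_read_only(cypher: str) -> bool:
    """Return True if the query contains no obvious write clause."""
    tokens = re.findall(r"[a-zA-Z_]+", cypher.lower())
    return not any(tok in WRITE_CLAUSES for tok in tokens)
```

Inside run_graph_query, check `is_read_only(cypher_query)` first and return an error string instead of executing when it fails.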
KG_AGENT_PROMPT = """You are an AI agent with access to a knowledge graph.
Use graph tools to answer questions about entities, relationships, and
connections. When answering:
1. Start by looking up relevant entities
2. Explore their connections to gather context
3. Use path finding for relationship questions
4. Only use raw Cypher queries for complex patterns
Always ground your answers in the graph data you retrieve.
If the graph does not contain the answer, say so explicitly."""
Graph-Augmented Generation
The most powerful pattern combines knowledge graph retrieval with traditional RAG. The graph provides structured context (relationships, hierarchies, connections) while the vector store provides unstructured context (detailed descriptions, recent news, documentation). The agent weaves both into its response.
class GraphRAGAgent:
    def __init__(self, graph: KnowledgeGraphClient, vector_store, llm):
        self.graph = graph
        self.vector_store = vector_store
        self.llm = llm

    async def answer(self, question: str) -> str:
        # Step 1: Extract entities from the question
        entities = await self._extract_question_entities(question)

        # Step 2: Get graph context (structured)
        graph_context = []
        for entity in entities:
            neighbors = await self.graph.get_entity_neighbors(entity, max_depth=2)
            graph_context.extend(neighbors)

        # Step 3: Get vector context (unstructured)
        vector_results = self.vector_store.similarity_search(question, k=5)
        text_context = "\n".join(doc.page_content for doc in vector_results)

        # Step 4: Synthesize answer
        prompt = f"""Answer the question using both structured graph data
and unstructured text context.

Graph relationships:
{self._format_graph_context(graph_context)}

Text context:
{text_context}

Question: {question}"""
        response = await self.llm.ainvoke(prompt)
        return response.content

    async def _extract_question_entities(self, question: str) -> list[str]:
        response = await self.llm.ainvoke(
            f"Extract entity names from this question. "
            f"Return only a comma-separated list: {question}"
        )
        return [e.strip() for e in response.content.split(",")]

    def _format_graph_context(self, neighbors: list[dict]) -> str:
        lines = []
        for n in neighbors:
            lines.append(
                f"{n['source']} --[{', '.join(n['relations'])}]--> {n['target']}"
            )
        return "\n".join(lines)
Production Tips for Knowledge Graph Agents
Keep the graph schema tight. In production, an unconstrained graph quickly becomes a tangled mess where every entity connects to everything. Define a clear ontology with specific node labels and relationship types. Enforce it during ingestion by validating extracted triples against allowed types.
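Validation at ingestion time can be as simple as checking each extracted triple against allowed labels and relationship types. A minimal sketch, where the allowed sets are illustrative rather than a complete schema:

```python
# Sketch of ontology enforcement at ingestion time. The allowed sets
# below are illustrative, not a complete domain schema.
ALLOWED_LABELS = {"Person", "Company", "Product", "Technology", "Location"}
ALLOWED_RELATIONS = {
    "works_at", "founded", "manufactures", "competes_with",
    "partners_with", "acquired", "invested_in", "located_in",
    "uses_technology",
}

def is_valid_triple(subject_type: str, predicate: str, object_type: str) -> bool:
    """Reject triples whose entity or relationship types fall outside the ontology."""
    return (
        subject_type in ALLOWED_LABELS
        and predicate in ALLOWED_RELATIONS
        and object_type in ALLOWED_LABELS
    )
```

Filtering triples through a check like this before ingest_triples also closes the query-injection hole that comes from interpolating labels into Cypher.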
Version your graph. Use timestamped relationships or snapshot nodes so the agent can answer questions about how relationships changed over time. This is critical for compliance and audit-trail use cases.
Index strategically. Neo4j supports full-text indexes and composite indexes on node properties. Create indexes on every property you use in MATCH or WHERE clauses. Without indexes, graph queries degrade from milliseconds to seconds as the graph grows.
FAQ
How does a knowledge graph agent differ from standard RAG?
Standard RAG retrieves chunks of text based on semantic similarity — it finds passages that are about the same topic as the query. Knowledge graph agents traverse explicit relationships between entities — they can follow chains of connections, find shortest paths, and aggregate structured attributes. The key advantage is multi-hop reasoning: questions like "which suppliers are shared between our top 3 competitors" require traversing relationships that RAG simply cannot resolve from text chunks alone.
What size of knowledge graph is practical for an agent system?
Neo4j comfortably handles graphs with tens of millions of nodes and hundreds of millions of relationships on a single server. For agent use cases, graphs between 100K and 10M nodes are the sweet spot — large enough to contain meaningful knowledge, small enough for sub-second query times without extensive tuning. The critical factor is not node count but query complexity: deep traversals (more than 4 hops) can become expensive regardless of graph size, so design your schema to minimize required hops.
Should I build my own knowledge graph or use an existing one like Wikidata?
For domain-specific agents, build your own. Wikidata and DBpedia are valuable for general-knowledge enrichment (adding company details, geographic information, or public facts), but they lack the domain-specific relationships that make agents useful. The recommended approach is to build a domain graph from your own data and enrich it with select properties from public knowledge graphs where relevant.
How do I keep the knowledge graph up to date?
Implement a continuous ingestion pipeline that processes new documents through entity extraction and triple generation. Use MERGE operations in Neo4j (not CREATE) to avoid duplicates. Run a periodic reconciliation job that detects and resolves conflicting triples. For time-sensitive domains, add a timestamp to every relationship and filter queries to use only recent data by default.
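The reconciliation step can be sketched as: group observations by (subject, predicate), and for relationships expected to be single-valued, keep the most recently observed object. The integer timestamps and single-valued assumption below are illustrative.

```python
# Sketch of reconciliation for single-valued relationships (e.g. works_at).
# Observations are (subject, predicate, object, timestamp) tuples.
def reconcile(
    observations: list[tuple[str, str, str, int]],
) -> dict[tuple[str, str], str]:
    """For each (subject, predicate) pair, keep the most recent object."""
    latest: dict[tuple[str, str], tuple[int, str]] = {}
    for subj, pred, obj, ts in observations:
        key = (subj, pred)
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, obj)
    return {key: obj for key, (ts, obj) in latest.items()}
```

Multi-valued relationships (a company has many products) need different rules, so in practice the reconciliation policy is part of the ontology, not a single global function.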
Written by
CallSphere Team