LlamaIndex property graph RAG lets you extract structured entity-relationship data from documents and query it with graph traversal — not just cosine similarity. The result is more precise answers on the multi-hop questions where vector search consistently fails.
This tutorial builds a full Knowledge Graph RAG pipeline: extract a property graph from raw text, store it in Neo4j, and query it with LlamaIndex's graph retrievers. Tested on Python 3.12, LlamaIndex 0.10.x, and Neo4j 5.x.
You'll learn:
- How `PropertyGraphIndex` extracts entities and relations from documents
- How to connect LlamaIndex to a Neo4j backend for persistent graph storage
- How to switch between keyword, vector, and Cypher graph retrievers
- When graph RAG outperforms — and when it doesn't
Time: 25 min | Difficulty: Intermediate
Why Vector RAG Misses Multi-Hop Questions
Standard RAG embeds document chunks and retrieves the top-k closest chunks at query time. That works well when the answer lives in a single passage. It breaks down when the answer requires connecting facts across multiple documents or entities.
Example failure case:
"Which research papers co-authored by the same lab as Alice Wang also cite the BERT architecture?"
A vector search returns chunks mentioning Alice Wang or BERT — not the chain of relationships that links them. A property graph stores (Alice Wang)-[:AFFILIATED_WITH]->(MIT CSAIL)-[:PUBLISHED]->(Paper X)-[:CITES]->(BERT) and can traverse it exactly.
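The multi-hop traversal can be sketched in plain Python over a toy triple list. The data and the `neighbors` helper below are illustrative, not LlamaIndex code; they mirror the example chain above:

```python
# Toy triple store mirroring the example path above (illustrative data only)
triples = [
    ("Alice Wang", "AFFILIATED_WITH", "MIT CSAIL"),
    ("MIT CSAIL", "PUBLISHED", "Paper X"),
    ("MIT CSAIL", "PUBLISHED", "Paper Y"),
    ("Paper X", "CITES", "BERT"),
    ("Paper Y", "CITES", "ResNet"),
]

def neighbors(subject: str, relation: str) -> list[str]:
    """All objects reachable from `subject` via `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]

# Chain the hops explicitly: person -> lab -> papers -> filter by citation
lab = neighbors("Alice Wang", "AFFILIATED_WITH")[0]
papers_citing_bert = [
    p for p in neighbors(lab, "PUBLISHED") if "BERT" in neighbors(p, "CITES")
]
print(papers_citing_bert)  # ['Paper X']
```

Similarity scoring has no equivalent of the `lab` hop: unless one chunk happens to state the whole chain, it cannot be reconstructed. The graph makes each hop explicit and cheap to follow.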
Symptoms that signal you need graph RAG:
- Answers require connecting 3+ entities across documents
- Users ask "who else worked on X" or "what else is related to Y"
- Your vector RAG hallucinates relationship details it can't ground in a single chunk
Architecture: LlamaIndex Property Graph RAG
End-to-end flow: documents → LLM entity extractor → property graph → Neo4j → graph retriever → LLM answer
The pipeline has four stages:
- Ingestion — Load raw documents with `SimpleDirectoryReader`
- Extraction — `SchemaLLMPathExtractor` calls an LLM to pull `(entity, relation, entity)` triples
- Storage — `Neo4jPropertyGraphStore` persists nodes and edges; embeddings are stored alongside for hybrid retrieval
- Query — `PropertyGraphIndex.as_retriever()` traverses the graph, optionally mixing vector similarity and Cypher queries
Prerequisites
You need:
- Python 3.12
- A running Neo4j 5.x instance (Docker instructions below)
- An OpenAI API key (or swap in any LlamaIndex-compatible LLM)
- `uv` for dependency management (faster than pip)
Start Neo4j with Docker
# WHY --env NEO4J_AUTH: disables auth for local dev; set a password in production
docker run \
--name neo4j-rag \
-p 7474:7474 -p 7687:7687 \
--env NEO4J_AUTH=none \
--env NEO4J_PLUGINS='["apoc"]' \
neo4j:5.18
Open http://localhost:7474 to confirm the Neo4j Browser loads.
Install Dependencies
uv init llamaindex-kg-rag
cd llamaindex-kg-rag
# WHY llama-index-graph-stores-neo4j: property graph Neo4j backend, not included in core
uv add llama-index-core \
llama-index-llms-openai \
llama-index-embeddings-openai \
llama-index-graph-stores-neo4j
Step 1: Load Documents
# ingest.py
from llama_index.core import SimpleDirectoryReader
# WHY show_progress=True: shows a per-file progress bar; parsing PDFs can be slow
documents = SimpleDirectoryReader(
input_dir="./data",
required_exts=[".txt", ".pdf", ".md"],
).load_data(show_progress=True)
print(f"Loaded {len(documents)} documents")
Drop any .txt, .pdf, or .md files into ./data/. For this walkthrough, use 10–20 Wikipedia article exports or research paper abstracts — small enough to extract in under 5 minutes with GPT-4o-mini.
Expected output: Loaded 14 documents
Step 2: Configure the LLM and Embedding Model
# settings.py
import os
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
os.environ["OPENAI_API_KEY"] = "sk-..." # or load from .env
# WHY gpt-4o-mini for extraction: cheaper than gpt-4o; accurate enough for triple extraction
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0)
# WHY text-embedding-3-small: 1536-dim vectors at $0.02/1M tokens vs $0.13 for large
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
If you want a fully local setup, swap in llama-index-llms-ollama with llama3.3:70b and llama-index-embeddings-ollama with nomic-embed-text. Extraction will be slower but costs $0.
Step 3: Define the Graph Schema
# schema.py
from typing import Literal

from llama_index.core.indices.property_graph import SchemaLLMPathExtractor

from settings import Settings  # configures Settings.llm before it's referenced below
# WHY explicit entity types: constrains the LLM; prevents "Organization" and "Org" as duplicates
EntityType = Literal[
"PERSON", "ORGANIZATION", "LOCATION",
"RESEARCH_PAPER", "CONCEPT", "TECHNOLOGY"
]
# WHY explicit relation types: forces consistent edge labels in the graph
RelationType = Literal[
"AUTHORED_BY", "AFFILIATED_WITH", "CITES",
"DEVELOPED_BY", "LOCATED_IN", "RELATED_TO"
]
kg_extractor = SchemaLLMPathExtractor(
llm=Settings.llm,
possible_entities=EntityType,
possible_relations=RelationType,
# WHY num_workers=4: parallel chunk extraction; cap at 4 to avoid rate limits
num_workers=4,
max_triplets_per_chunk=10,
)
Defining possible_entities and possible_relations is the most impactful tuning step. Vague schemas produce noisy graphs. Tight schemas produce clean, traversable ones.
Step 4: Connect Neo4j Storage
# storage.py
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
graph_store = Neo4jPropertyGraphStore(
username="neo4j",
password="", # empty string = auth disabled
url="bolt://localhost:7687",
database="neo4j",
# WHY refresh_schema=True: re-reads Neo4j schema on init; needed after schema changes
refresh_schema=True,
)
If connection fails:
- `ServiceUnavailable: Failed to establish connection` → Neo4j container isn't running; check `docker ps`
- `AuthError` → You set `NEO4J_AUTH=none` but passed a password; set `password=""`
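Before debugging at the driver level, a plain TCP probe confirms the Bolt port is even reachable. The `port_open` helper below is a local utility for this tutorial, stdlib only, not part of LlamaIndex or the Neo4j driver:

```python
# Plain TCP probe of the Bolt port; stdlib only, no Neo4j driver required.
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not port_open("localhost", 7687):
    print("Bolt port unreachable; is the neo4j-rag container running?")
```

If the probe fails, fix Docker before touching the Python code; every error past this point assumes the port answers.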
Step 5: Build the PropertyGraphIndex
# build_index.py
from llama_index.core import PropertyGraphIndex
from ingest import documents
from schema import kg_extractor
from storage import graph_store
from settings import Settings # noqa: F401 — applies global settings
index = PropertyGraphIndex.from_documents(
documents,
kg_extractors=[kg_extractor],
property_graph_store=graph_store,
# WHY show_progress=True: extraction takes 1–3 sec per chunk
show_progress=True,
)
# Persist index metadata (node IDs, chunk map) separately from Neo4j
index.storage_context.persist(persist_dir="./storage")
print("Index built and persisted.")
Expected output:
Extracting paths from text: 100%|████████| 14/14 [02:41<00:00, 11.5s/it]
Index built and persisted.
Open Neo4j Browser at http://localhost:7474 and run MATCH (n) RETURN n LIMIT 50 to see your entity nodes and relationship edges rendered as a graph.
Step 6: Query the Property Graph
LlamaIndex ships three retriever strategies for property graphs. Each fits a different query type.
6a: Keyword-Based Graph Retrieval (Fast, No LLM Cost)
# query_keyword.py
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
from llama_index.core.indices.property_graph import LLMSynonymRetriever

from settings import Settings  # applies global LLM/embed settings; used below

graph_store = Neo4jPropertyGraphStore(url="bolt://localhost:7687", username="neo4j", password="")
storage_context = StorageContext.from_defaults(persist_dir="./storage", property_graph_store=graph_store)
index = load_index_from_storage(storage_context)
# WHY LLMSynonymRetriever: expands query terms to synonyms, then matches entity names in graph
retriever = index.as_retriever(
sub_retrievers=[
LLMSynonymRetriever(
index.property_graph_store,
llm=Settings.llm,
# WHY include_text=True: returns source chunk alongside graph triples
include_text=True,
)
]
)
nodes = retriever.retrieve("What organizations did Alice Wang collaborate with?")
for node in nodes:
print(node.text)
6b: Vector Graph Retrieval (Semantic Match)
from llama_index.core.indices.property_graph import VectorContextRetriever
# WHY VectorContextRetriever: embeds query, finds nearest entity nodes by vector similarity
retriever = index.as_retriever(
sub_retrievers=[
VectorContextRetriever(
index.property_graph_store,
embed_model=Settings.embed_model,
# WHY include_text=True: augments graph triples with surrounding chunk text
include_text=True,
)
]
)
6c: Cypher Template Retrieval (Multi-Hop, Most Powerful)
# query_cypher.py
from pydantic import BaseModel, Field
from llama_index.core.indices.property_graph import CypherTemplateRetriever

# WHY CypherTemplateRetriever: runs a hand-written Cypher template; the LLM only
# fills in the template parameters, so multi-hop traversal stays precise and safe
# NOTE: labels and relation types below assume the Step 3 schema
cypher_query = """
MATCH (p:PERSON)<-[:AUTHORED_BY]-(paper)-[:CITES]->(tech:TECHNOLOGY)
WHERE p.name IN $names
RETURN paper.name AS paper, tech.name AS technology
"""

class TemplateParams(BaseModel):
    """Parameters the LLM extracts from the user's question."""
    names: list[str] = Field(description="Person names mentioned in the query")

cypher_retriever = CypherTemplateRetriever(
    index.property_graph_store,
    TemplateParams,
    cypher_query,
)
nodes = cypher_retriever.retrieve(
    "Which technologies are cited by papers authored by Alice Wang?"
)
for node in nodes:
    print(node.text)
6d: Full Query Engine with Answer Synthesis
# query_engine.py
query_engine = index.as_query_engine(include_text=True)
response = query_engine.query(
"What research organizations are connected to transformer architecture development?"
)
print(response)
Expected output:
Based on the knowledge graph, transformer architecture development is connected to
Google Brain (via the 'Attention Is All You Need' paper), University of Toronto
(Hinton affiliation), and OpenAI (GPT series). Google Brain and OpenAI share...
Step 7: Load Existing Index Without Re-Extracting
Re-extraction costs money and time. Always reload from storage after the first build.
# reload.py
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.graph_stores.neo4j import Neo4jPropertyGraphStore
graph_store = Neo4jPropertyGraphStore(
url="bolt://localhost:7687",
username="neo4j",
password="",
)
storage_context = StorageContext.from_defaults(
persist_dir="./storage",
property_graph_store=graph_store,
)
# WHY load_index_from_storage: reads metadata from disk, graph data from Neo4j — no re-extraction
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine(include_text=True)
Retriever Comparison
| Retriever | Speed | LLM Calls | Best For |
|---|---|---|---|
| `LLMSynonymRetriever` | Fast | 1 (synonyms only) | Named entity lookups |
| `VectorContextRetriever` | Medium | 0 (embedding only) | Semantic/conceptual queries |
| `CypherTemplateRetriever` | Medium | 1 (Cypher params) | Multi-hop relational queries |
| Combined (all three) | Slower | 2–3 | Production; maximum recall |
For production, combine all three as sub-retrievers. LlamaIndex deduplicates overlapping node results automatically.
# WHY combining all three: captures entity name matches, semantic neighbors, and graph paths
from pydantic import BaseModel, Field

class NameParams(BaseModel):
    names: list[str] = Field(description="Entity names extracted from the query")

# CypherTemplateRetriever needs a hand-written template plus a params class;
# this generic one-hop neighborhood query is an illustrative placeholder
CYPHER = "MATCH (e)-[r]-(n) WHERE e.name IN $names RETURN e.name, type(r), n.name"

retriever = index.as_retriever(
    sub_retrievers=[
        LLMSynonymRetriever(index.property_graph_store, llm=Settings.llm, include_text=True),
        VectorContextRetriever(index.property_graph_store, embed_model=Settings.embed_model, include_text=True),
        CypherTemplateRetriever(index.property_graph_store, NameParams, CYPHER),
    ]
)
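The deduplication behaves like a first-wins merge keyed on node ID. A toy sketch of the idea (illustrative only, not LlamaIndex's internal code):

```python
# First-wins merge of (node_id, text) results concatenated from sub-retrievers;
# a toy illustration of deduplication, not LlamaIndex's actual implementation.
def dedupe(results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep the first occurrence of each node_id, preserving retriever order."""
    seen: set[str] = set()
    merged: list[tuple[str, str]] = []
    for node_id, text in results:
        if node_id not in seen:
            seen.add(node_id)
            merged.append((node_id, text))
    return merged

merged = dedupe([("n1", "Alice Wang"), ("n2", "MIT CSAIL"), ("n1", "Alice Wang")])
print(merged)  # [('n1', 'Alice Wang'), ('n2', 'MIT CSAIL')]
```

Because order is preserved, the sub-retriever listed first effectively wins ties, which is worth remembering when you care about which source text accompanies a node.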
Verification
Run this Cypher in Neo4j Browser to confirm your graph has populated correctly:
// Count nodes by label
MATCH (n) RETURN labels(n) AS type, count(n) AS count ORDER BY count DESC;

// Count relationships by type
MATCH ()-[r]->() RETURN type(r) AS type, count(r) AS count ORDER BY count DESC;
You should see: Multiple node types (PERSON, ORGANIZATION, TECHNOLOGY, etc.) with relationship counts above 0. If all counts are 0, extraction failed silently — check that kg_extractors was passed to PropertyGraphIndex.from_documents.
Query the engine:
response = query_engine.query("List all technologies mentioned alongside neural networks.")
assert len(str(response)) > 50, "Empty response — check Neo4j connection and index reload"
print("✅ Graph RAG pipeline working")
What You Learned
- `PropertyGraphIndex` extracts `(entity, relation, entity)` triples using an LLM and stores them in Neo4j as a traversable property graph
- Defining tight `possible_entities` and `possible_relations` schemas is the highest-leverage tuning step — loose schemas produce duplicate, inconsistent nodes
- `LLMSynonymRetriever`, `VectorContextRetriever`, and `CypherTemplateRetriever` each cover different query shapes; combine all three for production workloads
- Graph RAG excels at multi-hop relational questions; it adds overhead for simple factual lookups that vector RAG handles well
- Always persist with `storage_context.persist()` after the first build — re-extraction on 100 documents costs ~$0.30 and 3 minutes each run
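Re-extraction cost scales linearly with chunk count, so it's worth estimating before a rebuild. A back-of-envelope helper; token counts per chunk and the per-token prices are rough assumptions (gpt-4o-mini list prices at time of writing), so check current pricing before relying on it:

```python
# Rough extraction-cost estimator. All numbers are assumptions: token counts
# per chunk vary with your data; prices are gpt-4o-mini list prices at time
# of writing and may change.
def extraction_cost(num_chunks: int,
                    tokens_in_per_chunk: int = 1200,
                    tokens_out_per_chunk: int = 300,
                    usd_per_input_token: float = 0.15e-6,
                    usd_per_output_token: float = 0.60e-6) -> float:
    """Estimated USD cost of one full extraction pass over num_chunks chunks."""
    per_chunk = (tokens_in_per_chunk * usd_per_input_token
                 + tokens_out_per_chunk * usd_per_output_token)
    return num_chunks * per_chunk

print(f"${extraction_cost(100):.2f}")  # prints "$0.04" with these defaults
```

Note this counts chunks, not documents; a 100-document corpus commonly splits into several hundred chunks, which is roughly consistent with the ~$0.30 figure above.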
When NOT to use property graph RAG:
- Documents are unstructured prose with no clear entities (product reviews, logs)
- All queries are single-document lookups — vector RAG is cheaper and faster
- Your team can't maintain a Neo4j instance — use `SimplePropertyGraphStore` (in-memory) for prototyping
Tested on LlamaIndex 0.10.68, Neo4j 5.18, Python 3.12, gpt-4o-mini, text-embedding-3-small · macOS Sequoia & Ubuntu 24.04
FAQ
Q: Can I use a local LLM instead of OpenAI for entity extraction?
A: Yes. Set Settings.llm = Ollama(model="llama3.3:70b") and Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text"). Extraction is slower (~30s/chunk on an RTX 3090) but costs nothing after hardware.
Q: What is SimplePropertyGraphStore and when should I use it?
A: It's LlamaIndex's in-memory graph store — no external DB required. Use it for prototyping and testing. It does not persist across runs; switch to Neo4jPropertyGraphStore for production.
Q: How many triplets per chunk is optimal?
A: max_triplets_per_chunk=10 is the right starting point. Set it lower (5–7) for dense technical text where precision matters more than recall. Going above 15 introduces noise.
Q: Does CypherTemplateRetriever work with graph stores other than Neo4j?
A: Cypher is Neo4j's query language. For other backends like Nebula Graph or Amazon Neptune, you need the matching retriever adapter. LLMSynonymRetriever and VectorContextRetriever work with any PropertyGraphStore implementation.
Q: How do I add new documents to an existing graph without rebuilding from scratch?
A: Call index.insert(new_document) for individual documents or index.refresh_ref_docs(updated_docs) for a batch update. LlamaIndex only processes new or changed chunks, leaving existing graph nodes intact.