Neo4j Graph Storage
The @graphrag-js/neo4j package provides production-ready graph storage using Neo4j with Graph Data Science (GDS) support for advanced graph algorithms.
Installation
pnpm add @graphrag-js/neo4j
Features
- ✅ Leiden Community Detection - Built-in GDS integration
- ✅ Cypher Query Language - Powerful graph queries
- ✅ ACID Transactions - Data consistency guarantees
- ✅ Horizontal Scaling - Cluster support (Enterprise)
- ✅ Advanced Algorithms - PageRank, shortest paths, centrality
- ✅ Label-based Isolation - Multi-tenant support
Prerequisites
Neo4j Database
You need Neo4j with the Graph Data Science (GDS) plugin:
Option 1: Docker (Recommended)
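# The Enterprise image only starts once the license agreement is accepted
docker run -d \
  --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  -e NEO4J_PLUGINS='["graph-data-science"]' \
  neo4j:5-enterprise
Option 2: Neo4j Desktop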
- Download Neo4j Desktop
- Create a new database
- Install the GDS plugin from the plugins tab
Option 3: Neo4j AuraDB
- Sign up at Neo4j Aura
- GDS is available on Professional and Enterprise tiers
Verify GDS Installation
CALL gds.version()
Should return the GDS version (e.g., 2.6.0).
Quick Start
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { neo4jGraph } from '@graphrag-js/neo4j';
import { openai } from '@ai-sdk/openai';
const graph = createGraph({
model: openai('gpt-4o-mini'),
embedding: openai.embedding('text-embedding-3-small'),
provider: microsoftGraph(), // Neo4j works best with community-based algorithms
storage: {
graph: neo4jGraph({
url: 'bolt://localhost:7687',
username: 'neo4j',
password: 'password',
database: 'neo4j', // optional
maxGraphClusterSize: 10,
graphClusterSeed: 42,
}),
}
});
await graph.insert('Your documents...');
const result = await graph.query('Your question?');
Configuration
neo4jGraph(config)
interface Neo4jGraphConfig {
url: string; // Neo4j connection URL
username: string; // Database username
password: string; // Database password
database?: string; // Database name (default: 'neo4j')
workingDir?: string; // Namespace prefix (default: 'default')
maxGraphClusterSize?: number; // Leiden max levels (default: 10)
graphClusterSeed?: number; // Clustering random seed (default: 42)
}
Connection URL Formats
// Local instance
url: 'bolt://localhost:7687'
// Aura cloud instance
url: 'neo4j+s://xxxxx.databases.neo4j.io'
// Remote server (7687 is the default Bolt port; change it if your server uses another)
url: 'bolt://my-server.com:7687'
Usage Examples
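Connection settings are usually kept out of source code. The following is a minimal sketch, not part of the package, of building the connection fields from environment variables and rejecting URLs with an unexpected scheme. The variable names (NEO4J_URL, NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_DATABASE) and the helper name configFromEnv are assumptions made for illustration; the interface is a local copy of the connection fields documented under Configuration.

```typescript
// Local copy of the connection fields from Neo4jGraphConfig, so the sketch is self-contained.
interface Neo4jConnectionConfig {
  url: string;
  username: string;
  password: string;
  database?: string;
}

// URI schemes accepted by the official Neo4j driver.
const SCHEMES = ['bolt', 'bolt+s', 'bolt+ssc', 'neo4j', 'neo4j+s', 'neo4j+ssc'];

type Env = Record<string, string | undefined>;

// Hypothetical helper: reads the assumed variable names and fails fast on bad input.
function configFromEnv(env: Env): Neo4jConnectionConfig {
  const url = env.NEO4J_URL ?? 'bolt://localhost:7687';
  const sep = url.indexOf('://');
  const scheme = sep === -1 ? '' : url.slice(0, sep);
  if (SCHEMES.indexOf(scheme) === -1) {
    throw new Error(`Unsupported Neo4j URL scheme: "${scheme}"`);
  }
  const password = env.NEO4J_PASSWORD;
  if (!password) {
    throw new Error('NEO4J_PASSWORD is not set');
  }
  return {
    url,
    username: env.NEO4J_USERNAME ?? 'neo4j',
    password,
    database: env.NEO4J_DATABASE, // undefined leaves the package default in place
  };
}
```

In a Node.js process the result could be passed straight to the storage factory, for example neo4jGraph(configFromEnv(process.env)).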
Basic Graph Operations
import { neo4jGraph } from '@graphrag-js/neo4j';
const graphStore = neo4jGraph({
url: 'bolt://localhost:7687',
username: 'neo4j',
password: 'password',
})('my-namespace');
// Add nodes
await graphStore.upsertNode('entity-1', {
entity_type: 'person',
description: 'John Doe, software engineer',
source_id: 'doc-1',
});
// Add edges
await graphStore.upsertEdge('entity-1', 'entity-2', {
relationship: 'works_with',
weight: 0.8,
description: 'Collaborates on projects',
});
// Run Leiden clustering
await graphStore.clustering('leiden');
// Get community structure
const communities = await graphStore.communitySchema();
With Microsoft GraphRAG
Neo4j is ideal for Microsoft GraphRAG's community detection:
import { createGraph } from '@graphrag-js/core';
import { microsoftGraph } from '@graphrag-js/microsoft';
import { neo4jGraph } from '@graphrag-js/neo4j';
import { openai } from '@ai-sdk/openai';
const graph = createGraph({
model: openai('gpt-4o-mini'),
embedding: openai.embedding('text-embedding-3-small'),
provider: microsoftGraph({
entityTypes: ['person', 'organization', 'location', 'event'],
graphClusterAlgorithm: 'leiden',
maxGraphClusterSize: 10,
}),
storage: {
graph: neo4jGraph({
url: 'bolt://localhost:7687',
username: 'neo4j',
password: 'password',
maxGraphClusterSize: 10,
}),
}
});
Multi-Database Setup
// Use different databases for different namespaces
const tenant1 = neo4jGraph({
url: 'bolt://localhost:7687',
username: 'neo4j',
password: 'password',
database: 'tenant1',
})('namespace-1');
const tenant2 = neo4jGraph({
url: 'bolt://localhost:7687',
username: 'neo4j',
password: 'password',
database: 'tenant2',
})('namespace-2');
Community Detection
Leiden Algorithm
Neo4j GDS provides hierarchical Leiden clustering:
await graphStore.clustering('leiden');
const communities = await graphStore.communitySchema();
// Structure:
{
"community-123": {
level: 0, // Hierarchy level
title: "Cluster 123",
nodes: ["entity-1", "entity-2", ...],
edges: [["entity-1", "entity-2"], ...],
chunk_ids: ["doc-1", "doc-2", ...],
occurrence: 0.85, // Importance score
sub_communities: ["community-456", ...],
}
}
GDS Configuration
The clustering uses these GDS parameters:
CALL gds.leiden.write(
'graph_namespace',
{
writeProperty: 'communityIds',
includeIntermediateCommunities: true,
relationshipWeightProperty: 'weight',
maxLevels: 10, // maxGraphClusterSize
tolerance: 0.0001,
gamma: 1.0,
theta: 0.01,
randomSeed: 42 // graphClusterSeed
}
)
Advanced Features
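The community schema returned by communitySchema() is hierarchical: each entry lists its own nodes and the IDs of its sub_communities. The following is a minimal sketch of collecting every entity ID under one community, assuming the schema has exactly the shape shown under Leiden Algorithm. The type names and the collectNodes helper are illustrative and not part of the package.

```typescript
// Shape of one entry, copied from the structure shown under Leiden Algorithm.
interface Community {
  level: number;
  title: string;
  nodes: string[];
  edges: [string, string][];
  chunk_ids: string[];
  occurrence: number;
  sub_communities: string[];
}

type CommunitySchema = Record<string, Community>;

// Hypothetical helper: gathers the node IDs of a community and all of its descendants.
function collectNodes(schema: CommunitySchema, rootId: string): string[] {
  const visited = new Set<string>(); // community IDs already expanded (guards against cycles)
  const nodes = new Set<string>();
  const stack: string[] = [rootId];
  while (stack.length > 0) {
    const id = stack.pop() as string;
    if (visited.has(id)) continue;
    visited.add(id);
    const community = schema[id];
    if (!community) continue; // dangling sub-community reference
    for (const node of community.nodes) nodes.add(node);
    for (const sub of community.sub_communities) stack.push(sub);
  }
  return Array.from(nodes).sort();
}
```

An explicit stack is used instead of recursion so that deep hierarchies cannot exhaust the call stack.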
Knowledge Graph Extraction
const kg = await graphStore.getKnowledgeGraph(
'person', // Node label to start from
2, // Max depth
1, // Min degree (connectivity)
true // Include nodes at exactly minDegree
);
// Returns:
{
nodes: [
{ id: "entity-1", labels: ["person"], properties: {...} },
...
],
edges: [
{ id: "edge-1", type: "RELATED", source: "entity-1", target: "entity-2", properties: {...} },
...
]
}
Direct Cypher Queries
For advanced use cases, access the driver directly:
import neo4j from 'neo4j-driver';
const driver = neo4j.driver(
'bolt://localhost:7687',
neo4j.auth.basic('neo4j', 'password')
);
const session = driver.session();
const result = await session.run(`
MATCH (n:person)-[r:works_with]->(m:person)
WHERE n.id = $nodeId
RETURN m.id AS colleague
`, { nodeId: 'entity-1' });
await session.close();
await driver.close();
Performance Optimization
Indexes
Neo4j automatically creates indexes on node IDs. For better performance:
// Index on entity descriptions (full-text search)
CREATE FULLTEXT INDEX entity_descriptions
FOR (n:person|organization|location)
ON EACH [n.description];
// Composite index for frequent queries
CREATE INDEX entity_type_source
FOR (n:Node)
ON (n.entity_type, n.source_id);
Query Optimization
// Use EXPLAIN to inspect the query plan without running the query
EXPLAIN MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;
// Use PROFILE to run the query and collect execution stats
PROFILE MATCH (n:person)-[:works_with]->(m)
WHERE n.id = 'entity-1'
RETURN m;
Batch Operations
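For larger loads, a common Cypher pattern is to send many rows per statement with UNWIND instead of issuing one statement per node. The following is a minimal sketch of the chunking step, assuming the Node label and id key used elsewhere on this page. The names chunk, NodeRow, and UPSERT_NODES, and the batch size in the usage comment, are illustrative and not part of the package.

```typescript
// Shape of each entry passed in the $rows parameter.
interface NodeRow {
  id: string;
  properties: Record<string, unknown>;
}

// One statement upserts a whole batch of rows.
const UPSERT_NODES = `
UNWIND $rows AS row
MERGE (n:Node {id: row.id})
SET n += row.properties
`;

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// With an open neo4j-driver transaction `tx` and `nodes: NodeRow[]`,
// each chunk becomes a single round trip:
//   for (const rows of chunk(nodes, 1000)) await tx.run(UPSERT_NODES, { rows });
```

Chunking keeps each statement's parameter payload bounded, which matters when the input has hundreds of thousands of rows.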
For bulk inserts, use transactions:
const session = driver.session();
const tx = session.beginTransaction();
try {
for (const node of nodes) {
await tx.run(
'MERGE (n:Node {id: $id}) SET n += $props',
{ id: node.id, props: node.properties }
);
}
await tx.commit();
} catch (error) {
await tx.rollback();
throw error;
} finally {
await session.close();
}
Monitoring & Debugging
Neo4j Browser
Access at http://localhost:7474
Useful queries:
// Count nodes by label
MATCH (n)
RETURN labels(n) AS label, count(*) AS count
ORDER BY count DESC;
// View graph structure
MATCH (n)-[r]->(m)
RETURN n, r, m
LIMIT 100;
// Check community distribution
MATCH (n)
WHERE n.communityIds IS NOT NULL
RETURN n.communityIds[0] AS community, count(*) AS size
ORDER BY size DESC;
GDS Monitoring
// List projected graphs
CALL gds.graph.list();
// View graph stats
CALL gds.graph.list('graph_namespace')
YIELD nodeCount, relationshipCount
RETURN nodeCount, relationshipCount;
// Drop stale graphs
CALL gds.graph.drop('graph_namespace');
Production Deployment
Docker Compose
version: '3.8'
services:
  neo4j:
    image: neo4j:5-enterprise
    ports:
      - "7474:7474" # HTTP
      - "7687:7687" # Bolt
    environment:
      NEO4J_AUTH: neo4j/your-secure-password
      NEO4J_ACCEPT_LICENSE_AGREEMENT: "yes" # required by the Enterprise image
      NEO4J_PLUGINS: '["graph-data-science"]'
      NEO4J_server_memory_heap_initial__size: 2G
      NEO4J_server_memory_heap_max__size: 4G
      NEO4J_server_memory_pagecache_size: 2G
    volumes:
      - neo4j-data:/data
      - neo4j-logs:/logs
volumes:
  neo4j-data:
  neo4j-logs:
Memory Configuration
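The Neo4j Docker image maps environment variables to neo4j.conf settings by a fixed rule: prefix the setting with NEO4J_, write each underscore in the setting name as two underscores, and write each dot as one underscore. The following small sketch applies that rule and can help check a variable name before deploying. The function name is illustrative, and the setting names in the comments are the Neo4j 5 ones.

```typescript
// Convert a neo4j.conf setting name to the environment variable the Docker image expects.
function toDockerEnvName(setting: string): string {
  // Underscores are doubled first so the single underscores that replace dots stay distinguishable.
  return 'NEO4J_' + setting.replace(/_/g, '__').replace(/\./g, '_');
}

// toDockerEnvName('server.memory.heap.initial_size') -> 'NEO4J_server_memory_heap_initial__size'
// toDockerEnvName('server.memory.pagecache.size')    -> 'NEO4J_server_memory_pagecache_size'
```

A single underscore where a double one is needed produces a different setting name, so the intended value never takes effect.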
For production workloads:
# Heap size (for query execution)
NEO4J_server_memory_heap_initial__size=4G
NEO4J_server_memory_heap_max__size=8G
# Page cache (for graph data)
NEO4J_server_memory_pagecache_size=4G
Backup Strategy
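# Back up the database (it must be stopped first); writes /backups/neo4j.dump
neo4j-admin database dump neo4j --to-path=/backups
# Restore the database; --from-path is the directory that contains neo4j.dump
neo4j-admin database load neo4j --from-path=/backups
Troubleshooting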
GDS Plugin Not Found
Error: There is no procedure with the name gds.leiden.write
Solution:
- Verify GDS is installed: CALL gds.version()
- Restart Neo4j after installing plugins
- Check Neo4j logs for plugin loading errors
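Application code can turn this failure into a clearer message. The following is a minimal sketch, assuming the thrown error carries the code and message fields that neo4j-driver sets on its Neo4jError objects. The helper name explainGdsError and its wording are illustrative and not part of the package.

```typescript
// Error code Neo4j reports when a called procedure is not registered.
const PROCEDURE_NOT_FOUND = 'Neo.ClientError.Procedure.ProcedureNotFound';

// Hypothetical helper: returns a hint for missing GDS procedures, or null for any other error.
function explainGdsError(error: unknown): string | null {
  if (typeof error !== 'object' || error === null) return null;
  const { code, message } = error as { code?: unknown; message?: unknown };
  const mentionsGds = typeof message === 'string' && message.indexOf('gds.') !== -1;
  if (code === PROCEDURE_NOT_FOUND && mentionsGds) {
    return 'GDS procedures are not available: install the plugin, restart Neo4j, and check the logs.';
  }
  return null;
}

// Possible use around a clustering call:
//   try { await graphStore.clustering('leiden'); }
//   catch (error) { throw new Error(explainGdsError(error) ?? String(error)); }
```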
Connection Refused
Error: ServiceUnavailable: Connection refused
Solution:
- Verify Neo4j is running: docker ps
- Check port mapping: 7687 for Bolt
- Test with Neo4j Browser at http://localhost:7474
Out of Memory
Error: OutOfMemoryError: Java heap space
Solution:
- Increase heap size in neo4j.conf or Docker env
- Use pagination for large result sets
- Add LIMIT clauses to queries
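Pagination can be sketched as an async generator that requests one page at a time, so only a single page of rows is held in memory. This is a minimal sketch under stated assumptions: the fetchPage callback is supplied by the caller, and the names paginate and FetchPage and the default page size are illustrative, not part of the package.

```typescript
// Caller-supplied function that returns at most `limit` rows starting at offset `skip`.
type FetchPage<T> = (skip: number, limit: number) => Promise<T[]>;

// Yields pages until a page comes back shorter than the requested size.
async function* paginate<T>(fetchPage: FetchPage<T>, pageSize = 500): AsyncGenerator<T[]> {
  for (let skip = 0; ; skip += pageSize) {
    const rows = await fetchPage(skip, pageSize);
    if (rows.length > 0) yield rows;
    if (rows.length < pageSize) return; // short page: nothing left to read
  }
}

// With neo4j-driver, fetchPage could run a query ordered by a stable key, for example
//   MATCH (n:Node) RETURN n.id AS id ORDER BY id SKIP $skip LIMIT $limit
// passing skip and limit wrapped in neo4j.int(...), because Cypher expects integers there
// and plain JavaScript numbers are sent as floats.
```

The ORDER BY clause matters: without a stable ordering, rows can repeat or go missing between pages.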
Cost Considerations
| Deployment | Cost | Best For |
|---|---|---|
| Self-hosted (Community) | Free + hosting | Development, small production |
| Self-hosted (Enterprise) | License fee | Large-scale production |
| Neo4j Aura Free | Free (limited) | Development, testing |
| Neo4j Aura Professional | ~$65/month+ | Production |
| Neo4j Aura Enterprise | Custom | Enterprise scale |
Next Steps
- Qdrant Storage - For vector search
- Redis Storage - For KV storage
- Neo4j GDS Documentation
- Microsoft GraphRAG Algorithm