Core Concepts
Understanding the key concepts and architecture of GraphRAG.js.
Architecture Overview
GraphRAG.js is built on four core abstractions:
```
┌─────────────────────────────────────────────────┐
│              Graph (User API)                   │
│   insert() | query() | entities | relations     │
└─────────────────────────────────────────────────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
         ▼             ▼             ▼
   ┌─────────┐   ┌──────────┐   ┌─────────┐
   │ Provider│   │ Storage  │   │ AI SDK  │
   └─────────┘   └──────────┘   └─────────┘
```

1. Graph Class
The Graph class is your main interface. It provides methods for:
- Inserting documents to build the knowledge graph
- Querying with natural language questions
- Accessing entities and relationships
- Exporting the graph in various formats
2. Provider
The provider implements the graph RAG algorithm. It determines:
- How entities and relationships are extracted
- How the graph is structured
- How context is retrieved during queries
- Which query modes are available
Providers implement the GraphProvider interface and can be swapped without changing your code.
3. Storage
Storage backends persist your data. GraphRAG.js uses three storage types:
- GraphStore: Stores entities, relationships, and graph structure
- VectorStore: Stores embeddings for similarity search
- KVStore: Stores key-value metadata
All three are pluggable and can use different backends.
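To illustrate how small these storage contracts are, here is a hedged sketch of a Map-backed key-value store in the spirit of the KVStore role (the class name `MemoryKV` and the prefix-based key matching are illustrative assumptions, not part of GraphRAG.js):

```typescript
// Illustrative in-memory key-value store; real backends (Redis, PostgreSQL, etc.)
// expose the same get/set/delete/keys surface behind the KVStore interface.
class MemoryKV {
  private data = new Map<string, unknown>();

  async get(key: string): Promise<unknown> {
    return this.data.get(key);
  }

  async set(key: string, value: unknown): Promise<void> {
    this.data.set(key, value);
  }

  async delete(key: string): Promise<void> {
    this.data.delete(key);
  }

  // Simple prefix match stands in for real pattern support.
  async keys(prefix = ""): Promise<string[]> {
    return [...this.data.keys()].filter((k) => k.startsWith(prefix));
  }
}
```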
4. AI SDK Integration
GraphRAG.js uses the Vercel AI SDK for:
- LLM calls (answer generation, entity extraction)
- Embeddings (vector representations)
- Streaming (real-time response generation)
This gives you access to any LLM provider supported by the AI SDK.
The Insert Pipeline
When you insert documents, the following happens:
```
Input Text
    │
    ▼
Chunking (optional)
    │
    ▼
Provider.extendGraph(chunks)
    │
    ├─> Entity Extraction (via LLM)
    ├─> Relationship Extraction (via LLM)
    ├─> Graph Construction
    ├─> Embedding Generation
    └─> Storage (GraphStore + VectorStore + KVStore)
```

Chunking
By default, documents are split into chunks of ~1200 tokens with 100-token overlap. You can customize this:
```typescript
const graph = createGraph({
  // ...
  chunking: {
    size: 800,
    overlap: 100,
  },
});
```

Or provide your own chunking function:
```typescript
const graph = createGraph({
  // ...
  chunking: {
    fn: (text: string) => {
      // Custom chunking logic
      return chunks;
    },
  },
});
```

Extraction
Different providers extract different structures:
- Similarity: No extraction, chunks become nodes
- LightRAG: Entities + relationships with dual-level vectors
- Microsoft: Entities + relationships + communities + reports
- Fast: Entities + relationships with PageRank
- AWS: Chunks → Statements → Facts → Entities (hierarchical)
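Whatever the provider, the extraction step has the same shape: chunks in, entity (and relationship) records out. Real providers drive it with LLM prompts; as a purely illustrative toy, capitalized-word matching shows the output shape (the `extractEntities` function is an assumption for this sketch, not library code):

```typescript
// Toy stand-in for LLM-based extraction: treat capitalized words as entities.
// A real provider prompts an LLM and also extracts typed relationships.
function extractEntities(chunk: string): { name: string; type: string }[] {
  const seen = new Set<string>();
  const entities: { name: string; type: string }[] = [];
  for (const match of chunk.matchAll(/\b[A-Z][a-z]+\b/g)) {
    if (!seen.has(match[0])) {
      seen.add(match[0]);
      entities.push({ name: match[0], type: "unknown" });
    }
  }
  return entities;
}
```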
The Query Pipeline
When you query the graph, the following happens:
```
Query Text
    │
    ▼
Embed Query
    │
    ▼
Provider.retrieveContext(query, params)
    │
    ├─> Vector Search (initial retrieval)
    ├─> Graph Traversal (expansion)
    ├─> Context Assembly
    └─> Return Context
    │
    ▼
LLM Answer Generation (via AI SDK)
    │
    ▼
Return Response
```

Query Modes
Different providers support different query modes:
LightRAG:
- local: Entity-focused retrieval
- global: Relationship-focused retrieval
- hybrid: Combined approach (default)
- naive: Pure vector search
Microsoft GraphRAG:
- local: Entity neighborhoods + community context
- global: High-level community reports
- naive: Pure vector search
Fast GraphRAG:
- pagerank: Personalized PageRank expansion
- naive: Pure vector search
AWS GraphRAG:
- traversal: Top-down + bottom-up graph traversal
- semantic: Beam search through fact chains
Similarity Graph:
- Uses the `maxDepth` parameter (0 = vector only, >0 = graph expansion)
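The naive mode shared by several providers is plain top-k vector search. A minimal sketch of that retrieval step (the record shape and function names here are illustrative assumptions, not the library's internals):

```typescript
interface Scored { id: string; score: number }

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Naive retrieval: score every stored embedding, keep the top k.
function naiveSearch(
  query: number[],
  records: { id: string; vector: number[] }[],
  topK: number
): Scored[] {
  return records
    .map((r) => ({ id: r.id, score: cosine(query, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Graph-based modes start from a result like this and then expand through the graph structure before assembling context.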
Entities and Relationships
Entities
Entities are the nodes in your knowledge graph. They represent:
- People
- Organizations
- Locations
- Concepts
- Events
- Custom types (algorithm-dependent)
Each entity has:
- ID: Unique identifier
- Name: Human-readable name
- Type: Entity type (e.g., "person", "organization")
- Description: Summary of what's known about the entity
- Metadata: Additional properties
Relationships
Relationships are the edges connecting entities. They capture:
- How entities relate to each other
- The nature of connections
- Directionality (source → target)
Each relationship has:
- Source: Origin entity ID
- Target: Destination entity ID
- Type: Relationship type (e.g., "works_at", "located_in")
- Description: Details about the relationship
- Strength/Weight: Connection importance (optional)
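To make these shapes concrete, here is a hedged sketch of entity and relationship records, plus the kind of one-hop expansion the query pipeline performs over them (field names follow the lists above; the `expandOneHop` helper is an illustration, not library code):

```typescript
interface Entity {
  id: string;
  name: string;
  type: string;
  description: string;
}

interface Relationship {
  source: string; // origin entity ID
  target: string; // destination entity ID
  type: string;
  description: string;
  weight?: number;
}

// One-hop expansion: every entity ID reachable from a seed
// by following a single edge in either direction.
function expandOneHop(seeds: string[], edges: Relationship[]): Set<string> {
  const seedSet = new Set(seeds);
  const found = new Set(seeds);
  for (const e of edges) {
    if (seedSet.has(e.source)) found.add(e.target);
    if (seedSet.has(e.target)) found.add(e.source);
  }
  return found;
}
```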
Storage Abstraction
GraphRAG.js uses three storage interfaces:
GraphStore
Stores graph structure:
```typescript
interface GraphStore {
  upsertNode(node: GNode): Promise<void>;
  upsertEdge(edge: GEdge): Promise<void>;
  getNode(id: string): Promise<GNode | null>;
  getNeighbors(id: string, direction?: "in" | "out" | "both"): Promise<GNode[]>;
  query(cypher: string): Promise<any>;
  close(): Promise<void>;
}
```

VectorStore
Stores embeddings for similarity search:
```typescript
interface VectorStore {
  upsert(vectors: VectorRecord[]): Promise<void>;
  query(vector: number[], topK: number, filter?: any): Promise<VectorQueryResult[]>;
  delete(ids: string[]): Promise<void>;
  close(): Promise<void>;
}
```

KVStore
Stores key-value metadata:
```typescript
interface KVStore {
  get(key: string): Promise<any>;
  set(key: string, value: any): Promise<void>;
  delete(key: string): Promise<void>;
  keys(pattern?: string): Promise<string[]>;
  close(): Promise<void>;
}
```

Provider Interface
Custom providers implement GraphProvider:
```typescript
interface GraphProvider<TQueryParams = any, TResult = any> {
  type: string;
  setupGraph(ctx: ProviderContext): Promise<void>;
  extendGraph(chunks: GDocument[]): Promise<void>;
  retrieveContext(query: string, params: TQueryParams): Promise<TResult>;
}
```

Methods:
- `setupGraph`: Initialize the provider (called once)
- `extendGraph`: Process new chunks and update the graph
- `retrieveContext`: Retrieve context for a query
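As a sketch of how small a custom provider can be, the following toy provider skips extraction entirely and retrieves chunks by keyword match (the in-memory storage and the `GDocument` shape are assumptions for illustration; a real provider would persist through the `ProviderContext` storage handles and call the LLM during `extendGraph`):

```typescript
interface GDocument { id: string; text: string }

// Toy provider: no graph, no embeddings — just keyword retrieval
// over chunks held in memory.
class KeywordProvider {
  type = "keyword";
  private chunks: GDocument[] = [];

  async setupGraph(): Promise<void> {
    // Nothing to initialize for this in-memory sketch.
  }

  async extendGraph(chunks: GDocument[]): Promise<void> {
    this.chunks.push(...chunks);
  }

  async retrieveContext(query: string): Promise<GDocument[]> {
    const needle = query.toLowerCase();
    return this.chunks.filter((c) => c.text.toLowerCase().includes(needle));
  }
}
```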
Provider Context:
```typescript
interface ProviderContext {
  storage: {
    graph: GraphStore;
    vector: VectorStore;
    kv: KVStore;
  };
  model: LanguageModel;
  embedding: EmbeddingModel;
  cheapModel?: LanguageModel;
  namespace: string;
  domain?: string;
  exampleQueries?: string[];
}
```

Multi-Tenancy with Namespaces
Use namespaces to isolate data in shared storage:
```typescript
const graph1 = createGraph({
  // ...
  namespace: "project-a",
});

const graph2 = createGraph({
  // ...
  namespace: "project-b",
});
```

Different namespaces can share the same storage backend without interference.
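One common way to implement this kind of isolation is key prefixing in the shared backend. A minimal sketch (the prefixing scheme and class name are illustrative, not GraphRAG.js's actual storage layout):

```typescript
// Namespacing by key prefix: two tenants share one Map but
// can never read each other's entries.
class NamespacedKV {
  constructor(
    private backend: Map<string, unknown>,
    private namespace: string
  ) {}

  set(key: string, value: unknown): void {
    this.backend.set(`${this.namespace}:${key}`, value);
  }

  get(key: string): unknown {
    return this.backend.get(`${this.namespace}:${key}`);
  }
}
```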
Context and Provenance
Query results include context and provenance:
```typescript
const result = await graph.query("Your question");

// Generated answer
console.log(result.text);

// Source context used
console.log(result.context);

// Token usage
console.log(result.usage);

// Response metadata
console.log(result.metadata);
```

Best Practices
1. Choose the Right Algorithm
- Prototyping: Start with `similarityGraph()`
- General purpose: Use `lightrag()` (balanced cost/performance)
- Deep analysis: Use `microsoftGraph()` (expensive but thorough)
- Fast/cheap: Use `fastGraph()` (good for large datasets)
- Multi-hop reasoning: Use `awsGraph()` (complex queries)
2. Storage Selection
- Development: Use `memoryStorage()` (fast, no setup)
- Production: Use external databases (Neo4j, Qdrant, PostgreSQL)
- Hybrid: Mix and match (e.g., Neo4j + Qdrant + Redis)
3. Model Selection
- LLM: Use faster models (gpt-4o-mini) for cost efficiency
- Embeddings: Use smaller models (text-embedding-3-small) when possible
- Cheap model: Provide a cheaper model for summarization tasks
```typescript
const graph = createGraph({
  model: openai("gpt-4o-mini"), // main model
  cheapModel: openai("gpt-4o-mini"), // for summarization
  embedding: openai.embedding("text-embedding-3-small"),
  // ...
});
```

4. Chunking Strategy
- Small chunks (500-800 tokens): Better precision, more graph nodes
- Large chunks (1200-1500 tokens): Better context, fewer nodes
- Overlap: 10-15% of chunk size for continuity
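The trade-off above can be seen in a simple overlapping chunker. This sketch counts whitespace-separated words rather than real tokens (an approximation — production chunkers use a tokenizer):

```typescript
// Split text into fixed-size word chunks with overlap, approximating
// the token-based chunking described above.
function chunkWords(text: string, size: number, overlap: number): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    // Stop once the current chunk reaches the end of the text.
    if (start + size >= words.length) break;
  }
  return chunks;
}
```

Smaller `size` values yield more chunks (and eventually more graph nodes); the overlap repeats trailing words at the start of the next chunk so sentences spanning a boundary are not lost.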
5. Query Optimization
- Use appropriate query modes for your question type
- Use `contextOnly: true` to inspect retrieved context
- Adjust `topK` and `maxDepth` based on your needs
Next Steps
- Algorithms Overview - Choose an algorithm
- API Reference - Detailed API documentation
- Storage Options - Configure storage backends