# LightRAG

**Status:** 🚧 Coming Soon (planned for Phase 5)
LightRAG is a dual-level retrieval algorithm that embeds entities and relationships separately for enhanced context retrieval.
## Overview
LightRAG will be the default provider for GraphRAG.js, offering a balance between simplicity and performance.
**Key Innovation:** Instead of searching a single vector space, LightRAG creates two separate vector indexes:
- Entity vectors - For entity-focused queries
- Relationship vectors - For relationship-focused queries
This dual-level approach enables:
- Local search - Find specific entities and their neighborhoods
- Global search - Discover relationships and connections
- Hybrid search - Combine both for comprehensive retrieval
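As a conceptual sketch of this routing (the names `searchDual`, `VectorItem`, and the toy 2-dimensional vectors below are illustrative, not part of the planned API), dual-level retrieval amounts to sending the query embedding to one or both indexes depending on the mode:

```typescript
// Illustrative sketch only: these names and toy vectors are assumptions,
// not the planned @graphrag-js/lightrag implementation.
interface VectorItem {
  id: string;
  vector: number[];
}

// Plain cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Two separate indexes: one for entities, one for relationships.
const entityIndex: VectorItem[] = [
  { id: 'entity:SarahChen', vector: [1, 0] },
  { id: 'entity:TechCorp', vector: [0.9, 0.1] },
];
const relationIndex: VectorItem[] = [
  { id: 'rel:SarahChen-works_for-TechCorp', vector: [0, 1] },
];

function search(index: VectorItem[], query: number[], topK: number): VectorItem[] {
  return [...index]
    .sort((x, y) => cosine(y.vector, query) - cosine(x.vector, query))
    .slice(0, topK);
}

// 'local' hits the entity index, 'global' the relation index,
// and 'hybrid' merges both result lists.
function searchDual(
  query: number[],
  mode: 'local' | 'global' | 'hybrid',
  topK = 2,
): VectorItem[] {
  if (mode === 'local') return search(entityIndex, query, topK);
  if (mode === 'global') return search(relationIndex, query, topK);
  return [...search(entityIndex, query, topK), ...search(relationIndex, query, topK)];
}
```

The real provider additionally expands hits into entity descriptions and neighboring chunks before prompting the LLM; this sketch only shows the index-routing idea.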
## Planned API
```typescript
import { createGraph } from '@graphrag-js/core';
import { lightrag } from '@graphrag-js/lightrag';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: lightrag({
    entityTypes: ['person', 'organization', 'location', 'concept', 'event'],
    maxGleanings: 1,
    summarizeThreshold: 8,
    topK: 60,
  }),
});

await graph.insert('Your documents...');

// Local search (entity-focused)
const local = await graph.query('Who is Sarah Chen?', { mode: 'local' });

// Global search (relationship-focused)
const global = await graph.query('What partnerships exist?', { mode: 'global' });

// Hybrid search (combines both)
const hybrid = await graph.query('Summarize TechCorp', { mode: 'hybrid' });

// Naive search (baseline vector search)
const naive = await graph.query('Tell me about AI', { mode: 'naive' });
```

## Configuration
```typescript
interface LightRAGConfig {
  entityTypes?: string[];       // default: ['person', 'organization', 'location', 'event', 'concept']
  maxGleanings?: number;        // default: 1 (extra extraction passes)
  summarizeThreshold?: number;  // default: 8 (merge descriptions after N duplicates)
  summaryMaxTokens?: number;    // default: 500
  concurrency?: number;         // max parallel LLM calls; default: 8
  topK?: number;                // default: 60
  maxEntityTokens?: number;     // default: 6000
  maxRelationTokens?: number;   // default: 8000
  maxTotalTokens?: number;      // default: 30000
}
```

## How It Will Work
### 1. Indexing

```
Documents
  ↓
Chunk into pieces
  ↓
LLM extraction (with gleaning)
  ↓
Extract entities and relationships
  ↓
Merge duplicate entities
  ↓
Summarize if > threshold
  ↓
Embed entities → Entity vector index
  ↓
Embed relationships → Relation vector index
  ↓
Store graph structure
```

### 2. Query Modes
**Local Search:**

```
Query → Embed → Search entity vectors → Fetch entity descriptions + neighbor chunks → LLM answer
```

**Global Search:**

```
Query → Embed → Search relation vectors → Fetch relation descriptions + chunks → LLM answer
```

**Hybrid Search:**

```
Query → Run local + global in parallel → Merge contexts → LLM answer
```

**Naive Search:**

```
Query → Embed → Search chunk vectors → LLM answer (baseline)
```

## Graph Structure
### Nodes

- **Entities**: `person`, `organization`, `location`, `event`, `concept`
- **Chunks**: Original document pieces

### Edges

- **Relationships**: Typed relations between entities (e.g., `works_for`, `located_in`)
- **Contains**: Chunks contain entities
### Indexes
- Entity vector index (1536-dim)
- Relation vector index (1536-dim)
- Chunk vector index (1536-dim)
## When to Use
### ✅ Best For
- General-purpose RAG applications
- Balanced cost/performance requirements
- Entity-centric queries
- Relationship discovery
- Multi-domain documents
### ❌ Not Ideal For
- Thematic analysis (use Microsoft GraphRAG)
- Very large graphs (use Fast GraphRAG)
- Complex multi-hop reasoning (use AWS GraphRAG)
## Comparison
| Feature | LightRAG | Similarity | Microsoft | Fast |
|---|---|---|---|---|
| Entity extraction | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Dual vectors | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Community detection | ❌ No | ❌ No | ✅ Yes | ❌ No |
| Setup complexity | ⭐⭐ Low | ⭐ Lowest | ⭐⭐⭐⭐ High | ⭐⭐⭐ Medium |
| Indexing cost | $$$ Medium | $ Cheapest | $$$$$ Highest | $$ Low |
| Query quality | ⭐⭐⭐⭐ Very Good | ⭐⭐ Basic | ⭐⭐⭐⭐⭐ Excellent | ⭐⭐⭐ Good |
## Implementation Status

**Current Phase:** Phase 5 (not started)

**Prerequisites:**
- ✅ Phase 1: Core refactor complete
- ✅ Phase 2: Memory storage complete
- ✅ Phase 3: Similarity graph complete
- 🚧 Phase 4: Microsoft GraphRAG in progress
**Next Steps:**
- Port LLM extraction logic from Python implementation
- Implement gleaning (iterative extraction)
- Implement entity merging and summarization
- Create dual vector indexes
- Implement query modes (local/global/hybrid/naive)
- Write comprehensive tests
- Add examples and documentation
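To make the gleaning step above concrete, here is a hedged sketch of what iterative extraction might look like. `extractOnce` stands in for a real LLM extraction call, and the loop-and-merge logic is an assumption about the approach, not code ported from the Python implementation:

```typescript
// Hypothetical sketch of gleaning: after the first extraction pass,
// re-prompt up to maxGleanings more times so the model can surface
// entities it missed. extractOnce is a stand-in for a real LLM call.
type Extractor = (text: string, known: Set<string>) => Promise<string[]>;

async function extractWithGleaning(
  text: string,
  extractOnce: Extractor,
  maxGleanings = 1,
): Promise<string[]> {
  // Initial pass with no known entities.
  const found = new Set<string>(await extractOnce(text, new Set()));

  for (let pass = 0; pass < maxGleanings; pass++) {
    // Tell the model what was already found so it only adds new items.
    const extra = await extractOnce(text, found);
    const before = found.size;
    extra.forEach((e) => found.add(e));
    if (found.size === before) break; // nothing new: stop early
  }
  return [...found];
}
```

The early-exit check is why `maxGleanings` is a cap rather than a fixed number of extra LLM calls: if a gleaning pass yields nothing new, further passes are skipped.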
**Estimated Effort:** 3-4 weeks
## Reference Implementation

LightRAG is based on `HKUDS/LightRAG`:

- **Paper:** *LightRAG: Simple and Fast Retrieval-Augmented Generation*
- **Authors:** HKUDS (Hong Kong University Data Science Lab)
- **Language:** Python
- **License:** MIT
## Stay Updated

- Watch the GitHub repository for updates
- Check `ROADMAP.md` for implementation progress
- Join discussions in GitHub Issues
## Related

- **Algorithm Overview**: Compare all algorithms
- **Similarity Graph**: Available now as baseline
- **Microsoft GraphRAG**: Community-based approach
- **Fast GraphRAG**: PageRank-based approach