Algorithm Overview

GraphRAG.js provides multiple graph RAG algorithms, each with different strategies for building and querying knowledge graphs. All algorithms share the same API but use different approaches under the hood.

The Core Concept

All GraphRAG algorithms follow this pattern:

Documents → Graph Construction → Query Processing → Answer Generation

But they differ in:

  • What nodes/edges represent (chunks, entities, facts, statements)
  • How the graph is built (extraction, similarity, clustering)
  • How queries are answered (vector search, traversal, PageRank, communities)

Available Algorithms

| Algorithm | Status | Best For | Complexity | Cost |
|---|---|---|---|---|
| Similarity Graph | ✅ Available | Quick prototyping, baselines | Low | Low |
| LightRAG | 🚧 Coming Soon | General purpose, balanced | Medium | Medium |
| Microsoft GraphRAG | ✅ Available | Deep thematic analysis | High | High |
| Fast GraphRAG | 🚧 Coming Soon | Speed, cost efficiency | Medium | Low |
| AWS GraphRAG | 🚧 Coming Soon | Multi-hop reasoning | High | Medium-High |

Quick Comparison

Similarity Graph ✅

Available Now

The simplest baseline: chunks as nodes, similarity as edges.

typescript
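import { createGraph } from '@graphrag-js/core'; // assumed core package name; the import is not shown in the original
import { similarityGraph } from '@graphrag-js/similarity';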

const graph = createGraph({
  provider: similarityGraph({
    similarityThreshold: 0.7,
  }),
});

How it works:

  1. Chunk documents
  2. Create edges between similar chunks (cosine > threshold)
  3. Query: Vector search + BFS expansion
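
The steps above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the `@graphrag-js/similarity` implementation: the `Chunk` type, `buildSimilarityGraph`, `queryGraph`, and the `topK` and `maxHops` parameters are all hypothetical names, and chunk embeddings are assumed to be precomputed.

```typescript
// Hypothetical types and helpers for illustration; not the GraphRAG.js API.
type Chunk = { id: string; vector: number[] };

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Step 2: connect every pair of chunks whose cosine similarity exceeds the threshold.
function buildSimilarityGraph(chunks: Chunk[], threshold: number): Map<string, Set<string>> {
  const adjacency = new Map<string, Set<string>>();
  for (const chunk of chunks) adjacency.set(chunk.id, new Set<string>());
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      if (cosine(chunks[i].vector, chunks[j].vector) > threshold) {
        adjacency.get(chunks[i].id)!.add(chunks[j].id);
        adjacency.get(chunks[j].id)!.add(chunks[i].id);
      }
    }
  }
  return adjacency;
}

// Step 3: pick the topK chunks closest to the query, then expand by BFS for maxHops hops.
function queryGraph(
  chunks: Chunk[],
  adjacency: Map<string, Set<string>>,
  queryVector: number[],
  topK: number,
  maxHops: number,
): string[] {
  const scored = chunks.map((c) => ({ id: c.id, score: cosine(c.vector, queryVector) }));
  scored.sort((a, b) => b.score - a.score);
  const seeds = scored.slice(0, topK).map((s) => s.id);

  const visited = new Set<string>(seeds);
  let frontier = seeds;
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      const neighbors = adjacency.get(id);
      if (!neighbors) continue;
      neighbors.forEach((neighbor) => {
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      });
    }
    frontier = next;
  }
  return Array.from(visited);
}
```

The pairwise loop is quadratic in the number of chunks, which is acceptable for a baseline but would likely be replaced by an approximate nearest-neighbor index on larger corpora.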

Pros:

  • ✅ Simple to understand
  • ✅ Fast setup
  • ✅ Low cost
  • ✅ Good baseline

Cons:

  • ❌ No entity extraction
  • ❌ Limited relationship understanding
  • ❌ No global reasoning

Learn more →


LightRAG 🚧

Coming Soon

Dual-level retrieval with entities and relationships embedded separately.

typescript
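import { createGraph } from '@graphrag-js/core'; // assumed core package name; the import is not shown in the original
import { lightrag } from '@graphrag-js/lightrag';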

const graph = createGraph({
  provider: lightrag({
    entityTypes: ['person', 'organization', 'location'],
    maxGleanings: 1,
  }),
});

How it works:

  1. Extract entities and relationships via LLM
  2. Create two separate vector indexes (entities + relations)
  3. Query modes:
    • Local: Search entity vectors
    • Global: Search relationship vectors
    • Hybrid: Combine both
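
As a rough sketch of how the three query modes could select from the two indexes: the `Hit` and `QueryMode` types and the `retrieve` function below are hypothetical, and the hybrid merge rule (pool both result lists and re-rank by score) is an assumption, since the text does not say how LightRAG combines them. The hits are assumed to come from searches already run against the entity and relation indexes.

```typescript
// Hypothetical types for illustration; not the @graphrag-js/lightrag API.
type Hit = { id: string; score: number };
type QueryMode = 'local' | 'global' | 'hybrid';

// Select candidates according to the query mode:
//   local  -> entity index hits only
//   global -> relation index hits only
//   hybrid -> both lists pooled and re-ranked by score (assumed merge rule)
function retrieve(
  mode: QueryMode,
  entityHits: Hit[],
  relationHits: Hit[],
  topK: number,
): Hit[] {
  let pool: Hit[];
  if (mode === 'local') {
    pool = entityHits.slice();
  } else if (mode === 'global') {
    pool = relationHits.slice();
  } else {
    pool = entityHits.concat(relationHits);
  }
  // Highest score first, then truncate to the requested number of results.
  pool.sort((a, b) => b.score - a.score);
  return pool.slice(0, topK);
}
```

Pooling by raw score only makes sense if both indexes produce comparable scores, for example cosine similarities from the same embedding model.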

Pros:

  • ✅ Balanced cost/performance
  • ✅ Good for general use cases
  • ✅ Fast incremental updates
  • ✅ Multiple query modes

Cons:

  • ❌ No community detection
  • ❌ No hierarchical summaries

Status: 🚧 Implementation in progress


Microsoft GraphRAG ✅

Available Now

Hierarchical community detection with summarized reports. Ported from nano-graphrag.

typescript
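import { createGraph } from '@graphrag-js/core'; // assumed core package name; the import is not shown in the original
import { microsoftGraph } from '@graphrag-js/microsoft';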

const graph = createGraph({
  provider: microsoftGraph({
    entityTypes: ['organization', 'person', 'geo', 'event'],
    entityExtractMaxGleaning: 1,
    maxGraphClusterSize: 10,
  }),
});

How it works:

  1. Extract entities and relationships via LLM (with gleaning)
  2. Run Leiden clustering to detect communities
  3. Generate hierarchical community reports via LLM
  4. Query modes:
    • Local: Entity neighborhoods + community context
    • Global: Map-reduce over community reports
    • Naive: Pure vector search baseline
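
The global mode's map-reduce step can be illustrated with a small sketch. Everything here is an assumption made for illustration: `CommunityReport`, `PartialAnswer`, `MapStep`, `ReduceStep`, `globalQuery`, and the `minRelevance` cutoff are hypothetical names, and the two callbacks stand in for LLM calls. They are synchronous only to keep the example short; real model calls would be asynchronous.

```typescript
// Hypothetical shapes for illustration; not the @graphrag-js/microsoft API.
type CommunityReport = { communityId: string; summary: string };
type PartialAnswer = { communityId: string; answer: string; relevance: number };

// Stand-ins for LLM calls (synchronous here for brevity).
type MapStep = (question: string, report: CommunityReport) => PartialAnswer;
type ReduceStep = (question: string, partials: PartialAnswer[]) => string;

function globalQuery(
  question: string,
  reports: CommunityReport[],
  mapStep: MapStep,
  reduceStep: ReduceStep,
  minRelevance = 1,
): string {
  // Map: ask for a partial answer from each community report independently.
  const partials = reports.map((report) => mapStep(question, report));

  // Drop partial answers rated below the cutoff; order the rest by relevance.
  const useful = partials
    .filter((p) => p.relevance >= minRelevance)
    .sort((a, b) => b.relevance - a.relevance);

  // Reduce: combine the surviving partial answers into one final answer.
  return reduceStep(question, useful);
}
```

Because the map step touches every community report, query cost in this mode grows with the number of communities, which matches the cost trade-off listed for this algorithm.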

Pros:

  • ✅ Best for thematic analysis
  • ✅ Global reasoning capabilities
  • ✅ Hierarchical understanding
  • ✅ Well-researched (Microsoft)

Cons:

  • ❌ Expensive (many LLM calls for reports)
  • ❌ Slow indexing
  • ❌ Complex setup

Status: ✅ Complete (29 tests passing)


Fast GraphRAG 🚧

Coming Soon

PageRank-based retrieval without expensive community detection.

typescript
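import { createGraph } from '@graphrag-js/core'; // assumed core package name; the import is not shown in the original
import { fastGraph } from '@graphrag-js/fast';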

const graph = createGraph({
  provider: fastGraph({
    pagerank: {
      damping: 0.85,
      maxIterations: 100,
    },
  }),
});

How it works:

  1. Extract entities and relationships via LLM
  2. No community detection (saves cost!)
  3. Query: Personalized PageRank from seed entities
  4. Token-budget truncation for context
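
Personalized PageRank from seed entities (step 3) can be sketched with plain power iteration. This is a minimal illustration, not the planned `@graphrag-js/fast` implementation: the `Graph` type, the `personalizedPageRank` function, and the `tolerance` parameter are hypothetical, and the defaults simply mirror the `damping` and `maxIterations` values in the config above. It assumes seeds are distinct, every seed is a node in the graph, and every neighbor also appears as a key.

```typescript
// Hypothetical adjacency-list graph: node id -> outgoing neighbor ids.
// Assumption: every neighbor id also appears as a key of the map.
type Graph = Map<string, string[]>;

function personalizedPageRank(
  graph: Graph,
  seeds: string[],
  damping = 0.85,
  maxIterations = 100,
  tolerance = 1e-8,
): Map<string, number> {
  const nodes = Array.from(graph.keys());

  // Restart distribution: uniform over the seed entities, zero elsewhere.
  const restart = new Map<string, number>();
  for (const node of nodes) restart.set(node, 0);
  for (const seed of seeds) restart.set(seed, 1 / seeds.length);

  let rank = new Map<string, number>(restart);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Each node starts the round with its share of the restart probability.
    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, (1 - damping) * restart.get(node)!);

    for (const node of nodes) {
      const out = graph.get(node)!;
      const mass = damping * rank.get(node)!;
      if (out.length === 0) {
        // Dangling node: send its mass back to the seeds so the total stays 1.
        for (const seed of seeds) next.set(seed, next.get(seed)! + mass / seeds.length);
      } else {
        // Split the node's mass evenly among its outgoing neighbors.
        for (const target of out) next.set(target, next.get(target)! + mass / out.length);
      }
    }

    // Stop early once the scores have stopped changing.
    let delta = 0;
    for (const node of nodes) delta += Math.abs(next.get(node)! - rank.get(node)!);
    rank = next;
    if (delta < tolerance) break;
  }
  return rank;
}
```

Nodes that cannot be reached from any seed keep a score of zero, which illustrates the "may miss disconnected clusters" limitation listed below.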

Pros:

  • ✅ Fast and cheap
  • ✅ No community overhead
  • ✅ Good incremental updates
  • ✅ PageRank naturally surfaces importance

Cons:

  • ❌ No global summaries
  • ❌ Relies on good entity extraction
  • ❌ May miss disconnected clusters

Status: 🚧 Planned for Phase 6


AWS GraphRAG 🚧

Coming Soon

Fact-centric hierarchical graph: chunks → statements → facts → entities.

typescript
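import { createGraph } from '@graphrag-js/core'; // assumed core package name; the import is not shown in the original
import { awsGraph } from '@graphrag-js/aws';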

const graph = createGraph({
  provider: awsGraph({
    semantic: {
      beamWidth: 5,
      maxPaths: 10,
    },
  }),
});

How it works:

  1. Extract statements (propositions) from chunks
  2. Extract facts (subject/relation/object triples) from statements
  3. Extract entities from facts
  4. Build hierarchical graph
  5. Query modes:
    • Traversal: Top-down (chunk vectors) + bottom-up (entity keywords)
    • Semantic: Beam search through fact chains
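
The semantic mode's beam search through fact chains might look roughly like the sketch below. It is an illustration under assumptions, not the planned `@graphrag-js/aws` implementation: the `Fact` and `FactPath` types, the `beamSearch` function, and the `maxHops` parameter are hypothetical, each fact is assumed to carry a precomputed relevance `score` in (0, 1], and a path is scored as the product of its fact scores. The `beamWidth` and `maxPaths` defaults mirror the config above.

```typescript
// Hypothetical shapes for illustration; scores are assumed precomputed
// (for example, similarity between the fact and the query).
type Fact = { subject: string; relation: string; object: string; score: number };
type FactPath = { facts: Fact[]; score: number };

function beamSearch(
  facts: Fact[],
  startEntity: string,
  maxHops: number,
  beamWidth = 5,
  maxPaths = 10,
): FactPath[] {
  // Index facts by subject so the next hop is a map lookup.
  const bySubject = new Map<string, Fact[]>();
  for (const fact of facts) {
    const list = bySubject.get(fact.subject) ?? [];
    list.push(fact);
    bySubject.set(fact.subject, list);
  }

  let beam: FactPath[] = [{ facts: [], score: 1 }];
  const found: FactPath[] = [];

  for (let hop = 0; hop < maxHops; hop++) {
    const candidates: FactPath[] = [];
    for (const path of beam) {
      // Continue from the object of the last fact, or from the start entity.
      const tail = path.facts.length > 0 ? path.facts[path.facts.length - 1].object : startEntity;
      for (const fact of bySubject.get(tail) ?? []) {
        if (path.facts.indexOf(fact) !== -1) continue; // do not reuse a fact within one path
        candidates.push({ facts: path.facts.concat([fact]), score: path.score * fact.score });
      }
    }
    if (candidates.length === 0) break;

    // Keep only the beamWidth best extensions for the next hop.
    candidates.sort((a, b) => b.score - a.score);
    beam = candidates.slice(0, beamWidth);
    for (const path of beam) found.push(path);
  }

  // Return the best paths of any length, capped at maxPaths.
  found.sort((a, b) => b.score - a.score);
  return found.slice(0, maxPaths);
}
```

A narrow beam keeps the search cheap but can prune a chain whose first fact scores poorly even if later facts are strong; widening `beamWidth` trades latency for recall.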

Pros:

  • ✅ Best for multi-hop reasoning
  • ✅ Explicit fact representation
  • ✅ Cross-document connections
  • ✅ Statement-level granularity

Cons:

  • ❌ Complex extraction pipeline
  • ❌ Many LLM calls
  • ❌ Higher latency

Status: 🚧 Planned for Phase 7


Choosing an Algorithm

By Use Case

Prototyping / Baseline → Use Similarity Graph ✅

General Purpose RAG → Use LightRAG 🚧 (when available)

Thematic Analysis / Research → Use Microsoft GraphRAG ✅

Fast / Cost-Effective → Use Fast GraphRAG 🚧 (when available)

Multi-Hop Reasoning → Use AWS GraphRAG 🚧 (when available)

By Dataset Size

< 10K documents → Any algorithm works

10K - 100K documents → Similarity Graph ✅ or Fast GraphRAG 🚧

100K - 1M documents → Fast GraphRAG 🚧 or LightRAG 🚧

> 1M documents → Fast GraphRAG 🚧 with distributed storage

By Query Type

Factoid questions ("What is X?") → Similarity Graph ✅ or LightRAG 🚧

Relationship queries ("How are X and Y related?") → LightRAG 🚧 or Fast GraphRAG 🚧

Thematic questions ("What are the main themes?") → Microsoft GraphRAG ✅

Multi-hop questions ("If X, then Y, then what?") → AWS GraphRAG 🚧

By Budget

Low cost → Similarity Graph ✅ (no LLM extraction) or Fast GraphRAG 🚧

Medium cost → LightRAG 🚧 or AWS GraphRAG 🚧

High cost → Microsoft GraphRAG ✅ (many LLM calls for community reports)

Implementation Roadmap

| Phase | Algorithm | Status | ETA |
|---|---|---|---|
| 3 | Similarity Graph | ✅ Complete | Available Now |
| 4 | Microsoft GraphRAG | ✅ Complete | Available Now |
| 5 | LightRAG (default) | ⬜ Planned | TBD |
| 6 | Fast GraphRAG | ⬜ Planned | TBD |
| 7 | AWS GraphRAG | ⬜ Planned | TBD |

See ROADMAP.md for detailed implementation status.

Algorithm Details

Graph Structure Comparison

| Algorithm | Nodes | Edges | Indexes |
|---|---|---|---|
| Similarity | Chunks | Similarity | Chunk vectors |
| LightRAG | Entities, Chunks | Relations, Contains | Entity vectors, Relation vectors, Chunk vectors |
| Microsoft | Entities, Communities | Relations, MemberOf | Entity vectors, Chunk vectors |
| Fast | Entities | Relations | Entity vectors (HNSW) |
| AWS | Chunks, Statements, Facts, Entities | Contains, Extracts, References | Chunk vectors, Statement vectors |

Query Processing Comparison

| Algorithm | Query Processing |
|---|---|
| Similarity | Embed query → Vector search → BFS expansion |
| LightRAG | Embed query → Dual vector search (entities + relations) → Expand → LLM |
| Microsoft | Embed query → Extract entities → Search + Community reports → LLM |
| Fast | Embed query → Entity search → Personalized PageRank → LLM |
| AWS | Embed query → Traversal (top-down + bottom-up) or Beam search → LLM |

Performance Characteristics

Indexing Speed

| Algorithm | Speed | Cost |
|---|---|---|
| Similarity | ⚡⚡⚡ Fastest | $ Cheapest |
| Fast | ⚡⚡ Fast | $$ Low |
| LightRAG | ⚡ Medium | $$$ Medium |
| AWS | 🐌 Slow | $$$$ High |
| Microsoft | 🐌🐌 Slowest | $$$$$ Highest |

Query Speed

| Algorithm | Speed | Quality |
|---|---|---|
| Similarity | ⚡⚡⚡ Fastest | ⭐⭐ Basic |
| Fast | ⚡⚡ Fast | ⭐⭐⭐ Good |
| LightRAG | ⚡ Medium | ⭐⭐⭐⭐ Very Good |
| AWS | 🐌 Slow | ⭐⭐⭐⭐ Very Good |
| Microsoft | 🐌 Slow | ⭐⭐⭐⭐⭐ Excellent |

Next Steps

External Resources

Released under the Elastic License 2.0.