AWS GraphRAG
AWS GraphRAG builds a hierarchical fact-centric lexical graph, enabling multi-hop reasoning and cross-document connections.
Overview
AWS GraphRAG (from awslabs/graphrag-toolkit) is designed for:
- Multi-hop reasoning across connected facts
- Cross-document connections through shared entities
- Fact-level precision with structured knowledge extraction
- Hierarchical retrieval from chunks down to facts
Key idea: Build a hierarchical graph where chunks → statements → facts → entities, enabling precise traversal and beam search.
How It Works
Graph Structure
AWS GraphRAG uses a four-level hierarchical structure:
Chunks (source text)
↓
Statements (atomic propositions)
↓
Facts (subject-relation-object triples)
↓
Entities (nodes)

Example:
Chunk:
"OpenAI released GPT-4 in March 2023. The model was trained on Azure infrastructure."
Statements:
1. "OpenAI released GPT-4 in March 2023"
2. "GPT-4 was trained on Azure infrastructure"
Facts:
1. (OpenAI, released, GPT-4)
2. (GPT-4, released_in, March 2023)
3. (GPT-4, trained_on, Azure infrastructure)
Entities:
- OpenAI (organization)
- GPT-4 (product)
- Azure (platform)

1. Insert Pipeline
Documents
↓
Chunking
↓
Statement Extraction (LLM)
↓
Fact Extraction (LLM)
├─→ Subject-Relation-Object triples
└─→ Extract entities from facts
↓
Build Hierarchical Graph
├─→ Chunk → Statement → Fact → Entity
└─→ Entity ← Fact ← Statement ← Chunk
↓
Embed Chunks & Statements
↓
Storage

Key features:
- Statements are first-class citizens: Atomic propositions that can stand alone
- Facts provide structure: S-R-O triples enable precise traversal
- Bi-directional links: Can traverse top-down (chunks) or bottom-up (entities)
- Cross-document links: Entities connect facts across documents
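The four levels and the bottom-up entity index that powers cross-document links can be sketched as plain TypeScript (an illustrative model only; names like `indexByEntity` are ours, not the library's internal API):

```typescript
// Illustrative types for the Chunk → Statement → Fact → Entity hierarchy.
interface Entity {
  name: string;
  type: string; // e.g. "organization", "product"
}

interface Fact {
  subject: Entity;
  relation: string;
  object: Entity;
}

interface Statement {
  text: string;   // atomic proposition, e.g. "OpenAI released GPT-4 in March 2023"
  facts: Fact[];  // S-R-O triples extracted from this statement
}

interface Chunk {
  text: string;   // source text
  statements: Statement[];
}

// Bottom-up index: entity name → every fact that mentions it.
// Because the same entity can appear in facts from different chunks
// (and different documents), this index is what creates cross-document links.
function indexByEntity(chunks: Chunk[]): Map<string, Fact[]> {
  const index = new Map<string, Fact[]>();
  for (const chunk of chunks) {
    for (const statement of chunk.statements) {
      for (const fact of statement.facts) {
        for (const entity of [fact.subject, fact.object]) {
          const list = index.get(entity.name) ?? [];
          list.push(fact);
          index.set(entity.name, list);
        }
      }
    }
  }
  return index;
}
```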
2. Query Pipelines
AWS GraphRAG provides two retrieval strategies:
Traversal Mode
Top-down + bottom-up hybrid retrieval:
Query
↓
├─→ Top-Down (Chunk Vectors)       ├─→ Bottom-Up (Entity Keywords)
│     Embed query                  │     Extract keywords
│     Search chunk vectors         │     Search entity names
│     Get top-K chunks             │     Get matching entities
│     → Statements → Facts         │     → Facts → Statements
↓                                  ↓
Combined Statement Set
↓
TF-IDF Reranking
↓
LLM Answer Generation

Best for: Balanced precision and recall
Semantic Mode
Beam search through fact chains:
Query
↓
Embed Query
↓
Search Statement Vectors
↓
Beam Search (width=5)
├─→ Expand through Facts
├─→ Follow Entity connections
└─→ Score relevance at each step
↓
Diversity Filtering
↓
Path Analysis
↓
LLM Answer Generation

Best for: Complex multi-hop queries, reasoning chains
Installation
pnpm add @graphrag-js/aws

Basic Usage
import { createGraph } from "@graphrag-js/core";
import { awsGraph } from "@graphrag-js/aws";
import { memoryStorage } from "@graphrag-js/memory";
import { openai } from "@ai-sdk/openai";
const graph = createGraph({
model: openai("gpt-4o-mini"),
embedding: openai.embedding("text-embedding-3-small"),
provider: awsGraph({
entityTypes: ["person", "organization", "location", "product", "concept"],
}),
storage: memoryStorage(),
});
await graph.insert(documents);
// Traversal mode (default)
const { text: traversalAnswer } = await graph.query("What is TechCorp?", {
  mode: "traversal",
});

// Semantic mode (beam search)
const { text: semanticAnswer } = await graph.query("Compare the AI initiatives", {
  mode: "semantic",
});

Configuration
interface AWSGraphConfig {
entityTypes?: string[];
maxGleanings?: number;
concurrency?: number;
traversal?: TraversalConfig;
semantic?: SemanticConfig;
}

Entity Types
Specify entity types to extract from facts:
awsGraph({
entityTypes: ["person", "organization", "location", "product", "technology"],
})

Default: Auto-detect based on content
Tips:
- Use lowercase for entity types
- Domain-specific types improve extraction
- Keep to 5-7 types for best results
Traversal Configuration
interface TraversalConfig {
maxSearchResults?: number; // default: 10
reranker?: "tfidf" | "model"; // default: "tfidf"
}

Example:
awsGraph({
traversal: {
maxSearchResults: 20, // More results for better recall
reranker: "tfidf", // Fast TF-IDF reranking
},
})

Reranker options:
- "tfidf": Fast, lexical similarity (default)
- "model": LLM-based reranking (slower, higher quality)
Semantic Configuration
interface SemanticConfig {
beamWidth?: number; // default: 5
maxPaths?: number; // default: 10
diversityWeight?: number; // default: 0.3
}

Example:
awsGraph({
semantic: {
beamWidth: 5, // Number of parallel paths to explore
maxPaths: 10, // Maximum paths to return
diversityWeight: 0.3, // Balance relevance vs. diversity
},
})

Tuning tips:
- Higher beamWidth (7-10): Better recall, slower
- Lower beamWidth (3-5): Faster, more focused
- Higher diversityWeight (0.5): More varied results
- Lower diversityWeight (0.1): More focused results
Query Modes
Traversal Mode (Default)
Combines top-down vector search and bottom-up entity search:
const { text } = await graph.query("What is OpenAI?", {
mode: "traversal",
});

How it works:
- Top-down: Vector search on chunks → get statements → get facts
- Bottom-up: Keyword search on entities → get facts → get statements
- Merge: Combine statement sets
- Rerank: TF-IDF or model-based reranking
- Generate: LLM answer from top statements
Best for:
- Entity-focused queries ("What is X?")
- Specific factual questions
- Balanced precision and recall
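The TF-IDF rerank step in this pipeline can be approximated in a few lines (a sketch of the idea only; `rerank` and `tokenize` are illustrative helpers, not the library's API):

```typescript
// Minimal TF-IDF-style reranker for candidate statements (illustrative sketch).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function rerank(query: string, statements: string[], topK = 5): string[] {
  const docs = statements.map(tokenize);
  const n = docs.length;

  // Document frequency: how many statements contain each term.
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  // Score each statement: sum of tf * idf over the query terms.
  const queryTerms = tokenize(query);
  const scores = docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length / (doc.length || 1);
      const idf = Math.log((n + 1) / ((df.get(term) ?? 0) + 1)) + 1;
      score += tf * idf;
    }
    return score;
  });

  return statements
    .map((s, i) => ({ s, score: scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((x) => x.s);
}
```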
Semantic Mode
Beam search through fact chains:
const { text } = await graph.query("How do the partnerships connect?", {
mode: "semantic",
});

How it works:
- Vector search on statements
- Beam search through fact graph
- Score paths by relevance + diversity
- Return top-K paths
- LLM answer synthesis
Best for:
- Multi-hop questions ("How does A relate to C through B?")
- Reasoning chains ("Why did X lead to Y?")
- Complex relationships
Query Options
interface AWSQueryOptions {
mode?: "traversal" | "semantic";
withReferences?: boolean;
stream?: boolean;
contextOnly?: boolean;
}

Examples:
// Traversal mode with references
const { text, context } = await graph.query("Question", {
mode: "traversal",
withReferences: true,
});
// Semantic mode streaming
const { textStream } = await graph.query("Complex question", {
mode: "semantic",
stream: true,
});
for await (const chunk of textStream) {
process.stdout.write(chunk);
}
// Context only (no LLM)
const { context } = await graph.query("Question", {
mode: "semantic",
contextOnly: true,
});

Hierarchical Graph Structure
The hierarchical structure enables flexible retrieval:
Top-Down Retrieval
Start from chunks, traverse down to facts:
Chunk (vector search)
↓
Statements (contained in chunk)
↓
Facts (extracted from statements)
↓
Entities (nodes in facts)

Use when: You want context around specific chunks
Bottom-Up Retrieval
Start from entities, traverse up to chunks:
Entity (keyword search)
↓
Facts (mention entity)
↓
Statements (contain facts)
↓
Chunks (source of statements)

Use when: You want all mentions of an entity
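The entry point of the bottom-up path can be sketched as keyword matching over fact triples (illustrative only; `bottomUp` and `FactRecord` are hypothetical names, not part of the library):

```typescript
// Bottom-up retrieval sketch: keyword-match query terms against entity names,
// then keep the facts that mention a matched entity.
type FactRecord = { subject: string; relation: string; object: string };

function bottomUp(query: string, facts: FactRecord[]): FactRecord[] {
  const keywords = query.toLowerCase().match(/[a-z0-9-]+/g) ?? [];
  return facts.filter((f) =>
    keywords.some(
      (k) =>
        f.subject.toLowerCase().includes(k) || f.object.toLowerCase().includes(k),
    ),
  );
}
```

From the matched facts, a real implementation would then walk up to the statements that contain them and the chunks those statements came from.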
Fact-Level Traversal
Navigate through fact connections:
Fact 1: (A, relates_to, B)
↓
Entity B
↓
Fact 2: (B, connects_to, C)
↓
Entity C

Use when: Multi-hop reasoning across documents
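This kind of fact-level hop can be implemented as a breadth-first search over triples (a minimal sketch, not the library's traversal code):

```typescript
type Triple = [subject: string, relation: string, object: string];

// Find a chain of facts connecting two entities, treating each
// S-R-O triple as an undirected edge between subject and object.
function findPath(facts: Triple[], start: string, goal: string): Triple[] | null {
  const queue: { entity: string; path: Triple[] }[] = [{ entity: start, path: [] }];
  const visited = new Set([start]);
  while (queue.length > 0) {
    const { entity, path } = queue.shift()!;
    if (entity === goal) return path;
    for (const fact of facts) {
      const [s, , o] = fact;
      // Hop across the fact if the current entity is on either end.
      const next = s === entity ? o : o === entity ? s : null;
      if (next !== null && !visited.has(next)) {
        visited.add(next);
        queue.push({ entity: next, path: [...path, fact] });
      }
    }
  }
  return null; // no chain of facts connects the two entities
}
```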
Advantages
1. Multi-Hop Reasoning
Facts create precise connections between entities:
// Question: "How is OpenAI connected to Microsoft?"
// Graph traversal:
Fact 1: (OpenAI, partnered_with, Microsoft)
Fact 2: (Microsoft, invested_in, OpenAI)
Fact 3: (GPT-4, developed_by, OpenAI)
Fact 4: (GPT-4, integrated_in, Azure)
Fact 5: (Azure, owned_by, Microsoft)
// Answer: Multiple connection paths found

2. Cross-Document Connections
Entities link facts from different documents:
Document A: "OpenAI released GPT-4"
Document B: "Azure hosts GPT-4 models"
Document C: "Microsoft provides Azure infrastructure"
Connected through:
GPT-4 entity → links A & B
Azure entity → links B & C

3. Fact-Level Precision
Statements and facts provide atomic units of knowledge:
Instead of retrieving:
"Large chunk containing many facts"
Retrieve precisely:
Fact 1: (OpenAI, released, GPT-4)
Fact 2: (GPT-4, trained_on, Azure)

4. Flexible Retrieval
Multiple retrieval strategies for different query types:
- Traversal: Fast, balanced
- Semantic: Deep, multi-hop
When to Use AWS GraphRAG
Use AWS GraphRAG when:
- ✅ You need multi-hop reasoning
- ✅ Your queries require connecting information across documents
- ✅ Fact-level precision is important
- ✅ You have complex relationship questions
- ✅ Cross-document entity resolution is needed
Don't use AWS GraphRAG when:
- ❌ Your queries are simple similarity searches (use Similarity Graph)
- ❌ You need global thematic summaries (use Microsoft GraphRAG)
- ❌ Cost is a major concern (more LLM calls than Fast GraphRAG)
- ❌ Your documents don't have rich factual content
Complete Example
import { createGraph } from "@graphrag-js/core";
import { awsGraph } from "@graphrag-js/aws";
import { neo4jGraph } from "@graphrag-js/neo4j";
import { qdrantVector } from "@graphrag-js/qdrant";
import { redisKV } from "@graphrag-js/redis";
import { openai } from "@ai-sdk/openai";
const graph = createGraph({
model: openai("gpt-4o-mini"),
embedding: openai.embedding("text-embedding-3-small"),
provider: awsGraph({
entityTypes: [
"person",
"organization",
"location",
"product",
"technology",
"event",
],
maxGleanings: 1,
concurrency: 4,
traversal: {
maxSearchResults: 15,
reranker: "tfidf",
},
semantic: {
beamWidth: 5,
maxPaths: 10,
diversityWeight: 0.3,
},
}),
storage: {
graph: neo4jGraph({ url: process.env.NEO4J_URL, ... }),
vector: qdrantVector({ url: process.env.QDRANT_URL, ... }),
kv: redisKV({ host: process.env.REDIS_HOST, ... }),
},
domain: "Technology company news and partnerships",
exampleQueries: [
"How are companies connected through partnerships?",
"What products use which technologies?",
"Who are the key people in AI development?",
],
});
// Insert documents
await graph.insert(documents);
// Traversal query
const result1 = await graph.query(
"What is the relationship between OpenAI and Microsoft?"
);
console.log(result1.text);
// Semantic query (multi-hop)
const result2 = await graph.query(
"How does GPT-4 connect to Azure through different partnerships?",
{ mode: "semantic" }
);
console.log(result2.text);

Comparison with Other Algorithms
| Feature | AWS | LightRAG | Microsoft | Fast | Similarity |
|---|---|---|---|---|---|
| Statement extraction | ✅ | ❌ | ❌ | ❌ | ❌ |
| Fact triples | ✅ | ❌ | ❌ | ❌ | ❌ |
| Hierarchical structure | ✅ | ❌ | ✅ | ❌ | ❌ |
| Multi-hop reasoning | ✅✅ | ✅ | ✅ | ✅ | ❌ |
| Beam search | ✅ | ❌ | ❌ | ❌ | ❌ |
| Fact-level precision | ✅✅ | ❌ | ❌ | ❌ | ❌ |
| Cost | High | Medium | High | Low | Low |
| Best for | Complex reasoning | General use | Communities | Fast/cheap | Prototyping |
Tips and Best Practices
1. Choose the Right Mode
// Simple entity questions → traversal
await graph.query("What is X?", { mode: "traversal" });
// Multi-hop reasoning → semantic
await graph.query("How does A connect to C?", { mode: "semantic" });

2. Tune Beam Search
Start conservative, then increase:
// Conservative (fast)
semantic: { beamWidth: 3, maxPaths: 5 }
// Balanced (default)
semantic: { beamWidth: 5, maxPaths: 10 }
// Thorough (slower)
semantic: { beamWidth: 10, maxPaths: 20 }

3. Use Domain Context
Help extraction with domain description:
domain: "Scientific research papers on cancer genomics",
exampleQueries: [
"Which genes are implicated in breast cancer?",
"How do TP53 mutations affect treatment?",
]

4. Monitor Fact Quality
Inspect extracted facts:
const entities = await graph.entities.list();
const relations = await graph.relations.list();
console.log("Entities:", entities.length);
console.log("Facts:", relations.length);

5. Use Reranking Wisely
// Fast queries → TF-IDF
traversal: { reranker: "tfidf" }
// High-quality results → model-based
traversal: { reranker: "model" }

Beam Search Explained
Beam search explores multiple paths through the fact graph in parallel:
Query: "How is A connected to D?"
Initial: [A]
↓
Step 1: [B₁, B₂, B₃, B₄, B₅] (top-5 by score)
↓
Step 2: [C₁, C₂, C₃, C₄, C₅] (expand each B, keep top-5)
↓
Step 3: [D₁, D₂, D₃] (found target, return paths)
Result: Multiple paths from A → D

Parameters:
- Beam width: How many paths to keep at each step
- Max paths: Maximum total paths to return
- Diversity weight: Balance between relevance and variety
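The procedure above can be sketched as a generic beam search over a scored edge list (illustrative only; integer scores and names like `beamSearch` are our own, not the library's API, and the diversity term is omitted for brevity):

```typescript
type ScoredEdge = { from: string; to: string; score: number };
type Path = { nodes: string[]; score: number };

// Beam search: expand every path on the frontier, keep only the top
// `beamWidth` continuations at each step, and collect paths that reach
// the goal entity.
function beamSearch(
  edges: ScoredEdge[],
  start: string,
  goal: string,
  beamWidth = 5,
  maxSteps = 4,
): Path[] {
  let frontier: Path[] = [{ nodes: [start], score: 0 }];
  const found: Path[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const candidates: Path[] = [];
    for (const path of frontier) {
      const tail = path.nodes[path.nodes.length - 1];
      for (const edge of edges) {
        if (edge.from === tail && !path.nodes.includes(edge.to)) {
          candidates.push({
            nodes: [...path.nodes, edge.to],
            score: path.score + edge.score,
          });
        }
      }
    }
    // Prune: keep only the beamWidth highest-scoring paths.
    candidates.sort((a, b) => b.score - a.score);
    frontier = candidates.slice(0, beamWidth);
    for (const p of frontier) {
      if (p.nodes[p.nodes.length - 1] === goal) found.push(p);
    }
    if (frontier.length === 0) break;
  }
  return found;
}
```

With `beamWidth` at 2 and two branches from A to D, both connection paths survive pruning and are returned, highest-scoring first.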
See Also
- LightRAG - Dual-level retrieval
- Microsoft GraphRAG - Community detection
- Fast GraphRAG - PageRank retrieval
- Algorithms Overview - Compare all algorithms