
PostgreSQL + pgvector Storage

The @graphrag-js/pgvector package provides vector storage using PostgreSQL with the pgvector extension, ideal for SQL-based workflows.

Installation

bash
pnpm add @graphrag-js/pgvector

Features

  • SQL-Based - Familiar PostgreSQL database
  • ACID Transactions - Data consistency guarantees
  • JSONB Metadata - Rich filtering capabilities
  • IVFFlat Indexing - Fast approximate search
  • Horizontal Scaling - Sharding and replication
  • Unified Storage - Combine with other PostgreSQL data

Prerequisites

PostgreSQL + pgvector

Option 1: Docker (Recommended)

bash
docker run -d \
  --name postgres \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=graphrag \
  -p 5432:5432 \
  -v pgdata:/var/lib/postgresql/data \
  pgvector/pgvector:pg16

Option 2: Install pgvector on Existing PostgreSQL

bash
# macOS (Homebrew)
brew install pgvector

# Ubuntu/Debian
sudo apt install postgresql-16-pgvector

# From source
cd /tmp
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install

Then enable the extension:

sql
CREATE EXTENSION vector;

Option 3: Hosted PostgreSQL

  • Supabase - pgvector available; enable it from the dashboard or with CREATE EXTENSION vector
  • Neon - pgvector available; enable with CREATE EXTENSION vector
  • AWS RDS - Supported on recent PostgreSQL versions; enable with CREATE EXTENSION vector

Verify Installation

sql
SELECT * FROM pg_extension WHERE extname = 'vector';
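
The same check can be run from application code with the pg driver; a minimal sketch (the connection string is illustrative):

typescript
import { Client } from 'pg';

// Connect and check that the vector extension is installed.
const client = new Client({
  connectionString: 'postgresql://postgres:password@localhost:5432/graphrag',
});
await client.connect();

const { rows } = await client.query(
  "SELECT extversion FROM pg_extension WHERE extname = 'vector'"
);
console.log(rows.length ? `pgvector ${rows[0].extversion} installed` : 'pgvector is missing');

await client.end();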

Quick Start

typescript
import { createGraph } from '@graphrag-js/core';
import { lightrag } from '@graphrag-js/lightrag';
import { pgVector } from '@graphrag-js/pgvector';
import { openai } from '@ai-sdk/openai';

const graph = createGraph({
  model: openai('gpt-4o-mini'),
  embedding: openai.embedding('text-embedding-3-small'),
  provider: lightrag(),
  storage: {
    vector: pgVector({
      host: 'localhost',
      port: 5432,
      database: 'graphrag',
      user: 'postgres',
      password: 'password',
    }),
  }
});

await graph.insert('Your documents...');
const result = await graph.query('Your question?');

Configuration

pgVector(config)

typescript
interface PgVectorConfig {
  host?: string;              // PostgreSQL host (default: 'localhost')
  port?: number;              // PostgreSQL port (default: 5432)
  database?: string;          // Database name (default: 'graphrag')
  user?: string;              // Database user
  password?: string;          // Database password
  connectionString?: string;  // Connection URL, as an alternative to the fields above
  ssl?: boolean | object;     // TLS options forwarded to the underlying pg client
}
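
In practice these values are usually read from the environment rather than hard-coded; a minimal sketch using the standard libpq variable names (the names are a convention, not something the package requires):

typescript
import { pgVector } from '@graphrag-js/pgvector';

// Build the config from environment variables, falling back to the defaults above.
const vector = pgVector({
  host: process.env.PGHOST ?? 'localhost',
  port: Number(process.env.PGPORT ?? 5432),
  database: process.env.PGDATABASE ?? 'graphrag',
  user: process.env.PGUSER ?? 'postgres',
  password: process.env.PGPASSWORD,
});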

Connection Examples

typescript
// Local instance
pgVector({
  host: 'localhost',
  port: 5432,
  database: 'graphrag',
  user: 'postgres',
  password: 'password',
})

// Connection string
pgVector({
  connectionString: 'postgresql://user:password@localhost:5432/graphrag',
})

// Supabase
pgVector({
  host: 'db.xxxyyy.supabase.co',
  port: 5432,
  database: 'postgres',
  user: 'postgres',
  password: 'your-supabase-password',
})

// SSL connection
pgVector({
  host: 'production.example.com',
  port: 5432,
  database: 'graphrag',
  user: 'app',
  password: 'secure-password',
  ssl: {
    rejectUnauthorized: false,
  },
})

Usage Examples

Basic Vector Operations

typescript
import { pgVector } from '@graphrag-js/pgvector';

const vectorStore = pgVector({
  host: 'localhost',
  port: 5432,
  database: 'graphrag',
  user: 'postgres',
  password: 'password',
})('my-namespace');

// Create index
await vectorStore.createIndex({
  indexName: 'documents',
  dimension: 1536,
  metric: 'cosine',
});

// Upsert vectors
await vectorStore.upsert({
  indexName: 'documents',
  vectors: [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  metadata: [
    { text: 'Document 1', category: 'tech' },
    { text: 'Document 2', category: 'science' },
  ],
  ids: ['doc-1', 'doc-2'],
});

// Query with metadata filtering
const results = await vectorStore.query({
  queryVector: [0.15, 0.25, ...],
  topK: 10,
  filter: { category: 'tech' },
});
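
If you compute embeddings yourself instead of going through graph.insert, the AI SDK's embedMany helper pairs naturally with upsert. A sketch reusing the vectorStore from above (the texts and ids are illustrative):

typescript
import { embedMany } from 'ai';
import { openai } from '@ai-sdk/openai';

const texts = ['PostgreSQL stores the vectors.', 'pgvector adds similarity search.'];

// Embed all texts in one call, then upsert them with matching ids and metadata.
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'), // 1536 dimensions
  values: texts,
});

await vectorStore.upsert({
  indexName: 'documents',
  vectors: embeddings,
  metadata: texts.map((text) => ({ text })),
  ids: texts.map((_, i) => `doc-${i + 1}`),
});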

SQL Queries

Direct SQL access for advanced use cases:

typescript
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'graphrag',
  user: 'postgres',
  password: 'password',
});

const client = await pool.connect();
await pgvector.registerType(client);

// Vector similarity search
const result = await client.query(`
  SELECT id, metadata, vector <=> $1::vector AS distance
  FROM my_namespace_documents
  WHERE metadata->>'category' = 'tech'
  ORDER BY vector <=> $1::vector
  LIMIT 10
`, [JSON.stringify(queryVector)]);

client.release();
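
Continuing from the snippet above, pgvector's toSql helper serializes a number array into the vector text format, which reads a little clearer than JSON.stringify:

typescript
// pgvector.toSql() converts a number[] into the '[1,2,3]' text format that
// the vector type expects.
const filtered = await client.query(`
  SELECT id, metadata, vector <=> $1::vector AS distance
  FROM my_namespace_documents
  WHERE metadata->>'category' = 'tech'
  ORDER BY vector <=> $1::vector
  LIMIT 10
`, [pgvector.toSql(queryVector)]);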

JSONB Metadata Filtering

typescript
// Complex metadata queries
const results = await client.query(`
  SELECT id, metadata
  FROM my_namespace_documents
  WHERE
    metadata->>'category' = 'tech'
    AND (metadata->>'year')::int >= 2023
    AND metadata->'tags' ? 'ai'
  ORDER BY vector <=> $1::vector
  LIMIT 10
`, [vectorStr]);
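
For frequent JSONB filters it usually pays to index the metadata column as well; a sketch using the client from above (the index name is illustrative):

typescript
// A GIN index on metadata speeds up JSONB containment (@>) and key-existence (?)
// filters; equality on extracted text such as metadata->>'category' still needs a
// separate expression index (see Slow Queries below).
await client.query(`
  CREATE INDEX IF NOT EXISTS my_namespace_documents_metadata_idx
  ON my_namespace_documents
  USING gin (metadata)
`);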

Distance Operators

pgvector provides three distance operators:

Operator   Metric           Best For
<->        L2 (Euclidean)   Non-normalized vectors
<=>        Cosine           Normalized vectors (default)
<#>        Inner product    Dot product similarity
sql
-- Cosine distance (default)
ORDER BY vector <=> query_vector

-- L2 distance
ORDER BY vector <-> query_vector

-- Inner product
ORDER BY vector <#> query_vector
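
OpenAI embeddings are unit-length, and for unit vectors the inner-product operator returns results in the same order as cosine distance while being slightly cheaper. If your embeddings are not normalized, a small helper fixes that (a sketch, not part of the package):

typescript
// L2-normalize a vector so that inner product and cosine give the same ranking.
function normalize(vector: number[]): number[] {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector : vector.map((x) => x / norm);
}

const unitVector = normalize([0.3, 0.4, 0.5]);
// With unit-length vectors, ORDER BY vector <#> $1 returns results in the same
// order as ORDER BY vector <=> $1, and the inner product is slightly cheaper.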

Indexing

IVFFlat Index

For fast approximate similarity search:

sql
-- Create IVFFlat index
CREATE INDEX ON my_namespace_documents
USING ivfflat (vector vector_cosine_ops)
WITH (lists = 100);

-- Available operators
-- vector_l2_ops      for L2 distance (<->)
-- vector_ip_ops      for inner product (<#>)
-- vector_cosine_ops  for cosine distance (<=>)

Index Parameters

typescript
// IVFFlat guidance: lists = rows / 1000 up to ~1M rows, sqrt(rows) beyond that
const rows = 100000;
const lists = Math.max(10, Math.floor(rows / 1000));

await client.query(`
  CREATE INDEX ON my_namespace_documents
  USING ivfflat (vector vector_cosine_ops)
  WITH (lists = ${lists})
`);

Search Tuning

sql
-- Increase search accuracy by scanning more lists (default: 1)
SET ivfflat.probes = 20;

-- Query with custom probes
BEGIN;
SET LOCAL ivfflat.probes = 50;
SELECT * FROM my_namespace_documents
ORDER BY vector <=> query_vector
LIMIT 10;
COMMIT;
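
The same per-query tuning works from application code, since SET LOCAL is scoped to the enclosing transaction and does not leak into other pooled connections. A sketch with the pg driver (table name and connection details are illustrative):

typescript
import { Pool } from 'pg';
import pgvector from 'pgvector/pg';

const pool = new Pool({
  connectionString: 'postgresql://postgres:password@localhost:5432/graphrag',
});

// Run a vector search with a per-transaction probes setting.
async function searchWithProbes(queryVector: number[], probes: number) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(`SET LOCAL ivfflat.probes = ${probes}`); // scoped to this transaction
    const { rows } = await client.query(`
      SELECT id, metadata, vector <=> $1::vector AS distance
      FROM my_namespace_documents
      ORDER BY vector <=> $1::vector
      LIMIT 10
    `, [pgvector.toSql(queryVector)]);
    await client.query('COMMIT');
    return rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}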

Performance Optimization

Connection Pooling

typescript
import { Pool } from 'pg';

const pool = new Pool({
  host: 'localhost',
  port: 5432,
  database: 'graphrag',
  user: 'postgres',
  password: 'password',
  max: 20,                    // Max connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});
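
Two details worth adding in long-running services, using the pool created above: an error handler (idle clients can be dropped by the server) and a clean shutdown path. A brief sketch:

typescript
// Idle pooled clients can be dropped by the server (restarts, failovers);
// without an error handler the resulting 'error' event would crash Node.js.
pool.on('error', (err) => {
  console.error('Unexpected error on idle PostgreSQL client', err);
});

// Drain the pool on shutdown so in-flight queries can finish cleanly.
process.on('SIGTERM', async () => {
  await pool.end();
  process.exit(0);
});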

Batch Inserts

typescript
// Use COPY for bulk loading (requires the pg-copy-streams package)
import { from as copyFrom } from 'pg-copy-streams';
import { finished } from 'node:stream/promises';

await client.query('BEGIN');

const copyStream = client.query(
  copyFrom('COPY my_namespace_documents (id, vector, metadata) FROM STDIN')
);

for (const row of rows) {
  copyStream.write(`${row.id}\t${row.vector}\t${JSON.stringify(row.metadata)}\n`);
}

copyStream.end();
await finished(copyStream); // wait for COPY to complete before committing
await client.query('COMMIT');
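
If pulling in pg-copy-streams feels heavyweight, batched multi-row INSERTs with pgvector's toSql helper are a reasonable middle ground; a sketch reusing the client and rows from above, assuming each row.vector is a number[] (the batch size is illustrative):

typescript
import pgvector from 'pgvector/pg';

// Insert in batches of 500 rows using one multi-row INSERT per batch.
const batchSize = 500;
for (let start = 0; start < rows.length; start += batchSize) {
  const batch = rows.slice(start, start + batchSize);
  const values: unknown[] = [];
  const placeholders = batch.map((row, i) => {
    // pgvector.toSql expects a number[]; pass the string directly if already formatted.
    values.push(row.id, pgvector.toSql(row.vector), JSON.stringify(row.metadata));
    const base = i * 3;
    return `($${base + 1}, $${base + 2}::vector, $${base + 3}::jsonb)`;
  });
  await client.query(
    `INSERT INTO my_namespace_documents (id, vector, metadata) VALUES ${placeholders.join(', ')}`,
    values
  );
}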

Vacuum and Analyze

sql
-- Reclaim space and update statistics
VACUUM ANALYZE my_namespace_documents;

-- After bulk inserts
VACUUM (ANALYZE) my_namespace_documents;

Production Deployment

Docker Compose

yaml
version: '3.8'
services:
  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: graphrag
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secure-password
      POSTGRES_INITDB_ARGS: "-E UTF8"
    volumes:
      - pgdata:/var/lib/postgresql/data
    shm_size: 256mb

volumes:
  pgdata:

Memory Configuration

For production workloads:

bash
# postgresql.conf
shared_buffers = 4GB           # 25% of RAM
effective_cache_size = 12GB    # 75% of RAM
maintenance_work_mem = 1GB     # For index creation
work_mem = 256MB               # For query operations

Backup Strategy

bash
# Full backup
pg_dump -U postgres -d graphrag > backup.sql

# Table-specific backup
pg_dump -U postgres -d graphrag -t my_namespace_documents > vectors.sql

# Restore
psql -U postgres -d graphrag < backup.sql

Monitoring

Table Size

sql
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE tablename LIKE 'my_namespace%'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

Index Usage

sql
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
WHERE tablename LIKE 'my_namespace%';

Query Performance

sql
-- Enable timing
\timing on

-- Explain query plan
EXPLAIN ANALYZE
SELECT id, vector <=> $1::vector AS distance
FROM my_namespace_documents
ORDER BY vector <=> $1::vector
LIMIT 10;
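
The same plan can be captured from application code, which helps when logging slow vector searches in production. A sketch assuming a connected pg client and a queryVector embedding as in the SQL Queries example above; FORMAT JSON makes the plan machine-readable (depending on driver type parsing it may arrive as text or as a parsed object, so both are handled):

typescript
// Capture the plan as JSON so it can be logged or shipped to monitoring.
const { rows } = await client.query(`
  EXPLAIN (ANALYZE, FORMAT JSON)
  SELECT id, vector <=> $1::vector AS distance
  FROM my_namespace_documents
  ORDER BY vector <=> $1::vector
  LIMIT 10
`, [JSON.stringify(queryVector)]);

const raw = rows[0]['QUERY PLAN'];
const [plan] = typeof raw === 'string' ? JSON.parse(raw) : raw;
console.log('Execution time (ms):', plan['Execution Time']);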

Troubleshooting

Extension Not Found

Error: extension "vector" is not available

Solution:

  1. Install pgvector: sudo apt install postgresql-16-pgvector
  2. Restart PostgreSQL
  3. Run: CREATE EXTENSION vector;

Dimension Mismatch

Error: vector must have X dimensions, not Y

Solution:

  • Ensure all vectors have consistent dimensions
  • OpenAI text-embedding-3-small: 1536
  • OpenAI text-embedding-3-large: 3072
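
A cheap guard before upserting surfaces this earlier, with a clearer message than the database error; a minimal sketch (embeddings stands for the vectors you are about to upsert, and 1536 matches text-embedding-3-small):

typescript
// Fail fast if any embedding does not match the index dimension.
function assertDimension(vectors: number[][], expected: number): void {
  for (const [i, vector] of vectors.entries()) {
    if (vector.length !== expected) {
      throw new Error(`Vector ${i} has ${vector.length} dimensions, expected ${expected}`);
    }
  }
}

assertDimension(embeddings, 1536); // embeddings: number[][] about to be upserted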

Slow Queries

Problem: Vector search is slow

Solution:

  1. Create IVFFlat index
  2. Increase ivfflat.probes for accuracy
  3. Add metadata indexes: CREATE INDEX ON table ((metadata->>'field'))
  4. Use EXPLAIN ANALYZE to identify bottlenecks

Cost Considerations

Deployment         Cost             Best For
Self-hosted        Free + hosting   Full control
Supabase           $25+/month       Managed PostgreSQL
Neon               $19+/month       Serverless PostgreSQL
AWS RDS            $50+/month       Enterprise
Azure PostgreSQL   $40+/month       Azure ecosystem

Benchmarks

On 1M vectors (1536-dim, 8GB RAM):

Operation                  Latency
Insert                     10-20ms
Search (k=10, no index)    500-1000ms
Search (k=10, IVFFlat)     20-50ms
