Supabase Vectors (pgvector)

This example shows how to:

  • Scrape clean markdown content with the Supacrawler Python SDK
  • Generate embeddings (OpenAI shown; bring your own model)
  • Store and index vectors using the official Supabase Python client (Vecs) and pgvector
  • Query similar documents by semantic similarity

Note: Install the SDK and set credentials first (see Install the SDKs), and review Supabase's AI & Vectors guidance and examples.

Prerequisites

  • Supabase project with pgvector enabled
  • A connection string for your project (prefer the pooled connection when running in hosted notebooks)

Enable pgvector in your Supabase project by following the docs: pgvector extension. For production, also create a vector index (HNSW or IVFFlat) per Supabase guidance.

Self‑hosted Postgres:

create extension if not exists vector;

Install (Python)

pip install supacrawler-py openai vecs

End‑to‑end Python example (Vecs client)

import os
import vecs  # Supabase Python client for vectors
from supacrawler import SupacrawlerClient, ScrapeParams
from openai import OpenAI

DB_URL = os.environ['DATABASE_URL']  # postgresql+psycopg or postgresql URL
SUPACRAWLER_API_KEY = os.environ['SUPACRAWLER_API_KEY']
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

# 1) Scrape
crawler = SupacrawlerClient(api_key=SUPACRAWLER_API_KEY)
scrape = crawler.scrape(ScrapeParams(url='https://example.com', format='markdown'))

# 2) Embed
client = OpenAI(api_key=OPENAI_API_KEY)
emb = client.embeddings.create(model='text-embedding-3-small', input=scrape.content)
vector = emb.data[0].embedding

# 3) Upsert via Vecs (auto-creates the vecs schema and collection table; create the index separately)
vx = vecs.create_client(DB_URL)
col = vx.get_or_create_collection(name='documents', dimension=1536)
col.upsert(records=[(
    scrape.url,  # id
    vector,      # embedding
    {            # metadata
        'url': scrape.url,
        'title': getattr(scrape, 'title', None),
        'content': scrape.content,
    },
)])
print('Upserted 1 document')

Expected output

Upserted 1 document
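
To sanity-check what was stored, Vecs can fetch records back by id. A minimal sketch reusing the collection from the example above; fetch and its record shape (id, vector, metadata) follow the Vecs client, so confirm against your installed version:

# Fetch the stored record back by id to confirm the upsert
stored = col.fetch(ids=[scrape.url])
for record_id, vector, metadata in stored:
    print(record_id, len(vector), metadata.get('title'))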

Similarity search (Vecs)

Create an HNSW index (recommended) once per collection. Vecs stores the collection created above as a table in the vecs schema (vecs.documents), with the embedding in its vec column:

-- Create an HNSW index once per collection/table
create index if not exists documents_embedding_hnsw on vecs.documents using hnsw (vec vector_cosine_ops);
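
If you prefer to stay in Python, Vecs can build the index for you. A minimal sketch, assuming the documents collection from the upsert example; the IndexMethod/IndexMeasure enum names follow the Vecs client, so check your installed version:

import os, vecs

vx = vecs.create_client(os.environ['DATABASE_URL'])
col = vx.get_or_create_collection(name='documents', dimension=1536)
# Build an HNSW index using cosine distance
col.create_index(method=vecs.IndexMethod.hnsw, measure=vecs.IndexMeasure.cosine_distance)

With the index in place, embed the question with the same model and query the collection: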
from openai import OpenAI
import vecs, os

DB_URL = os.environ['DATABASE_URL']
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
q = client.embeddings.create(model='text-embedding-3-small', input='How do I integrate with the API?')
qvec = q.data[0].embedding

vx = vecs.create_client(DB_URL)
col = vx.get_collection('documents')
# Ask Vecs to return distances and metadata alongside the ids
matches = col.query(data=qvec, limit=5, include_value=True, include_metadata=True)
for doc_id, distance, metadata in matches:
    print(doc_id, distance, metadata.get('title'))
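
Queries can also be narrowed by metadata at query time. A minimal sketch reusing qvec and the url key stored during the upsert; the $eq operator follows Vecs' metadata-filter syntax (other operators such as $in and $gte exist as well):

# Restrict matches to a specific source URL via the metadata filter
filtered = col.query(
    data=qvec,
    limit=3,
    filters={'url': {'$eq': 'https://example.com'}},
    include_value=True,
    include_metadata=True,
)
for doc_id, distance, metadata in filtered:
    print(doc_id, distance, metadata.get('title'))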

JavaScript/TypeScript example (supabase-js)

import { createClient } from '@supabase/supabase-js'
import { SupacrawlerClient } from '@supacrawler/js'
import OpenAI from 'openai'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)
const crawler = new SupacrawlerClient({ apiKey: process.env.SUPACRAWLER_API_KEY! })
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! })

// 1) Scrape content
const scrape = await crawler.scrape({ url: 'https://example.com', format: 'markdown' })

// 2) Embed
const emb = await openai.embeddings.create({ model: 'text-embedding-3-small', input: scrape.content })
const embedding = emb.data[0].embedding

// 3) Store embedding (ensure a vector(1536) column and index exist per Supabase docs)
const { error } = await supabase.from('documents').insert({
  url: scrape.url,
  title: scrape.title ?? null,
  content: scrape.content,
  embedding,
})
if (error) throw error
// Tip: see Supabase AI docs on vector columns and indexes

Crawl and embed a whole site (Python)

Use the Supacrawler Jobs API to crawl a domain, then embed and upsert all pages with Vecs.

import os, vecs
from supacrawler import SupacrawlerClient, JobCreateRequest
from openai import OpenAI

DB_URL = os.environ['DATABASE_URL']
SUPACRAWLER_API_KEY = os.environ['SUPACRAWLER_API_KEY']
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

crawler = SupacrawlerClient(api_key=SUPACRAWLER_API_KEY)

# 1) Create crawl job (scope with patterns)
job = crawler.create_job(JobCreateRequest(
    url='https://docs.supacrawler.com',
    type='crawl',
    depth=2,
    link_limit=50,
    patterns=['/'],
    render_js=False,
))
status = crawler.wait_for_job(job.job_id)

# 2) Embed + upsert all pages
vx = vecs.create_client(DB_URL)
col = vx.get_or_create_collection(name='site_docs', dimension=1536)
client = OpenAI(api_key=OPENAI_API_KEY)

records = []
for page_url, page in (status.data.crawl_data or {}).items():
    content = page.markdown or ''
    if not content:
        continue
    emb = client.embeddings.create(model='text-embedding-3-small', input=content)
    vector = emb.data[0].embedding
    records.append((page_url, vector, {
        'url': page_url,
        'title': (page.metadata or {}).get('title'),
        'content': content[:1000]
    }))

if records:
    col.upsert(records=records)
    print(f'Upserted {len(records)} pages')
else:
    print('No pages to upsert')

Expected output (from a run that also splits each page into chunks and asks a sample question, as in the linked notebook; see the sketch below)

Upserted crawl chunks: 19

Q: What does the scrape endpoint do?
Top 3 matches:
https://docs.supacrawler.com/api/scrape#chunk-0 n/a Scrape - Supacrawler API Reference
https://docs.supacrawler.com/api/scrape#chunk-3 n/a Scrape - Supacrawler API Reference
https://docs.supacrawler.com/quickstart#chunk-1 n/a Quickstart - Supacrawler API Reference
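
The #chunk-N suffixes in the ids above come from splitting long pages into smaller pieces before embedding, so each vector stays within the embedding model's input limits and a match points at a specific passage. A minimal sketch of that step, reusing status, client, and col from the crawl example; the character-based chunk_text splitter and the 2000-character size are illustrative choices, not part of any SDK:

def chunk_text(text, max_chars=2000):
    # Naive character-based splitter; swap in a token- or heading-aware splitter for better boundaries
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

records = []
for page_url, page in (status.data.crawl_data or {}).items():
    content = page.markdown or ''
    for i, chunk in enumerate(chunk_text(content)):
        emb = client.embeddings.create(model='text-embedding-3-small', input=chunk)
        records.append((f'{page_url}#chunk-{i}', emb.data[0].embedding, {
            'url': page_url,
            'title': (page.metadata or {}).get('title'),
            'content': chunk,
        }))

if records:
    col.upsert(records=records)
print(f'Upserted crawl chunks: {len(records)}')

# Ask a question against the chunked collection
q = client.embeddings.create(model='text-embedding-3-small', input='What does the scrape endpoint do?')
matches = col.query(data=q.data[0].embedding, limit=3, include_value=True, include_metadata=True)
print('Top 3 matches:')
for doc_id, distance, metadata in matches:
    print(doc_id, metadata.get('title'))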

Notebook

You can run the full workflow in a notebook: supacrawler-py/examples/supabase_vectors.ipynb.
