# Supabase Vectors (pgvector)
This example shows how to:
- Scrape clean markdown content with the Supacrawler Python SDK
- Generate embeddings (OpenAI shown; bring your own model)
- Store and index vectors using the official Supabase Python client (Vecs) and pgvector
- Query similar documents by semantic similarity
Note: Install the SDKs and set credentials first (see Install the SDKs). Also review Supabase's AI & Vectors guidance and examples.
## Prerequisites
- Supabase project with pgvector enabled
- A connection string for your project (prefer the pooled connection when running in hosted notebooks)
Enable pgvector in your Supabase project by following the pgvector extension docs. For production, also create a vector index (HNSW or IVFFlat) per Supabase guidance; an example index appears in the Similarity search section below.
Self‑hosted Postgres:

```sql
create extension if not exists vector;
```
## Install (Python)

```bash
pip install supacrawler-py openai vecs
```
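Before running the examples, make sure the credentials mentioned in the note above are set. A quick sanity check (a minimal sketch; these are the variable names used by the Python snippets below, and the JavaScript example additionally reads `SUPABASE_URL` and `SUPABASE_SERVICE_ROLE_KEY`):

```python
import os

# Fail fast if any credential used by the Python examples below is missing
required = ('SUPACRAWLER_API_KEY', 'OPENAI_API_KEY', 'DATABASE_URL')
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f'Missing environment variables: {", ".join(missing)}')
```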
## End‑to‑end Python example (Vecs client)
```python
import os
import vecs  # Supabase Python client for vectors
from supacrawler import SupacrawlerClient, ScrapeParams
from openai import OpenAI

DB_URL = os.environ['DATABASE_URL']  # postgresql+psycopg or postgresql URL
SUPACRAWLER_API_KEY = os.environ['SUPACRAWLER_API_KEY']
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

# 1) Scrape
crawler = SupacrawlerClient(api_key=SUPACRAWLER_API_KEY)
scrape = crawler.scrape(ScrapeParams(url='https://example.com', format='markdown'))

# 2) Embed
client = OpenAI(api_key=OPENAI_API_KEY)
emb = client.embeddings.create(model='text-embedding-3-small', input=scrape.content)
vector = emb.data[0].embedding

# 3) Upsert via Vecs (auto-creates the collection table in the "vecs" schema; create indexes separately)
vx = vecs.create_client(DB_URL)
col = vx.get_or_create_collection(name='documents', dimension=1536)
col.upsert(records=[(
    scrape.url,   # id
    vector,       # embedding
    {             # metadata
        'url': scrape.url,
        'title': getattr(scrape, 'title', None),
        'content': scrape.content,
    },
)])
print('Upserted 1 document')
```
Expected output:

```text
Upserted 1 document
```
## Similarity search (Vecs)
```sql
-- Create an HNSW index (recommended) once per collection; Vecs stores each collection
-- as a table in the "vecs" schema with the embedding in a column named "vec"
create index if not exists documents_vec_hnsw
  on vecs.documents using hnsw (vec vector_cosine_ops);
```
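Alternatively, Vecs can build the index from Python; a minimal sketch, assuming the `documents` collection from the example above:

```python
import os
import vecs

# Build an HNSW index with cosine distance on the collection's vector column
vx = vecs.create_client(os.environ['DATABASE_URL'])
col = vx.get_or_create_collection(name='documents', dimension=1536)
col.create_index(
    method=vecs.IndexMethod.hnsw,
    measure=vecs.IndexMeasure.cosine_distance,
)
```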
```python
import os
import vecs
from openai import OpenAI

DB_URL = os.environ['DATABASE_URL']
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

q = client.embeddings.create(model='text-embedding-3-small', input='How do I integrate with the API?')
qvec = q.data[0].embedding

vx = vecs.create_client(DB_URL)
col = vx.get_collection('documents')

# include_value/include_metadata make each match a (id, distance, metadata) tuple
matches = col.query(data=qvec, limit=5, include_value=True, include_metadata=True)
for doc_id, distance, metadata in matches:
    print(doc_id, distance, metadata.get('title'))
```
## JavaScript example (supabase-js)
```ts
import { createClient } from '@supabase/supabase-js'
import { SupacrawlerClient } from '@supacrawler/js'
import OpenAI from 'openai'

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!)
const crawler = new SupacrawlerClient({ apiKey: process.env.SUPACRAWLER_API_KEY! })
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! })

// 1) Scrape content
const scrape = await crawler.scrape({ url: 'https://example.com', format: 'markdown' })

// 2) Embed
const emb = await openai.embeddings.create({ model: 'text-embedding-3-small', input: scrape.content })
const embedding = emb.data[0].embedding

// 3) Store embedding (ensure a vector(1536) column and index exist per Supabase docs)
const { error } = await supabase.from('documents').insert({
  url: scrape.url,
  title: scrape.title ?? null,
  content: scrape.content,
  embedding,
})
if (error) throw error

// Tip: see Supabase AI docs on vector columns and indexes
```
## Crawl and embed a whole site (Python)
Use the Supacrawler Jobs API to crawl a domain, then embed and upsert all pages with Vecs.
```python
import os, vecs
from supacrawler import SupacrawlerClient, JobCreateRequest
from openai import OpenAI

DB_URL = os.environ['DATABASE_URL']
SUPACRAWLER_API_KEY = os.environ['SUPACRAWLER_API_KEY']
OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

crawler = SupacrawlerClient(api_key=SUPACRAWLER_API_KEY)

# 1) Create crawl job (scope with patterns)
job = crawler.create_job(JobCreateRequest(
    url='https://docs.supacrawler.com',
    type='crawl',
    depth=2,
    link_limit=50,
    patterns=['/'],
    render_js=False,
))
status = crawler.wait_for_job(job.job_id)

# 2) Embed + upsert all pages
vx = vecs.create_client(DB_URL)
col = vx.get_or_create_collection(name='site_docs', dimension=1536)
client = OpenAI(api_key=OPENAI_API_KEY)

records = []
for page_url, page in (status.data.crawl_data or {}).items():
    content = page.markdown or ''
    if not content:
        continue
    emb = client.embeddings.create(model='text-embedding-3-small', input=content)
    vector = emb.data[0].embedding
    records.append((page_url, vector, {
        'url': page_url,
        'title': (page.metadata or {}).get('title'),
        'content': content[:1000],
    }))

if records:
    col.upsert(records=records)
    print(f'Upserted {len(records)} pages')
else:
    print('No pages to upsert')
```
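The companion notebook goes a step further and splits each page into smaller chunks before embedding, which is where ids like `.../api/scrape#chunk-0` appear in the output below. A minimal sketch of that step, using a naive fixed-size character splitter (the chunk size and id scheme are illustrative; it reuses `status`, `client`, and `col` from the snippet above):

```python
def chunk_page(url: str, markdown: str, size: int = 1500) -> list[tuple[str, str]]:
    """Split a page's markdown into fixed-size chunks with ids like 'url#chunk-N'."""
    pieces = [markdown[i:i + size] for i in range(0, len(markdown), size)]
    return [(f'{url}#chunk-{n}', text) for n, text in enumerate(pieces)]

# Embed and upsert chunks instead of whole pages
records = []
for page_url, page in (status.data.crawl_data or {}).items():
    title = (page.metadata or {}).get('title')
    for chunk_id, text in chunk_page(page_url, page.markdown or ''):
        emb = client.embeddings.create(model='text-embedding-3-small', input=text)
        records.append((chunk_id, emb.data[0].embedding, {'url': page_url, 'title': title, 'content': text}))

col.upsert(records=records)
print(f'Upserted crawl chunks: {len(records)}')
```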
Expected output (from the notebook workflow, which chunks pages and runs a sample query):

```text
Upserted crawl chunks: 19
Q: What does the scrape endpoint do?
Top 3 matches:
https://docs.supacrawler.com/api/scrape#chunk-0 n/a Scrape - Supacrawler API Reference
https://docs.supacrawler.com/api/scrape#chunk-3 n/a Scrape - Supacrawler API Reference
https://docs.supacrawler.com/quickstart#chunk-1 n/a Quickstart - Supacrawler API Reference
```
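The `Q:` and `Top 3 matches:` lines come from a similarity query like the one shown earlier, run against the `site_docs` collection. A minimal sketch, reusing `client` and `col` from above (exact output formatting will differ from the notebook):

```python
question = 'What does the scrape endpoint do?'
q = client.embeddings.create(model='text-embedding-3-small', input=question)

# include_value/include_metadata make each match a (id, distance, metadata) tuple
matches = col.query(data=q.data[0].embedding, limit=3, include_value=True, include_metadata=True)

print(f'Q: {question}')
print('Top 3 matches:')
for chunk_id, distance, metadata in matches:
    print(chunk_id, round(distance, 3), metadata.get('title'))
```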
## Notebook
You can run the full workflow in a notebook: supacrawler-py/examples/supabase_vectors.ipynb.