Scrape Endpoint
Extract content from web pages using the TypeScript SDK
Basic Usage
Markdown Scrape
import { SupacrawlerClient } from '@supacrawler/js'
const client = new SupacrawlerClient({ apiKey: 'YOUR_API_KEY' })
// Basic markdown scrape
const result = await client.scrape({
  url: 'https://example.com',
  format: 'markdown'
})
console.log('markdown:', result.markdown)
console.log('metadata:', result.metadata)
HTML Scrape with JS Rendering
// HTML scrape with JavaScript rendering
const page = await client.scrape({
  url: 'https://spa-example.com',
  format: 'markdown',
  render_js: true,
  include_html: true
})
console.log('markdown:', page.markdown)
console.log('html:', page.html)
Links Extraction
// Extract links with depth and limit
const links = await client.scrape({
  url: 'https://example.com',
  format: 'links',
  depth: 2,
  max_links: 100
})
console.log('discovered links:', links.links)
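If you only need links that stay on the starting site, the returned array can be filtered with the standard URL API. This is a small post-processing sketch, not an SDK option:

// Keep only links on the same origin as the start URL (plain post-processing)
const startOrigin = new URL('https://example.com').origin
const internalLinks = (links.links ?? []).filter((href) => {
  try {
    return new URL(href).origin === startOrigin
  } catch {
    return false // ignore malformed URLs
  }
})
console.log('internal links:', internalLinks.length)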
Advanced Options
Fresh Content
Bypass cache to get the latest content:
const fresh = await client.scrape({
  url: 'https://news-site.com/article',
  format: 'markdown',
  fresh: true // Skip cache
})
Custom Selectors
Target specific elements:
const result = await client.scrape({
  url: 'https://example.com',
  format: 'markdown',
  selector: '#main-content' // Only extract this element
})
Wait for Content
Give JavaScript-heavy pages time to finish rendering before extraction:
const result = await client.scrape({
  url: 'https://dynamic-site.com',
  format: 'markdown',
  render_js: true,
  wait_for: 5000 // Wait 5 seconds
})
Complete Example
import { SupacrawlerClient, ScrapeResponse } from '@supacrawler/js'
async function main() {
  const client = new SupacrawlerClient({
    apiKey: process.env.SUPACRAWLER_API_KEY || 'YOUR_API_KEY'
  })

  try {
    // Comprehensive scrape
    const result = await client.scrape({
      url: 'https://supabase.com/docs',
      format: 'markdown',
      render_js: true,
      include_html: true,
      fresh: false,
      max_links: 100
    })

    console.log('✅ scrape completed')
    console.log(`content length: ${result.markdown?.length} characters`)
    console.log(`title: ${result.metadata?.title}`)
    console.log(`status: ${result.metadata?.status_code}`)

    // Save to file (uses the Bun runtime's file API)
    await Bun.write('output.md', result.markdown || '')
  } catch (error) {
    console.error('scrape failed:', error)
  }
}

main()
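Bun.write is specific to the Bun runtime. If you run the example on Node.js instead, the equivalent write would use fs/promises (a sketch of that one step):

import { writeFile } from 'node:fs/promises'

// Node.js replacement for the Bun.write call above
await writeFile('output.md', result.markdown || '', 'utf8')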
Response Structure
interface ScrapeResponse {
  markdown?: string
  html?: string
  links?: string[]
  metadata?: {
    title?: string
    description?: string
    language?: string
    status_code?: number
    source_url?: string
  }
}
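Since every field is optional, guard before using them. A minimal sketch built on the ScrapeResponse shape above (the summarize helper is just an illustration):

import { ScrapeResponse } from '@supacrawler/js'

// Summarize a response without assuming any field is present
function summarize(result: ScrapeResponse): string {
  const title = result.metadata?.title ?? 'Untitled'
  const length = result.markdown?.length ?? 0
  return `${title} (${length} characters of markdown)`
}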
Error Handling
try {
  const result = await client.scrape({
    url: 'https://example.com',
    format: 'markdown'
  })
  console.log(result.markdown)
} catch (error) {
  console.error('scrape failed:', error)
}
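For transient failures such as timeouts or rate limits, you may want to retry. The wrapper below is a generic sketch, not part of the SDK; it assumes failures surface as thrown errors and retries with exponential backoff:

// Retry a markdown scrape up to `attempts` times with exponential backoff
async function scrapeWithRetry(url: string, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return await client.scrape({ url, format: 'markdown' })
    } catch (error) {
      if (i === attempts - 1) throw error
      // Back off 1s, 2s, 4s, ... before the next attempt
      await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** i))
    }
  }
}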
Next Steps
- Crawl Endpoint - Scrape multiple pages
- Screenshots Endpoint - Capture visuals
- Watch Endpoint - Monitor for changes