Supacrawler Docs

Scrape Endpoint

Extract content from web pages using the TypeScript SDK

Basic Usage

Markdown Scrape

import { SupacrawlerClient } from '@supacrawler/js'

const client = new SupacrawlerClient({ apiKey: 'YOUR_API_KEY' })

// Basic markdown scrape
const result = await client.scrape({
  url: 'https://example.com',
  format: 'markdown'
})

console.log('markdown:', result.markdown)
console.log('metadata:', result.metadata)

HTML Scrape with JS Rendering

// HTML scrape with JavaScript rendering
const html = await client.scrape({
  url: 'https://spa-example.com',
  format: 'markdown',
  render_js: true,
  include_html: true
})

console.log('markdown:', html.markdown)
console.log('html:', html.html)

Link Extraction
// Extract links with depth and limit
const links = await client.scrape({
  url: 'https://example.com',
  format: 'links',
  depth: 2,
  max_links: 100
})

console.log('discovered links:', links.links)
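
The discovered link list typically mixes internal and external URLs. A small helper (a sketch using the standard `URL` API, not part of `@supacrawler/js`) can keep only links on the same origin as the starting page:

```typescript
// Keep only links that share an origin with the starting URL.
// Hypothetical helper, not part of the SDK.
function sameOriginLinks(startUrl: string, links: string[]): string[] {
  const origin = new URL(startUrl).origin
  return links.filter((link) => {
    try {
      return new URL(link).origin === origin
    } catch {
      return false // skip malformed URLs
    }
  })
}
```

For example, `sameOriginLinks('https://example.com', links.links ?? [])` drops external and malformed entries before further crawling.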

Advanced Options

Fresh Content

Bypass cache to get the latest content:

const fresh = await client.scrape({
  url: 'https://news-site.com/article',
  format: 'markdown',
  fresh: true  // Skip cache
})

Custom Selectors

Target specific elements:

const result = await client.scrape({
  url: 'https://example.com',
  format: 'markdown',
  selector: '#main-content'  // Only extract this element
})

Wait for Content

const result = await client.scrape({
  url: 'https://dynamic-site.com',
  format: 'markdown',
  render_js: true,
  wait_for: 5000  // Wait 5 seconds
})

Complete Example

import { SupacrawlerClient, ScrapeResponse } from '@supacrawler/js'

async function main() {
  const client = new SupacrawlerClient({
    apiKey: process.env.SUPACRAWLER_API_KEY || 'YOUR_API_KEY'
  })

  try {
    // Comprehensive scrape
    const result = await client.scrape({
      url: 'https://supabase.com/docs',
      format: 'markdown',
      render_js: true,
      include_html: true,
      fresh: false,
      max_links: 100
    })

    console.log(`✅ scrape completed`)
    console.log(`content length: ${result.markdown?.length ?? 0} characters`)
    console.log(`title: ${result.metadata?.title}`)
    console.log(`status: ${result.metadata?.status_code}`)

    // Save to file
    await Bun.write('output.md', result.markdown || '')
  } catch (error) {
    console.error('scrape failed:', error)
  }
}

main()
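
The example above saves the result with `Bun.write`, which is only available under the Bun runtime. Under Node.js, the standard `fs/promises` module does the same job; this is a sketch with a hypothetical `saveMarkdown` helper name:

```typescript
import { writeFile } from 'node:fs/promises'

// Node.js alternative to the Bun.write call in the example above.
// Writes an empty file when the scrape returned no markdown.
async function saveMarkdown(path: string, markdown?: string): Promise<void> {
  await writeFile(path, markdown ?? '', 'utf8')
}
```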

Response Structure

interface ScrapeResponse {
  markdown?: string
  html?: string
  links?: string[]
  metadata?: {
    title?: string
    description?: string
    language?: string
    status_code?: number
    source_url?: string
  }
}
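
Because every field in the response is optional, guard access with optional chaining and nullish coalescing rather than assuming fields exist. A minimal sketch (the `summarize` helper and its narrowed input type are illustrative, not part of the SDK):

```typescript
// Narrowed view of the optional response fields used here.
type ScrapeResponseLike = {
  markdown?: string
  metadata?: { title?: string }
}

// Build a one-line summary, falling back when fields are missing.
function summarize(result: ScrapeResponseLike): string {
  const title = result.metadata?.title ?? '(untitled)'
  const length = result.markdown?.length ?? 0
  return `${title}: ${length} chars`
}
```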

Error Handling

try {
  const result = await client.scrape({
    url: 'https://example.com',
    format: 'markdown'
  })
  console.log(result.markdown)
} catch (error) {
  console.error('scrape failed:', error)
}
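
For transient failures such as timeouts or rate limits, the try/catch above can be wrapped in a retry with exponential backoff. This is a generic sketch, not SDK functionality; `withRetry` is a hypothetical helper:

```typescript
// Retry an async operation with exponential backoff.
// Not part of @supacrawler/js; a generic sketch.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      if (i < attempts - 1) {
        // Backoff doubles each attempt: 500ms, 1000ms, 2000ms, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i))
      }
    }
  }
  throw lastError
}
```

Usage: `const result = await withRetry(() => client.scrape({ url: 'https://example.com', format: 'markdown' }))`.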
