Scrape

Extract clean, structured content from any webpage with our scraping API. Get markdown or HTML output, or discover links with metadata for enhanced site mapping.

Quick Example

cURL

curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://example.com" \
  -d format="markdown"

Python

from supacrawler import SupacrawlerClient
import os

client = SupacrawlerClient(api_key=os.environ.get('SUPACRAWLER_API_KEY'))
result = client.scrape("https://example.com", format="markdown")
print(f"Title: {result.metadata.title}")
print(f"Content: {result.markdown[:100]}...")

JavaScript

import { SupacrawlerClient } from '@supacrawler/js'

const client = new SupacrawlerClient({ apiKey: process.env.SUPACRAWLER_API_KEY })
const result = await client.scrape({
  url: 'https://example.com',
  format: 'markdown'
})
console.log('Title:', result.content?.metadata?.title)
Response
{
  "success": true,
  "url": "https://example.com",
  "content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
  "title": "Example Domain",
  "links": [
    "https://www.iana.org/domains/example"
  ],
  "discovered": 1,
  "metadata": {
    "status_code": 200
  }
}

Scrape a webpage

Endpoint

GET /v1/scrape

Extract content from any publicly accessible webpage. Content is cleaned to remove ads, navigation, and non-essential elements.

Parameters

Prop                 Type
url                  string
format               string
render_js            boolean
fresh                boolean
wait_for_selectors   string[]

Request

cURL

curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://example.com" \
  -d format="markdown" \
  -d render_js=false

Python

import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "url": "https://example.com",
        "format": "markdown",
        "render_js": False
    }
)
print(response.json())

JavaScript

const params = new URLSearchParams({
  url: 'https://example.com',
  format: 'markdown',
  render_js: 'false'
});

const response = await fetch(
  `https://api.supacrawler.com/api/v1/scrape?${params}`,
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);

const result = await response.json();
console.log(result);

Response

{
  "success": true,
  "url": "https://example.com",
  "content": "# Example Domain\n\nThis domain is for use...",
  "title": "Example Domain",
  "links": ["https://www.iana.org/domains/example"],
  "discovered": 1,
  "metadata": {
    "status_code": 200
  }
}
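
The success flag and metadata.status_code let you detect failed fetches before using the content. A minimal Python sketch based on the response fields shown above (the error-handling shape is an assumption, not a documented contract):

import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={"url": "https://example.com", "format": "markdown"}
)
data = response.json()

# Only trust the content when the scrape succeeded and the target returned 200.
if data.get("success") and data.get("metadata", {}).get("status_code") == 200:
    print(data["title"])
    print(data["content"][:200])
else:
    # Exact error fields are an assumption; inspect the raw payload when debugging.
    print("Scrape failed:", data)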

Endpoint

GET /v1/scrape?format=links

Map all links on a webpage for site mapping and link analysis.

Parameters

Prop     Type
url      string
format   string
depth    number

Request

cURL

curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://example.com" \
  -d format="links" \
  -d depth=2

Python

import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "url": "https://example.com",
        "format": "links",
        "depth": 2
    }
)
print(response.json())

JavaScript

const params = new URLSearchParams({
  url: 'https://example.com',
  format: 'links',
  depth: '2'
});

const response = await fetch(
  `https://api.supacrawler.com/api/v1/scrape?${params}`,
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);

const result = await response.json();
console.log(result);

Response

{
  "success": true,
  "url": "https://example.com",
  "links": [
    "https://example.com/about",
    "https://example.com/contact"
  ],
  "discovered": 2,
  "metadata": {
    "status_code": 200,
    "depth": 2
  }
}
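
The links format pairs naturally with the markdown format: map a page first, then scrape each discovered URL. A minimal sketch of that two-step flow using the same /v1/scrape endpoint (the flow itself is illustrative, not a separate API feature):

import requests

API = "https://api.supacrawler.com/api/v1/scrape"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Step 1: discover links up to depth 2.
site_map = requests.get(API, headers=HEADERS, params={
    "url": "https://example.com",
    "format": "links",
    "depth": 2
}).json()

# Step 2: scrape each discovered link as markdown.
for link in site_map.get("links", []):
    page = requests.get(API, headers=HEADERS, params={
        "url": link,
        "format": "markdown"
    }).json()
    print(link, "->", page.get("title"))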

Best Practices

JavaScript-Heavy Sites

For SPAs and JavaScript-heavy sites, enable render_js:

curl -G https://api.supacrawler.com/api/v1/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d url="https://spa-example.com" \
  -d render_js=true \
  --data-urlencode 'wait_for_selectors=["[data-testid=\"content\"]"]'
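
The same options work from Python. This sketch assumes wait_for_selectors is passed as a JSON-encoded array, mirroring the curl example above:

import json
import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "url": "https://spa-example.com",
        "format": "markdown",
        "render_js": True,
        # JSON-encode the selector list, matching the curl example above.
        "wait_for_selectors": json.dumps(['[data-testid="content"]'])
    }
)
print(response.json())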

Caching Behavior

  • Regular scrapes: Cached for 5 minutes
  • Render scrapes: Cached for 15 minutes
  • Fresh parameter: Use fresh=true to bypass the cache (see the sketch below)
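
A cache-bypassing request is just the regular scrape call with fresh=true added. A minimal sketch:

import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "url": "https://example.com",
        "format": "markdown",
        "fresh": "true"  # bypass the cache and fetch a fresh copy
    }
)
print(response.json())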

Response Model

Prop         Type
success      boolean
url          string
content      string
title        string
links        string[]
discovered   number
metadata     object
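
For typed access in Python, the response shape from the examples above maps onto a small dataclass. This is an illustrative sketch, not an official SDK model:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ScrapeMetadata:
    status_code: int
    depth: Optional[int] = None  # present on format=links responses

@dataclass
class ScrapeResponse:
    success: bool
    url: str
    title: Optional[str] = None
    content: Optional[str] = None          # markdown/HTML; absent for format=links
    links: List[str] = field(default_factory=list)
    discovered: int = 0
    metadata: Optional[ScrapeMetadata] = None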
