API Reference
Scrape
Extract clean, structured content from any webpage with our scraping API. Get markdown or HTML, or discover links with metadata for enhanced site mapping.
Tip
Use cases: perfect for monitoring competitor pricing, real estate listings, and stock market news.
Quick Example
cURL
curl -G https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://example.com" \
-d format="markdown"
Python
from supacrawler import SupacrawlerClient
import os
client = SupacrawlerClient(api_key=os.environ.get('SUPACRAWLER_API_KEY'))
result = client.scrape("https://example.com", format="markdown")
print(f"Title: {result.metadata.title}")
print(f"Content: {result.markdown[:100]}...")
JavaScript
import { SupacrawlerClient } from '@supacrawler/js'
const client = new SupacrawlerClient({ apiKey: process.env.SUPACRAWLER_API_KEY })
const result = await client.scrape({
url: 'https://example.com',
format: 'markdown'
})
console.log('Title:', result.content?.metadata?.title)
Response
{
"success": true,
"url": "https://example.com",
"content": "# Example Domain\n\nThis domain is for use in illustrative examples...",
"title": "Example Domain",
"links": [
"https://www.iana.org/domains/example"
],
"discovered": 1,
"metadata": {
"status_code": 200
}
}
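Every response includes a success flag, so client code can distinguish a failed extraction from an HTTP error. A minimal error-handling sketch against the raw HTTP API (the exact shape of a failed response body is not shown on this page, so treat the fallback branch as an assumption):

import os
import requests

API_KEY = os.environ["SUPACRAWLER_API_KEY"]

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"url": "https://example.com", "format": "markdown"},
    timeout=30,
)
response.raise_for_status()  # surface HTTP-level failures (4xx/5xx)

data = response.json()
if data.get("success"):
    print(data["title"])
else:
    # Assumption: failed scrapes still return a JSON body describing the error.
    print("Scrape failed:", data)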
Scrape a webpage
Endpoint
GET /v1/scrape
Extract content from any publicly accessible webpage. Content is cleaned to remove ads, navigation, and non-essential elements.
Parameters
| Prop | Type |
| --- | --- |
| url | string (required) |
| format | string: markdown, html, or links |
| render_js | boolean |
| fresh | boolean |
| wait_for_selectors | JSON array of CSS selector strings |
Request
cURL
curl -G https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://example.com" \
-d format="markdown" \
-d render_js=false
Python
import requests
response = requests.get(
"https://api.supacrawler.com/api/v1/scrape",
headers={"Authorization": "Bearer YOUR_API_KEY"},
params={
"url": "https://example.com",
"format": "markdown",
"render_js": False
}
)
print(response.json())
JavaScript
const params = new URLSearchParams({
url: 'https://example.com',
format: 'markdown',
render_js: 'false'
});
const response = await fetch(
`https://api.supacrawler.com/api/v1/scrape?${params}`,
{ headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);
const result = await response.json();
console.log(result);
Response
{
"success": true,
"url": "https://example.com",
"content": "# Example Domain\n\nThis domain is for use...",
"title": "Example Domain",
"links": ["https://www.iana.org/domains/example"],
"discovered": 1,
"metadata": {
"status_code": 200
}
}
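A common next step is persisting the extracted markdown. The sketch below uses only the content and title fields shown in the response above:

import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={"url": "https://example.com", "format": "markdown"},
)
data = response.json()

# Write the cleaned markdown to a file named after the page title.
filename = f"{data['title'].replace(' ', '_').lower()}.md"
with open(filename, "w", encoding="utf-8") as f:
    f.write(data["content"])
print(f"Saved {len(data['content'])} characters to {filename}")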
Link Discovery Mode
Endpoint
GET /v1/scrape?format=links
Discover all links on a webpage for site mapping and link analysis.
Parameters
| Prop | Type |
| --- | --- |
| url | string (required) |
| format | string: set to links |
| depth | integer |
Request
cURL
curl -G https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://example.com" \
-d format="links" \
-d depth=2
Python
import requests
response = requests.get(
"https://api.supacrawler.com/api/v1/scrape",
headers={"Authorization": "Bearer YOUR_API_KEY"},
params={
"url": "https://example.com",
"format": "links",
"depth": 2
}
)
print(response.json())
JavaScript
const params = new URLSearchParams({
url: 'https://example.com',
format: 'links',
depth: '2'
});
const response = await fetch(
`https://api.supacrawler.com/api/v1/scrape?${params}`,
{ headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);
const result = await response.json();
console.log(result);
Response
{
"success": true,
"url": "https://example.com",
"links": [
"https://example.com/about",
"https://example.com/contact"
],
"discovered": 2,
"metadata": {
"status_code": 200,
"depth": 2
}
}
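Link discovery pairs naturally with content scraping: first map a page, then fetch each discovered URL. A sketch of that two-step pattern, using only fields shown in the responses above (rate limits are not documented on this page, so the one-second delay is a conservative assumption):

import time
import requests

BASE = "https://api.supacrawler.com/api/v1/scrape"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# Step 1: discover links up to depth 2.
links_resp = requests.get(
    BASE,
    headers=HEADERS,
    params={"url": "https://example.com", "format": "links", "depth": 2},
).json()

# Step 2: scrape each discovered page as markdown.
pages = {}
for link in links_resp.get("links", []):
    page = requests.get(
        BASE, headers=HEADERS, params={"url": link, "format": "markdown"}
    ).json()
    if page.get("success"):
        pages[link] = page["content"]
    time.sleep(1)  # polite pacing; adjust to your plan's rate limits

print(f"Scraped {len(pages)} of {links_resp.get('discovered', 0)} pages")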
Best Practices
JavaScript-Heavy Sites
For SPAs and JavaScript-heavy sites, enable render_js:
curl -G https://api.supacrawler.com/api/v1/scrape \
-H "Authorization: Bearer YOUR_API_KEY" \
-d url="https://spa-example.com" \
-d render_js=true \
--data-urlencode 'wait_for_selectors=["[data-testid=\"content\"]"]'
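The same request in Python; note that wait_for_selectors is passed as a JSON-encoded array of CSS selectors, mirroring the --data-urlencode usage above:

import json
import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "url": "https://spa-example.com",
        "render_js": "true",
        # JSON-encode the selector list, as in the curl example above.
        "wait_for_selectors": json.dumps(['[data-testid="content"]']),
    },
)
print(response.json())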
Caching Behavior
- Regular scrapes: Cached for 5 minutes
- Render scrapes: Cached for 15 minutes
- Fresh parameter: use fresh=true to bypass the cache
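For example, to bypass the cache and force a live fetch:

import requests

response = requests.get(
    "https://api.supacrawler.com/api/v1/scrape",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={
        "url": "https://example.com",
        "format": "markdown",
        "fresh": "true",  # skip any cached copy and re-fetch the page
    },
)
print(response.json())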
Response Model
| Prop | Type |
| --- | --- |
| success | boolean |
| url | string |
| content | string (markdown or HTML) |
| title | string |
| links | string[] |
| discovered | integer |
| metadata | object (status_code, depth) |
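For typed client code, the response can be modeled directly from the fields above. A sketch as a Python TypedDict (Python 3.9+); which fields are optional in practice is an assumption based on the examples on this page:

from typing import TypedDict

class ScrapeMetadata(TypedDict, total=False):
    status_code: int
    depth: int  # present in link-discovery responses

class ScrapeResponse(TypedDict, total=False):
    success: bool
    url: str
    content: str  # markdown or HTML, depending on the format parameter
    title: str
    links: list[str]
    discovered: int
    metadata: ScrapeMetadata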