Local Development

Run Supacrawler locally for development, testing, or self-hosting. A local instance gives you full control over your scraping infrastructure.

Quick Start

Option 1: Docker (Recommended)

The fastest way to get Supacrawler running locally:

# Download and start with Docker Compose
curl -O https://raw.githubusercontent.com/supacrawler/supacrawler/main/docker-compose.yml
docker compose up

Your local Supacrawler instance will be available at http://localhost:8081
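
Containers can take a few seconds to start accepting requests. If you script against the local instance, a small readiness check avoids racing the startup. The sketch below polls the health endpoint used in the cURL examples on this page; the function name wait_for_health and its timeout and interval defaults are illustrative assumptions, not part of Supacrawler.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8081/v1/health", timeout=30.0, interval=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: the name and defaults are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1.0) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Not reachable yet, or the server answered with an error status.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Call wait_for_health() after docker compose up and continue only if it returns True.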

Option 2: Binary Installation

For advanced users who prefer native binaries:

  1. Download from GitHub releases
  2. Install dependencies: Redis + Node.js + Playwright v1.49.1
  3. Run: ./supacrawler --redis-addr=127.0.0.1:6379
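
Before running the binary, it can help to confirm that the dependencies from step 2 are on your PATH. This is a minimal sketch that assumes the usual executable names redis-server and node; it does not check Playwright, which is an npm package rather than a single executable. The helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("redis-server", "node")):
    """Return the names in `tools` that cannot be found on PATH.

    The default executable names are assumptions; adjust them for your system.
    """
    return [name for name in tools if shutil.which(name) is None]
```

An empty list means every listed executable was found.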

Using SDKs with Local Instance

Once your local Supacrawler is running, point the SDK client at it by setting the base URL (base_url in Python, baseUrl in JavaScript/TypeScript):

Python SDK

from supacrawler import SupacrawlerClient

# Point to your local instance
client = SupacrawlerClient(
    api_key='anything',  # API key not required for local
    base_url='http://localhost:8081/v1'
)

# Use normally
result = client.scrape({
    'url': 'https://example.com',
    'format': 'markdown'
})

print(result.data)

JavaScript/TypeScript SDK

import { SupacrawlerClient } from '@supacrawler/js'

// Point to your local instance
const client = new SupacrawlerClient({ 
    apiKey: 'anything',  // API key not required for local
    baseUrl: 'http://localhost:8081/v1' 
})

// Use normally
const result = await client.scrape({ 
    url: 'https://example.com', 
    format: 'markdown' 
})

console.log(result.data)

Direct HTTP/cURL

You can also make direct HTTP requests:

# Health check
curl http://localhost:8081/v1/health

# Scrape a webpage
curl "http://localhost:8081/v1/scrape?url=https://example.com&format=markdown"

# Take a screenshot
curl -X POST http://localhost:8081/v1/screenshots \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","full_page":true}'
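
If you prefer to build these requests in code without an SDK, the sketch below mirrors the two cURL calls above using only the Python standard library. It percent-encodes the target URL so that query strings inside it do not collide with the scrape parameters. The function names are illustrative; the paths and JSON fields are taken from the cURL examples.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8081/v1"


def scrape_url(target, fmt="markdown", base_url=BASE_URL):
    """Build the GET /scrape URL with the target URL percent-encoded."""
    query = urllib.parse.urlencode({"url": target, "format": fmt})
    return f"{base_url}/scrape?{query}"


def screenshot_request(target, full_page=True, base_url=BASE_URL):
    """Build (but do not send) the POST /screenshots request with a JSON body."""
    body = json.dumps({"url": target, "full_page": full_page}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/screenshots",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass either result to urllib.request.urlopen to send it to a running instance.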

Local Development Benefits

Cost Savings

  • No API costs for development and testing
  • Unlimited requests during development
  • No rate limits on your local instance

Privacy & Control

  • Scraped data stays on your machine; nothing is sent to the hosted Supacrawler API
  • Full control over infrastructure and scaling
  • Custom configurations for specific needs

Development Speed

  • Fast feedback with no round trip to a hosted API
  • Debug and iterate faster
  • Test edge cases without quota concerns

Configuration Options

Environment Variables

# Core settings
export HTTP_ADDR=":8081"
export REDIS_ADDR="127.0.0.1:6379"
export DATA_DIR="./data"

# Optional: Supabase integration
export SUPABASE_URL="your-supabase-url"
export SUPABASE_SERVICE_KEY="your-service-key"
export SUPABASE_STORAGE_BUCKET="screenshots"
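
The SDK base URL has to match whatever HTTP_ADDR is set to. As a rough sketch, assuming HTTP_ADDR uses the host:port form shown above (with an empty host meaning all interfaces), a client script could derive the base URL like this. The helper name is hypothetical.

```python
def base_url_from_http_addr(http_addr=":8081", default_host="localhost"):
    """Turn a listen address such as ":8081" into a client base URL.

    Assumes the host:port form used in the HTTP_ADDR examples; an empty or
    wildcard host is replaced with `default_host` for client connections.
    """
    host, sep, port = http_addr.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError(f"expected host:port or :port, got {http_addr!r}")
    if host in ("", "0.0.0.0"):
        host = default_host
    return f"http://{host}:{port}/v1"
```

For example, base_url_from_http_addr(os.environ.get("HTTP_ADDR", ":8081")) keeps a script in step with the server configuration.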

Custom Configuration

Create a .env file in your project root:

HTTP_ADDR=:8081
REDIS_ADDR=127.0.0.1:6379
DATA_DIR=./data
REDIS_PASSWORD=your-redis-password
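
Test scripts sometimes need the same values as the server. The sketch below reads simple KEY=VALUE lines like the ones above into a dictionary. It is an illustration only: it is not how Supacrawler itself loads the file, and it ignores features such as variable expansion or multi-line values.

```python
def parse_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Matching surrounding quotes are removed from values.
    """
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            value = value.strip()
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            values[key.strip()] = value
    return values
```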

Hot Reload Development

For active development on the Supacrawler source with automatic reloading (requires a Go toolchain):

# Install Air for hot reloading
go install github.com/air-verse/air@latest

# Set environment variables
export REDIS_ADDR=127.0.0.1:6379
export HTTP_ADDR=:8081

# Run with hot reload
air

Troubleshooting

JavaScript Rendering Issues

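If you encounter "please install the driver" errors, the Playwright driver or its browser binaries are likely missing or at a different version than the one listed under Binary Installation (v1.49.1):

# Install Playwright at the version listed above, then the Chromium browser and its system dependencies
npm install -g playwright@1.49.1
playwright install chromium --with-deps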

Redis Connection Issues

Make sure Redis is running:

# macOS with Homebrew
brew services start redis

# Docker
docker run -d --name redis -p 6379:6379 redis:7-alpine

# Ubuntu/Debian
sudo systemctl start redis-server
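
To confirm that Redis is actually reachable at the address you pass as REDIS_ADDR, you can send it a PING. The sketch below uses the inline command form of the Redis protocol over a plain socket, so it needs no Redis client library. The helper name redis_ping is hypothetical, and the check does not authenticate: if your server requires REDIS_PASSWORD, it replies with an error instead of +PONG and the function returns False.

```python
import socket


def redis_ping(addr="127.0.0.1:6379", timeout=2.0):
    """Return True if the server at `addr` answers an unauthenticated PING with +PONG."""
    host, _, port = addr.rpartition(":")
    try:
        with socket.create_connection((host or "127.0.0.1", int(port)), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # inline command, terminated by CRLF
            reply = conn.recv(64)
    except (OSError, ValueError):
        # Connection refused, timed out, or the address could not be parsed.
        return False
    return reply.startswith(b"+PONG")
```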

Port Already in Use

If port 8081 is busy, start the server on another port and update the base URL in your clients to match:

export HTTP_ADDR=":8082"
./supacrawler
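
To find out whether a port is taken before picking a new one, you can probe it from Python. This is a minimal sketch with hypothetical helper names; it only detects listeners on the loopback address, and a port reported as free could still be claimed by another process before you start the server.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        return probe.connect_ex((host, port)) == 0


def first_free_port(start=8081, attempts=20):
    """Return the first port at or above `start` with no listener on it."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}..{start + attempts - 1}")
```

The result can be used to build the HTTP_ADDR value, for example ":" followed by the returned port.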

Next Steps

Production Deployment

Ready to deploy? Check out our production deployment guide or use our managed service at supacrawler.com - 63% cheaper than alternatives!
