Google Gemini Integration

Integrate Supacrawler with Google Gemini to analyze scraped web content, including multimodal analysis of page text and screenshots. Supacrawler handles the extraction; Gemini handles the analysis.

APIs Used

This integration uses the Scrape API for content extraction and the Parse API for AI-powered data structuring.

Quick Example

import requests
import os
import google.generativeai as genai
from supacrawler import SupacrawlerClient

supacrawler = SupacrawlerClient(api_key=os.environ['SUPACRAWLER_API_KEY'])
genai.configure(api_key=os.environ['GOOGLE_AI_API_KEY'])

def scrape_and_analyze(url, prompt):
    # Step 1: Scrape content
    result = supacrawler.scrape(url, format="markdown")
    
    # Step 2: Analyze with Gemini
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = model.generate_content([
        f"Analyze this web content from {url}:",
        result.content,
        f"\nAnalysis focus: {prompt}"
    ])
    
    return {
        'url': url,
        'title': result.title,
        'analysis': response.text,
        'content': result.content
    }

result = scrape_and_analyze(
    url="https://techcrunch.com/ai",
    prompt="Identify key AI trends and business impact"
)

print(result['analysis'])

Multimodal Analysis

Analyze both content and screenshots:

def multimodal_analysis(url):
    # Scrape with screenshot
    result = supacrawler.scrape(url, include_screenshot=True)
    
    model = genai.GenerativeModel('gemini-1.5-flash')
    
    # Text analysis
    text_analysis = model.generate_content([
        "Analyze this content:",
        result.content
    ])
    
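    # Image analysis (only if a screenshot is available)
    visual_text = None
    if result.screenshot_url:
        image = requests.get(result.screenshot_url, timeout=30)
        image.raise_for_status()  # fail loudly on a broken screenshot URL
        visual_analysis = model.generate_content([
            "Analyze this webpage screenshot:",
            {"mime_type": "image/png", "data": image.content}
        ])
        visual_text = visual_analysis.text

    # Always return a dict, even when no screenshot was captured
    return {
        'text_analysis': text_analysis.text,
        'visual_analysis': visual_text
    }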

Use Cases

  • Content summarization for research
  • Sentiment analysis of articles
  • Data extraction with AI parsing
  • Competitive analysis automation
  • Content quality scoring
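
For use cases such as sentiment analysis or data extraction, it often helps to ask the model for JSON and parse the reply defensively. The following is a minimal sketch under stated assumptions: `SENTIMENT_PROMPT`, `build_sentiment_prompt`, `parse_model_json`, the JSON field names, and the `max_chars` truncation limit are all hypothetical helpers for illustration, not part of the Supacrawler or Gemini APIs.

```python
import json
import re

# Hypothetical prompt template; the field names are illustrative only.
SENTIMENT_PROMPT = (
    "Classify the sentiment of the article below. Reply with JSON only, "
    'shaped like {"sentiment": "positive|neutral|negative", "summary": "..."}.\n\n'
)


def build_sentiment_prompt(content, max_chars=20000):
    """Prefix scraped content with the instruction, truncating very long pages.

    The max_chars default is an assumed limit, not a documented one.
    """
    return SENTIMENT_PROMPT + content[:max_chars]


def parse_model_json(text):
    """Parse a JSON object from model output, tolerating a Markdown code fence."""
    cleaned = text.strip()
    # Models sometimes wrap JSON in a triple-backtick fence; unwrap it if present.
    match = re.search(r"`{3}(?:json)?\s*(.*?)\s*`{3}", cleaned, re.DOTALL)
    if match:
        cleaned = match.group(1)
    return json.loads(cleaned)
```

With the objects from the Quick Example, the reply could be parsed as `parse_model_json(model.generate_content(build_sentiment_prompt(result.content)).text)`. `json.loads` raises `json.JSONDecodeError` when the reply is not valid JSON, so callers may want to catch that and retry.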

Best Practices

  • Use Gemini Flash for speed, Pro for accuracy
  • Cache scraped content to reduce costs
  • Batch process multiple URLs
  • Stream responses for real-time UX
  • Combine with Supabase for storage
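
The caching, batching, and streaming practices above can be sketched roughly as follows. This is an illustration under assumptions, not a prescribed implementation: `ScrapeCache`, `analyze_batch`, and `stream_text` are hypothetical helper names, the TTL and worker counts are arbitrary defaults, and the scrape and analysis functions are passed in so the helpers stay independent of any particular client. The only Gemini call used is `generate_content(..., stream=True)`, which yields chunks that expose `.text`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ScrapeCache:
    """In-memory cache keyed by URL with a time-to-live (hypothetical helper)."""

    def __init__(self, scrape_fn, ttl_seconds=3600):
        # scrape_fn could be: lambda url: supacrawler.scrape(url, format="markdown")
        self._scrape_fn = scrape_fn
        self._ttl = ttl_seconds
        self._entries = {}  # url -> (stored_at, result)

    def get(self, url):
        entry = self._entries.get(url)
        if entry and time.monotonic() - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no scrape call is made
        result = self._scrape_fn(url)
        self._entries[url] = (time.monotonic(), result)
        return result


def analyze_batch(urls, analyze_fn, max_workers=4):
    """Run analyze_fn over urls concurrently; returns {url: result or exception}."""
    def safe(url):
        try:
            return analyze_fn(url)
        except Exception as exc:  # keep one bad URL from failing the whole batch
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(safe, urls)))


def stream_text(model, prompt):
    """Yield text chunks as they arrive instead of waiting for the full reply."""
    for chunk in model.generate_content(prompt, stream=True):
        yield chunk.text
```

For example, `analyze_batch(urls, lambda u: scrape_and_analyze(u, "Summarize"))` would reuse the function from the Quick Example, and `for piece in stream_text(model, prompt): print(piece, end="")` prints output incrementally. The cache is process-local and not persisted; a shared store would be needed across workers or restarts.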
