DeepSeek AI Integration

Combine Supacrawler web scraping with DeepSeek AI for cost-effective, intelligent content processing. This guide shows how to scrape content and process it with DeepSeek for analysis, summarization, and insights.

Prerequisites

  • A Supacrawler API key, available as the SUPACRAWLER_API_KEY environment variable
  • A DeepSeek API key, available as the DEEPSEEK_API_KEY environment variable
  • Python with the supacrawler and requests packages installed (plus flask for the webhook example)

Quick example

Scrape content and process it with DeepSeek AI:

import requests
import os
from supacrawler import SupacrawlerClient

# Initialize clients
supacrawler = SupacrawlerClient(api_key=os.environ['SUPACRAWLER_API_KEY'])
deepseek_api_key = os.environ['DEEPSEEK_API_KEY']

def scrape_and_analyze(url, analysis_prompt):
    """Scrape content and analyze with DeepSeek"""
    
    # Step 1: Scrape content with Supacrawler
    result = supacrawler.scrape(url, format="markdown")
    content = result.content
    
    # Step 2: Process with DeepSeek AI
    deepseek_response = requests.post(
        "https://api.deepseek.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {deepseek_api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek-chat",
            "messages": [
                {
                    "role": "system",
                    "content": "You are an expert content analyst. Analyze the provided web content according to the user's instructions."
                },
                {
                    "role": "user", 
                    "content": f"Content from {url}:\n\n{content}\n\nAnalysis request: {analysis_prompt}"
                }
            ],
            "temperature": 0.3
        }
    )
    
    # Raise early on HTTP errors before reading the completion
    deepseek_response.raise_for_status()
    analysis = deepseek_response.json()['choices'][0]['message']['content']
    
    return {
        'url': url,
        'content': content,
        'analysis': analysis,
        'title': result.title
    }

# Example usage
result = scrape_and_analyze(
    url="https://techcrunch.com/latest-ai-news",
    analysis_prompt="Summarize the key AI developments and identify emerging trends"
)

print(f"Analysis of {result['title']}:")
print(result['analysis'])
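
The same helper can be reused across several pages. A minimal batch sketch (the URLs below are placeholders, not part of the original example):

# Analyze multiple pages with the same prompt and collect the reports
urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
]

reports = []
for url in urls:
    report = scrape_and_analyze(
        url=url,
        analysis_prompt="Summarize the main points in three bullets"
    )
    reports.append(report)
    print(f"--- {report['title']} ---")
    print(report['analysis'])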

Automated content monitoring

Set up automated content monitoring with AI analysis:

import os

import requests
import json

# API keys used by the monitoring setup and the DeepSeek calls below
SUPACRAWLER_API_KEY = os.environ['SUPACRAWLER_API_KEY']
DEEPSEEK_API_KEY = os.environ['DEEPSEEK_API_KEY']

def setup_ai_content_monitoring():
    """Setup content monitoring with DeepSeek analysis"""
    
    content_sources = [
        {
            "name": "AI Research Papers",
            "url": "https://arxiv.org/list/cs.AI/recent",
            "analysis_prompt": "Identify breakthrough AI research and potential commercial applications"
        },
        {
            "name": "Competitor Blog",
            "url": "https://competitor.com/blog",
            "analysis_prompt": "Analyze product updates, strategy changes, and market positioning"
        },
        {
            "name": "Industry News",
            "url": "https://industry-news.com/latest",
            "analysis_prompt": "Extract key market trends and regulatory changes"
        }
    ]
    
    for source in content_sources:
        # Create Supacrawler watch job with webhook
        response = requests.post(
            "https://api.supacrawler.com/api/v1/watch",
            headers={"Authorization": f"Bearer {SUPACRAWLER_API_KEY}"},
            json={
                "url": source["url"],
                "frequency": "daily",
                "selector": "article, .post, .content",
                "notification_preference": "changes_only",
                
                # Send to AI analysis webhook
                "webhook_url": "https://your-app.com/api/ai-analysis",
                "webhook_headers": {
                    "X-Source": source["name"],
                    "X-Analysis-Prompt": source["analysis_prompt"]
                },
                
                "include_html": True
            }
        )
        
        print(f"AI monitoring setup for {source['name']}")

# Webhook handler for AI analysis
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/api/ai-analysis', methods=['POST'])
def analyze_content_with_deepseek():
    """Process scraped content with DeepSeek AI"""
    
    data = request.json
    source_name = data.get('headers', {}).get('X-Source')
    analysis_prompt = data.get('headers', {}).get('X-Analysis-Prompt')
    
    # Extract content changes
    new_content = data.get('new_content', '')
    
    if new_content:
        # Analyze with DeepSeek
        analysis = analyze_with_deepseek(new_content, analysis_prompt)
        
        # Store results
        save_analysis_result({
            'source': source_name,
            'url': data.get('url'),
            'content': new_content,
            'analysis': analysis,
            'timestamp': data.get('timestamp')
        })
        
        # Send notifications for important insights
        if is_important_insight(analysis):
            send_priority_alert(source_name, analysis)
    
    return jsonify({'status': 'processed'})

def analyze_with_deepseek(content, prompt):
    """Send content to DeepSeek for analysis"""
    
    response = requests.post(
        "https://api.deepseek.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {DEEPSEEK_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek-chat",
            "messages": [
                {"role": "system", "content": "You are an expert analyst. Provide concise, actionable insights."},
                {"role": "user", "content": f"Analyze this content: {content[:4000]}\n\nFocus: {prompt}"}
            ],
            "temperature": 0.2,
            "max_tokens": 1000
        }
    )
    
    # Raise early on HTTP errors before reading the completion
    response.raise_for_status()
    return response.json()['choices'][0]['message']['content']
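
The webhook handler above calls three helpers that are application-specific rather than part of Supacrawler or DeepSeek. A minimal sketch, assuming results go to a local JSONL file and "importance" is a simple keyword check (all names, keywords, and storage choices here are illustrative):

import json

def save_analysis_result(result):
    """Append an analysis record to a local JSONL file (placeholder storage)."""
    with open("analysis_results.jsonl", "a") as f:
        f.write(json.dumps(result) + "\n")

def is_important_insight(analysis):
    """Naive importance check: flag analyses mentioning high-signal keywords."""
    keywords = ["breakthrough", "acquisition", "regulation", "launch"]
    return any(keyword in analysis.lower() for keyword in keywords)

def send_priority_alert(source_name, analysis):
    """Placeholder alert: log to stdout; swap in email, Slack, etc. as needed."""
    print(f"[PRIORITY] {source_name}: {analysis[:200]}")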

Use cases

Market research

  • Competitor analysis: Track competitor content and analyze strategic changes
  • Trend identification: Monitor industry sources and identify emerging trends
  • Customer sentiment: Analyze customer feedback and reviews across platforms
  • Product research: Track product announcements and feature updates

Content intelligence

  • News summarization: Automatically summarize news articles from multiple sources
  • Research synthesis: Combine information from academic papers and reports
  • Social media monitoring: Analyze social media content for brand mentions
  • Regulatory tracking: Monitor legal and compliance websites for changes

Best practices

  • Content chunking: Break large content into smaller pieces for better AI processing (see the sketch after this list)
  • Prompt optimization: Fine-tune analysis prompts for your specific use cases
  • Cost management: Use DeepSeek's cost-effective pricing for large-scale analysis
  • Quality control: Implement validation checks for AI-generated insights
  • Data storage: Store both raw content and analysis results for future reference
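
For the content-chunking practice above, a minimal sketch that splits long pages into pieces before sending them to analyze_with_deepseek (the 4,000-character chunk size mirrors the truncation already used in that helper and is an assumption, not a DeepSeek limit):

def chunk_text(text, chunk_size=4000):
    """Split text into roughly chunk_size-character pieces."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def analyze_long_content(content, prompt):
    """Analyze each chunk separately and join the partial analyses."""
    analyses = [analyze_with_deepseek(chunk, prompt) for chunk in chunk_text(content)]
    return "\n\n".join(analyses)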
