DeepSeek AI Integration
Combine Supacrawler web scraping with DeepSeek AI to analyze web content at low cost. This guide shows how to scrape a page with Supacrawler and send the result to DeepSeek for analysis, summarization, and insight extraction.
Prerequisites
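- Supacrawler API key (the examples read it from the SUPACRAWLER_API_KEY environment variable)
- DeepSeek API key (the examples read it from the DEEPSEEK_API_KEY environment variable)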
- Target websites for content analysis
Quick example
Scrape content and process it with DeepSeek AI:
import requests
import os
from supacrawler import SupacrawlerClient
# Initialize clients
supacrawler = SupacrawlerClient(api_key=os.environ['SUPACRAWLER_API_KEY'])
deepseek_api_key = os.environ['DEEPSEEK_API_KEY']
def scrape_and_analyze(url, analysis_prompt):
    """Scrape content and analyze with DeepSeek"""
    # Step 1: Scrape content with Supacrawler
    result = supacrawler.scrape(url, format="markdown")
    content = result.content

    # Step 2: Process with DeepSeek AI
    deepseek_response = requests.post(
        "https://api.deepseek.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {deepseek_api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek-chat",
            "messages": [
                {
                    "role": "system",
                    "content": "You are an expert content analyst. Analyze the provided web content according to the user's instructions."
                },
                {
                    "role": "user",
                    "content": f"Content from {url}:\n\n{content}\n\nAnalysis request: {analysis_prompt}"
                }
            ],
            "temperature": 0.3
        },
        timeout=60  # avoid hanging indefinitely on a slow response
    )
    # Raise on HTTP errors instead of failing later with a KeyError on 'choices'
    deepseek_response.raise_for_status()
    analysis = deepseek_response.json()['choices'][0]['message']['content']

    return {
        'url': url,
        'content': content,
        'analysis': analysis,
        'title': result.title
    }
# Example usage
result = scrape_and_analyze(
    url="https://techcrunch.com/latest-ai-news",
    analysis_prompt="Summarize the key AI developments and identify emerging trends"
)
print(f"Analysis of {result['title']}:")
print(result['analysis'])
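Network calls to either service can fail transiently. The following is a minimal sketch of a retry wrapper with exponential backoff, not part of either API. The name `call_with_retries` and its parameters are hypothetical and introduced here only for illustration.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying with exponential backoff on the given exceptions.

    Hypothetical helper: the defaults are illustrative, not recommendations
    from Supacrawler or DeepSeek.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

One way to use it is to wrap the call from the quick example, for instance `call_with_retries(lambda: scrape_and_analyze(url, prompt), retry_on=(requests.RequestException,))`, passing only the exception types you consider transient.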
Automated content monitoring
Create a Supacrawler watch job for each source and route detected changes to a webhook that analyzes them with DeepSeek:
import os
import requests

SUPACRAWLER_API_KEY = os.environ['SUPACRAWLER_API_KEY']

def setup_ai_content_monitoring():
    """Setup content monitoring with DeepSeek analysis"""
    content_sources = [
        {
            "name": "AI Research Papers",
            "url": "https://arxiv.org/list/cs.AI/recent",
            "analysis_prompt": "Identify breakthrough AI research and potential commercial applications"
        },
        {
            "name": "Competitor Blog",
            "url": "https://competitor.com/blog",
            "analysis_prompt": "Analyze product updates, strategy changes, and market positioning"
        },
        {
            "name": "Industry News",
            "url": "https://industry-news.com/latest",
            "analysis_prompt": "Extract key market trends and regulatory changes"
        }
    ]

    for source in content_sources:
        # Create Supacrawler watch job with webhook
        response = requests.post(
            "https://api.supacrawler.com/api/v1/watch",
            headers={"Authorization": f"Bearer {SUPACRAWLER_API_KEY}"},
            json={
                "url": source["url"],
                "frequency": "daily",
                "selector": "article, .post, .content",
                "notification_preference": "changes_only",
                # Send to AI analysis webhook
                "webhook_url": "https://your-app.com/api/ai-analysis",
                "webhook_headers": {
                    "X-Source": source["name"],
                    "X-Analysis-Prompt": source["analysis_prompt"]
                },
                "include_html": True
            },
            timeout=30
        )
        # Stop here if the watch job was not created
        response.raise_for_status()
        print(f"AI monitoring setup for {source['name']}")
# Webhook handler for AI analysis (runs in the web app that receives the webhook)
import os
from flask import Flask, request, jsonify

DEEPSEEK_API_KEY = os.environ['DEEPSEEK_API_KEY']

app = Flask(__name__)

# save_analysis_result, is_important_insight, and send_priority_alert are
# application-specific helpers that you need to provide.

@app.route('/api/ai-analysis', methods=['POST'])
def analyze_content_with_deepseek():
    """Process scraped content with DeepSeek AI"""
    # Fall back to an empty dict if the body is missing or not valid JSON
    data = request.get_json(silent=True) or {}
    # This handler expects the custom webhook headers under 'headers' in the payload
    source_name = data.get('headers', {}).get('X-Source')
    analysis_prompt = data.get('headers', {}).get('X-Analysis-Prompt')

    # Extract content changes
    new_content = data.get('new_content', '')
    if new_content:
        # Analyze with DeepSeek
        analysis = analyze_with_deepseek(new_content, analysis_prompt)

        # Store results
        save_analysis_result({
            'source': source_name,
            'url': data.get('url'),
            'content': new_content,
            'analysis': analysis,
            'timestamp': data.get('timestamp')
        })

        # Send notifications for important insights
        if is_important_insight(analysis):
            send_priority_alert(source_name, analysis)

    return jsonify({'status': 'processed'})

def analyze_with_deepseek(content, prompt):
    """Send content to DeepSeek for analysis"""
    response = requests.post(
        "https://api.deepseek.com/v1/chat/completions",
        headers={
            "Authorization": f"Bearer {DEEPSEEK_API_KEY}",
            "Content-Type": "application/json"
        },
        json={
            "model": "deepseek-chat",
            "messages": [
                {"role": "system", "content": "You are an expert analyst. Provide concise, actionable insights."},
                # Only the first 4000 characters are sent to bound request size
                {"role": "user", "content": f"Analyze this content: {content[:4000]}\n\nFocus: {prompt}"}
            ],
            "temperature": 0.2,
            "max_tokens": 1000
        },
        timeout=60
    )
    # Raise on HTTP errors instead of failing later with a KeyError on 'choices'
    response.raise_for_status()
    return response.json()['choices'][0]['message']['content']
Use cases
Market research
- Competitor analysis: Track competitor content and analyze strategic changes
- Trend identification: Monitor industry sources and identify emerging trends
- Customer sentiment: Analyze customer feedback and reviews across platforms
- Product research: Track product announcements and feature updates
Content intelligence
- News summarization: Automatically summarize news articles from multiple sources
- Research synthesis: Combine information from academic papers and reports
- Social media monitoring: Analyze social media content for brand mentions
- Regulatory tracking: Monitor legal and compliance websites for changes
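As an illustration of the news summarization use case, the sketch below runs one prompt over several URLs and collects per-URL results. The function name `summarize_sources` and the default prompt are hypothetical; `analyze` is assumed to be a callable with the same signature and return shape as `scrape_and_analyze` from the quick example.

```python
def summarize_sources(urls, analyze, prompt="Summarize the key points in three bullet points"):
    """Run analyze(url, prompt) for each URL.

    Returns (summaries, failures): two dicts keyed by URL, so that one
    failing source does not abort the whole batch.
    """
    summaries, failures = {}, {}
    for url in urls:
        try:
            summaries[url] = analyze(url, prompt)["analysis"]
        except Exception as exc:  # broad on purpose: record the error and keep going
            failures[url] = str(exc)
    return summaries, failures
```

Passing the analysis function in as an argument keeps the loop easy to test with a stub and lets you swap in a rate-limited or retrying variant later.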
Best practices
- Content chunking: Break large content into smaller pieces for better AI processing
- Prompt optimization: Refine analysis prompts iteratively for your specific use cases
- Cost management: Keep large-scale analysis affordable by capping input size and max_tokens per request and by analyzing only changed content
- Quality control: Implement validation checks for AI-generated insights
- Data storage: Store both raw content and analysis results for future reference
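The content chunking practice above can be sketched as follows. `analyze_with_deepseek` simply truncates input at 4000 characters, so anything beyond that is never analyzed; splitting on paragraph boundaries lets you analyze each piece and then combine the results. The function name `chunk_text` is hypothetical, and the 4000-character default only mirrors that truncation limit rather than any documented DeepSeek constraint.

```python
def chunk_text(text, max_chars=4000):
    """Split text into chunks of at most max_chars characters.

    Paragraphs (separated by blank lines) are kept together where possible;
    a paragraph longer than max_chars is hard-split.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")

    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any paragraph that is too long on its own
        pieces = [paragraph[i:i + max_chars] for i in range(0, len(paragraph), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # still fits: keep growing this chunk
            else:
                chunks.append(current)  # chunk is full: start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to `analyze_with_deepseek` in turn, with a final request that summarizes the per-chunk analyses if a single combined result is needed.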
Related resources
- Scrape API Reference - Content extraction documentation
- AI Integration Guide - General AI integration patterns
- Content Analysis Examples - More implementation examples