DeepSeek AI Integration
Combine Supacrawler's web scraping with DeepSeek AI for cost-effective content analysis and processing.
APIs Used
This integration uses the Scrape API for content extraction and the Parse API for AI-powered data structuring.
Quick Example
```python
from supacrawler import SupacrawlerClient
from openai import OpenAI
import os

supacrawler = SupacrawlerClient(api_key=os.environ['SUPACRAWLER_API_KEY'])
deepseek = OpenAI(
    api_key=os.environ['DEEPSEEK_API_KEY'],
    base_url="https://api.deepseek.com"
)

def scrape_and_analyze(url, analysis_prompt):
    # Step 1: Scrape content
    result = supacrawler.scrape(url, format="markdown")

    # Step 2: Analyze with DeepSeek
    response = deepseek.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that analyzes web content."},
            {"role": "user", "content": f"Analyze this content from {url}:\n\n{result.content}\n\nFocus: {analysis_prompt}"}
        ],
        max_tokens=2000
    )

    return {
        'url': url,
        'title': result.title,
        'analysis': response.choices[0].message.content,
        # Rough estimate: applies DeepSeek's input rate ($0.14/1M tokens) to all tokens
        'cost': response.usage.total_tokens * 0.14 / 1_000_000
    }

result = scrape_and_analyze(
    url="https://techcrunch.com/ai",
    analysis_prompt="Summarize key AI trends and business opportunities"
)
print(f"Analysis: {result['analysis']}")
print(f"Cost: ${result['cost']:.4f}")
```
Batch Processing
```python
def batch_analyze_urls(urls, prompt):
    results = []
    total_cost = 0
    for url in urls:
        result = scrape_and_analyze(url, prompt)
        results.append(result)
        total_cost += result['cost']
        print(f"✅ Processed {url} (${result['cost']:.4f})")
    print(f"\nTotal cost: ${total_cost:.4f}")
    return results

urls = [
    "https://techcrunch.com/article1",
    "https://techcrunch.com/article2",
    "https://techcrunch.com/article3"
]
analyses = batch_analyze_urls(urls, "Extract key insights and business implications")
```
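The loop above processes URLs one at a time; when latency matters more than simplicity, a thread pool can run several scrape-and-analyze calls concurrently. A minimal sketch using Python's standard library (the `analyze` parameter stands in for `scrape_and_analyze`; concurrency is an assumption on your own code, not a documented Supacrawler feature):

```python
from concurrent.futures import ThreadPoolExecutor

def batch_analyze_concurrent(urls, analyze, max_workers=4):
    """Run the per-URL analyze callable across a thread pool.

    `analyze` is any callable mapping a URL to a result dict (e.g. a
    wrapper around scrape_and_analyze). Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, urls))

# Example with a stand-in analyzer (no network calls):
results = batch_analyze_concurrent(
    ["https://example.com/a", "https://example.com/b"],
    analyze=lambda url: {"url": url, "cost": 0.0},
)
```

Because `pool.map` preserves input order, results can be zipped back against the original URL list even though requests complete out of order.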
Content Summarization
```python
def summarize_website(url):
    # Crawl entire site
    job = supacrawler.create_crawl_job(
        url=url,
        depth=2,
        link_limit=20
    )
    crawl_result = supacrawler.wait_for_crawl(job.job_id)

    # Combine all content (page_url avoids shadowing the url parameter)
    all_content = "\n\n".join([
        page.markdown
        for page_url, page in crawl_result.data.crawl_data.items()
        if hasattr(page, 'markdown')
    ])

    # Summarize with DeepSeek (truncated to stay within the context window)
    response = deepseek.chat.completions.create(
        model="deepseek-chat",
        messages=[{
            "role": "user",
            "content": f"Provide a comprehensive summary of this website:\n\n{all_content[:50000]}"
        }]
    )
    return response.choices[0].message.content
```
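Truncating to the first 50,000 characters silently drops the rest of a large site. One alternative is a map-reduce pass: split the combined markdown into overlapping chunks, summarize each, then summarize the summaries. The chunking helper below is pure Python; the two-stage prompting around it is a sketch, and `chat_fn` is a hypothetical wrapper around the DeepSeek call above, not part of either API:

```python
def chunk_text(text, chunk_size=50_000, overlap=500):
    """Split text into overlapping chunks so no content is dropped.

    Overlap preserves context across chunk boundaries. Plain string
    slicing; no tokenizer is assumed.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def summarize_long_content(content, chat_fn, chunk_size=50_000):
    # Map: summarize each chunk. Reduce: summarize the joined summaries.
    partials = [chat_fn(f"Summarize:\n\n{c}") for c in chunk_text(content, chunk_size)]
    if len(partials) == 1:
        return partials[0]
    return chat_fn("Combine these partial summaries into one:\n\n" + "\n\n".join(partials))
```

Each chunk fits the model's context window, so nothing is discarded, at the cost of one extra API call per chunk plus a final reduce call.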
Cost Comparison
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Use Case |
|---|---|---|---|
| DeepSeek Chat | $0.14 | $0.28 | General analysis |
| GPT-4 Turbo | $10.00 | $30.00 | Complex reasoning |
| GPT-3.5 Turbo | $0.50 | $1.50 | Simple tasks |
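The table translates into a simple estimator. The rates below are copied from the rows above and will drift as providers reprice, so treat them as a snapshot:

```python
# $/1M tokens (input, output), taken from the comparison table
PRICING = {
    "deepseek-chat": (0.14, 0.28),
    "gpt-4-turbo": (10.00, 30.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate a request's dollar cost from the table's rates."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 10k-token article plus a 1k-token summary:
print(f"${estimate_cost('deepseek-chat', 10_000, 1_000):.4f}")  # prints $0.0017
```

The same request on GPT-4 Turbo costs $0.13, roughly 77x more, which is the core argument for routing high-volume scraping pipelines through DeepSeek.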
Best Practices
- Use DeepSeek for cost-sensitive applications
- Batch process multiple URLs to reduce overhead
- Cache scraped content to avoid re-scraping
- Monitor token usage for cost optimization
- Combine with Supabase for data storage
- Use streaming for real-time responses
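The caching point above can be sketched as a small on-disk cache keyed by URL hash. The wrapper takes the scrape function as a parameter so it works with any client; the file layout and helper names are illustrative assumptions, not Supacrawler features:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def cached_scrape(url, scrape_fn, cache_dir="scrape_cache"):
    """Return cached content for a URL, scraping only on a cache miss.

    `scrape_fn` is any callable mapping a URL to a string, e.g. a wrapper
    that returns Supacrawler's markdown output.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode()).hexdigest()
    path = cache / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())["content"]
    content = scrape_fn(url)
    path.write_text(json.dumps({"url": url, "content": content}))
    return content

# The second call hits the cache; the counter shows scrape_fn ran once.
calls = []
fetch = lambda u: calls.append(u) or f"content of {u}"
cache_dir = tempfile.mkdtemp()
first = cached_scrape("https://example.com", fetch, cache_dir=cache_dir)
second = cached_scrape("https://example.com", fetch, cache_dir=cache_dir)
```

For re-crawls of changing pages, add a timestamp to the cached record and treat entries older than some TTL as misses.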