# Google Gemini Integration

Integrate Supacrawler with Google Gemini to combine web scraping with AI for content analysis, multimodal processing, and intelligent insights.
## APIs Used
This integration uses the Scrape API for content extraction and the Parse API for AI-powered data structuring.
## Quick Example
```python
import os

import requests
import google.generativeai as genai
from supacrawler import SupacrawlerClient

supacrawler = SupacrawlerClient(api_key=os.environ['SUPACRAWLER_API_KEY'])
genai.configure(api_key=os.environ['GOOGLE_AI_API_KEY'])

def scrape_and_analyze(url, prompt):
    # Step 1: Scrape content
    result = supacrawler.scrape(url, format="markdown")

    # Step 2: Analyze with Gemini
    model = genai.GenerativeModel('gemini-1.5-flash')
    response = model.generate_content([
        f"Analyze this web content from {url}:",
        result.content,
        f"\nAnalysis focus: {prompt}"
    ])

    return {
        'url': url,
        'title': result.title,
        'analysis': response.text,
        'content': result.content
    }

result = scrape_and_analyze(
    url="https://techcrunch.com/ai",
    prompt="Identify key AI trends and business impact"
)
print(result['analysis'])
```
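Very long pages can exceed a sensible prompt budget before they ever reach Gemini. A minimal truncation helper, run on `result.content` before calling `generate_content`, keeps requests predictable; the 30,000-character default below is an illustrative budget we chose for this sketch, not a documented Gemini limit.

```python
def truncate_content(markdown, max_chars=30000):
    """Trim scraped markdown before sending it to Gemini.

    The 30,000-character default is an illustrative budget, not a
    documented model limit; tune it for your model and pricing.
    """
    if len(markdown) <= max_chars:
        return markdown
    # Mark the cut so the model knows the input is partial.
    return markdown[:max_chars] + "\n\n[content truncated]"
```

Call `truncate_content(result.content)` wherever scraped text is passed into a prompt.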
## Multimodal Analysis

Analyze both the page content and a screenshot of it:
```python
import requests

# Reuses the `supacrawler` client and `genai` configuration
# from the Quick Example above.
def multimodal_analysis(url):
    # Scrape with screenshot
    result = supacrawler.scrape(url, include_screenshot=True)

    model = genai.GenerativeModel('gemini-1.5-flash')

    # Text analysis
    text_analysis = model.generate_content([
        "Analyze this content:",
        result.content
    ])

    # Image analysis (only when a screenshot was captured)
    visual_analysis = None
    if result.screenshot_url:
        screenshot = requests.get(result.screenshot_url).content
        visual_analysis = model.generate_content([
            "Analyze this webpage screenshot:",
            {"mime_type": "image/png", "data": screenshot}
        ])

    return {
        'text_analysis': text_analysis.text,
        'visual_analysis': visual_analysis.text if visual_analysis else None
    }
```
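Since `multimodal_analysis` returns the two analyses separately (and the visual one may be missing when no screenshot was captured), a small merging helper can turn them into a single report. The helper name and report layout below are illustrative, not part of the Supacrawler SDK:

```python
def combine_analyses(text_analysis, visual_analysis=None):
    """Merge the text and visual Gemini outputs into one markdown report.

    `visual_analysis` may be None when no screenshot was available;
    the visual section is then omitted.
    """
    sections = ["## Text analysis", text_analysis]
    if visual_analysis:
        sections += ["## Visual analysis", visual_analysis]
    return "\n\n".join(sections)
```

For example: `report = combine_analyses(r['text_analysis'], r['visual_analysis'])` where `r` is the dict returned by `multimodal_analysis`.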
## Use Cases
- Content summarization for research
- Sentiment analysis of articles
- Data extraction with AI parsing
- Competitive analysis automation
- Content quality scoring
## Best Practices
- Use Gemini Flash for speed, Pro for accuracy
- Cache scraped content to reduce costs
- Batch process multiple URLs
- Stream responses for real-time UX
- Combine with Supabase for storage
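To make the caching tip concrete, here is a minimal in-memory TTL cache sketch. The class name and one-hour default are our own choices; in production you would persist entries (for example in Supabase or Redis) so cached pages survive restarts:

```python
import time
import hashlib

class ScrapeCache:
    """Minimal in-memory TTL cache for scraped markdown (illustrative)."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def _key(url):
        # Stable, fixed-length key for any URL.
        return hashlib.sha256(url.encode()).hexdigest()

    def get(self, url):
        entry = self._store.get(self._key(url))
        if entry and time.time() - entry[1] < self.ttl:
            return entry[0]
        return None

    def put(self, url, content):
        self._store[self._key(url)] = (content, time.time())
```

Check the cache before calling `supacrawler.scrape`, and `put` the result on a miss; repeat analyses of the same URL then skip the scrape cost entirely.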