Crawlee is a powerful web scraping and browser automation library for Python, enabling the creation of reliable crawlers. It supports data extraction for AI applications, including LLMs and GPTs, and offers features like proxy rotation and compatibility with BeautifulSoup and Playwright.
claude install apify/crawlee-python
1. Identify the target website and the specific data you need to scrape. Check the website's robots.txt file and terms of service.
2. Set up proxy rotation using Crawlee's ProxyConfiguration. Use a reliable proxy provider with [NUMBER] proxies to avoid IP blocking.
3. Write your crawler script using Crawlee's PlaywrightCrawler for JavaScript-rendered content, or a lighter HTTP-based crawler such as BeautifulSoupCrawler for static pages. Include error handling for common issues like CAPTCHAs.
4. Test your crawler on a small scale first. Monitor for blocking or CAPTCHAs, and adjust your proxy rotation and request intervals as needed.
5. Export the scraped data into a format compatible with Mode (e.g., CSV), then import it into Mode for analysis and visualization.
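The robots.txt check in step 1 can be partly automated. The following is a minimal sketch using Python's standard-library `urllib.robotparser`; the helper name `is_allowed`, the sample rules, and the example URLs are assumptions made for illustration. It parses robots.txt text you have already downloaded rather than fetching it, and it does not replace reading the site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for an example shop; real sites will differ.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout/
Allow: /products/
"""

if __name__ == "__main__":
    for path in ("/products/1", "/checkout/cart"):
        url = "https://example-ecommerce.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS, url))
```

Running a check like this on each start URL before a crawl is a cheap way to catch disallowed paths early.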
Extracting product data from e-commerce sites for competitive analysis.
Automating the monitoring of competitor websites to track changes in pricing and offerings.
Collecting customer reviews from various platforms to perform sentiment analysis.
Gathering large datasets for training AI models and enhancing machine learning algorithms.
claude install apify/crawlee-python
git clone https://github.com/apify/crawlee-python
Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Create a web crawler using Crawlee-Python to scrape [WEBSITE] for [SPECIFIC_DATA]. The crawler should use [NUMBER] proxies to avoid IP blocking. Extract data into a structured format compatible with Mode for further analysis. Include error handling for [COMMON_ISSUES] such as CAPTCHAs or JavaScript-rendered content.
Here's a Python script using Crawlee to scrape product data from an e-commerce website, with proxy rotation and error handling:
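```python
import asyncio

import pandas as pd
from bs4 import BeautifulSoup
# Import paths follow recent Crawlee for Python releases (crawlee.crawlers);
# older releases exposed these classes from different modules.
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext
from crawlee.proxy_configuration import ProxyConfiguration


async def main() -> None:
    # Configure proxy rotation (placeholder URLs; use your provider's).
    proxy_config = ProxyConfiguration(
        proxy_urls=[
            'http://proxy1.example.com:8000',
            'http://proxy2.example.com:8000',
            'http://proxy3.example.com:8000',
        ],
    )
    crawler = PlaywrightCrawler(proxy_configuration=proxy_config)

    @crawler.router.default_handler
    async def handle_page(context: PlaywrightCrawlingContext) -> None:
        # Parse the rendered HTML with BeautifulSoup.
        html = await context.page.content()
        soup = BeautifulSoup(html, 'html.parser')
        products = []
        for product in soup.select('.product-item'):
            name = product.select_one('.product-name')
            price = product.select_one('.product-price')
            link = product.select_one('a')
            if not (name and price and link):
                continue  # Skip cards missing an expected element.
            products.append({
                'page_url': context.request.url,
                'name': name.text.strip(),
                'price': price.text.strip(),
                'url': link.get('href'),
            })
        # Store one dataset row per product.
        await context.push_data(products)

    # Run the crawler
    await crawler.run(['https://example-ecommerce.com/products'])

    # Export data for Mode
    data = await crawler.get_data()
    pd.DataFrame(data.items).to_csv('ecommerce_products.csv', index=False)


if __name__ == '__main__':
    asyncio.run(main())
```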
The script successfully scraped 500 products with names, prices, and URLs. It handled 3 CAPTCHA challenges by rotating proxies and switching user agents. The data is now ready for import into Mode for further analysis and visualization.
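Scraped prices usually arrive as display strings, which are awkward to aggregate once the CSV is in an analysis tool. As a rough sketch, the helper below converts price text to plain numbers before writing the CSV with the standard library. The function names `parse_price` and `rows_to_csv` are hypothetical, and the parsing assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
import csv
import io
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Extract a numeric price from text such as '$1,299.00'.

    Assumes US-style formatting; returns None when no number is present.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def rows_to_csv(rows: list) -> str:
    """Serialize product rows (name, price, url) to CSV text with numeric prices."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["name", "price", "url"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        price = parse_price(row["price"])
        writer.writerow({
            "name": row["name"],
            # Leave the cell empty when the price could not be parsed.
            "price": "" if price is None else str(price),
            "url": row["url"],
        })
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"name": "Widget", "price": "$19.99", "url": "/p/1"}]
    print(rows_to_csv(sample))
```

Using `Decimal` rather than `float` keeps the scraped value exact, which matters if prices are later summed or compared.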