AI Scraper is a cutting-edge tool designed for data extraction from various web sources. It automates complex scraping tasks, enabling users to create efficient workflows for scalable AI scraping operations.
1. **Install and configure ai-scraper-py.** Run `pip install ai-scraper-py` and set up environment variables for proxies (if needed) and headers to mimic a real browser. Configure the tool to handle JavaScript-rendered content by enabling a headless browser (e.g., Playwright).
   *Tip:* Use `--headless-mode` for faster scraping, or `--browser-mode` if the target site requires JavaScript execution.
2. **Define the scraping target and data fields.** Specify the target URL (e.g., `https://example.com/products`) and the data fields to extract (e.g., `product_name`, `price`, `availability`). Use the tool's `--selectors` flag to map HTML elements to fields via CSS selectors or XPath.
   *Tip:* Inspect the target page with browser dev tools to identify stable selectors (e.g., `#product-title`, `.price`); avoid selectors with dynamic IDs like `data-testid-12345`.
3. **Handle pagination and anti-scraping measures.** Configure pagination by setting `--max-pages 5` and use `--delay 2` to avoid rate limiting. Enable `--proxy-rotation` if the site blocks your IP; for CAPTCHAs, integrate a solver service (e.g., 2Captcha) via `--captcha-solver`.
   *Tip:* Test the scraper on a single page first to validate selectors and error handling before scaling up.
4. **Output and process the extracted data.** Save the output to a file using `--output-format json --output-file products.json`. Validate the data with `--validate` to ensure all required fields are present; for large datasets, stream the output to a database (e.g., PostgreSQL) using `--db-connection-string`.
   *Tip:* Use `--log-level debug` to troubleshoot issues like failed requests or missing data.
5. **Automate and scale the scraping workflow.** Schedule the scraper using cron (Linux/macOS) or Task Scheduler (Windows) to run at set intervals. For enterprise use, deploy the scraper on a cloud platform (e.g., AWS Lambda) with `--cloud-mode` to handle high volumes.
   *Tip:* Monitor scraping performance with `--metrics` to track success rates, errors, and execution time.
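The selector-to-field mapping in step 2 can be sketched with the standard library alone. This is an illustrative stand-in, not ai-scraper-py's implementation: the HTML snippet, class names, and field mapping are assumptions chosen for the demo, and a real run would fetch live pages and honor the delay/pagination settings described above.

```python
# Minimal sketch of mapping HTML elements to named fields, using only the
# standard library. The static HTML, class names, and field mapping below
# are illustrative assumptions, not ai-scraper-py internals.
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collect text from elements whose CSS class matches a field mapping."""

    def __init__(self, class_to_field):
        super().__init__()
        self.class_to_field = class_to_field  # e.g. {"price": "price"}
        self.records = []
        self._current = None  # field currently being captured
        self._row = {}

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        for cls in classes:
            if cls in self.class_to_field:
                self._current = self.class_to_field[cls]
        if "product" in classes:  # each product card starts a new record
            if self._row:
                self.records.append(self._row)
            self._row = {}

    def handle_data(self, data):
        if self._current and data.strip():
            self._row[self._current] = data.strip()
            self._current = None

    def close(self):
        super().close()
        if self._row:  # flush the last record
            self.records.append(self._row)


html = """
<div class="product"><span class="name">Headphones</span><span class="price">$79.99</span></div>
<div class="product"><span class="name">Desk Lamp</span><span class="price">$49.99</span></div>
"""
parser = FieldExtractor({"name": "product_name", "price": "price"})
parser.feed(html)
parser.close()
print(parser.records)
```

In practice you would prefer a selector library (e.g., BeautifulSoup or Playwright's locators) over hand-rolled parsing; the point here is only the mapping from stable class names to output fields.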
**Use Cases:**
- Extracting product data from e-commerce sites
- Gathering competitor pricing information
- Automating lead generation from social media
- Collecting market research data from multiple sources
1. Copy the install command and run it in your terminal:
   `claude install oxylabs/ai-scraper-py`
   or clone the repository directly: `git clone https://github.com/oxylabs/ai-scraper-py`
2. Launch Claude Code, Cursor, or your preferred AI coding agent.
3. Use the prompt template or examples below to test the skill.
4. Adapt the skill to your specific use case and workflow.
Use ai-scraper-py to extract structured data from [WEBSITE_URL] for [USE_CASE]. Focus on extracting [SPECIFIC_DATA_FIELDS] while handling dynamic content, pagination, and anti-scraping measures. Output the data in [OUTPUT_FORMAT] (e.g., CSV, JSON). Include error handling for failed requests or CAPTCHAs. Example: 'Use ai-scraper-py to extract product listings from https://example.com/electronics for a price comparison tool. Extract product name, price, availability, and ratings. Handle pagination up to 5 pages and output as JSON with error logging for failed requests.'
```json
{
  "extracted_data": [
    {
      "product_name": "Wireless Bluetooth Headphones",
      "price": 79.99,
      "availability": "In Stock",
      "rating": 4.5,
      "url": "https://example.com/electronics/headphones/12345",
      "last_updated": "2024-05-20T14:30:00Z"
    },
    {
      "product_name": "Smart LED Desk Lamp",
      "price": 49.99,
      "availability": "Out of Stock",
      "rating": 3.8,
      "url": "https://example.com/electronics/lamps/67890",
      "last_updated": "2024-05-19T09:15:00Z"
    }
  ],
  "metadata": {
    "total_items": 2,
    "pages_scraped": 1,
    "errors": [],
    "scraping_timestamp": "2024-05-20T15:00:00Z"
  }
}
```
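A check like the one the `--validate` flag is described as performing can be sketched in a few lines. The required-field list below is an assumption derived from the example output above, not a documented schema.

```python
# Hedged sketch of output validation: flag any record in "extracted_data"
# that is missing a required field. The REQUIRED set is an assumption
# based on the example JSON, not ai-scraper-py's actual schema.
import json

REQUIRED = {"product_name", "price", "availability", "rating", "url"}


def validate(payload):
    """Return a list of (index, missing_fields); an empty list means valid."""
    problems = []
    for i, item in enumerate(payload.get("extracted_data", [])):
        missing = REQUIRED - item.keys()
        if missing:
            problems.append((i, sorted(missing)))
    return problems


doc = json.loads("""{"extracted_data": [
  {"product_name": "Wireless Bluetooth Headphones", "price": 79.99,
   "availability": "In Stock", "rating": 4.5,
   "url": "https://example.com/electronics/headphones/12345"}
]}""")
print(validate(doc))
```

Running this kind of check before loading data downstream catches selector drift early: when a site redesign breaks a selector, the field goes missing and the record is reported instead of silently propagating nulls.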
**Scraping Summary:**
- Successfully scraped 2 product listings from https://example.com/electronics in 4.2 seconds.
- No errors encountered during the process. The website did not trigger anti-scraping measures (e.g., CAPTCHA or rate-limiting).
- The output is structured JSON, ready for integration into a price comparison tool or database.
**Key Observations:**
1. The product listings include all requested fields (name, price, availability, rating, URL).
2. The `last_updated` field ensures data freshness for downstream applications.
3. The `metadata` section provides traceability for debugging or auditing purposes.
**Next Steps:**
- Schedule this script to run daily at 10 AM UTC to keep the dataset current.
- Integrate the output with a database (e.g., PostgreSQL) using the provided JSON structure.
- For larger datasets, consider parallelizing the scraping process across multiple pages or domains.
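The database hand-off mentioned in the next steps can be sketched with the standard library. SQLite stands in for PostgreSQL here purely for a self-contained demo (in production you would swap `sqlite3` for a PostgreSQL driver such as `psycopg2`); the table and column names are assumptions mirroring the example JSON above.

```python
# Sketch of loading scraped records into a database. An in-memory SQLite
# table stands in for the PostgreSQL target; table/column names mirror the
# example JSON output and are assumptions.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    product_name TEXT, price REAL, availability TEXT,
    rating REAL, url TEXT PRIMARY KEY, last_updated TEXT)""")

records = json.loads("""[
  {"product_name": "Wireless Bluetooth Headphones", "price": 79.99,
   "availability": "In Stock", "rating": 4.5,
   "url": "https://example.com/electronics/headphones/12345",
   "last_updated": "2024-05-20T14:30:00Z"}
]""")

# Upsert keyed on URL so a daily scheduled run refreshes existing rows
# instead of inserting duplicates.
conn.executemany(
    """INSERT INTO products VALUES
       (:product_name, :price, :availability, :rating, :url, :last_updated)
       ON CONFLICT(url) DO UPDATE SET
         price = excluded.price,
         availability = excluded.availability,
         rating = excluded.rating,
         last_updated = excluded.last_updated""",
    records)
conn.commit()
print(conn.execute("SELECT count(*) FROM products").fetchone()[0])
```

For the daily 10 AM UTC schedule, a crontab entry along the lines of `0 10 * * * python3 load_products.py` would work, where `load_products.py` is a hypothetical script name wrapping the scrape-and-load steps.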