Spidercreator automates the generation of web scraping spiders using Browser Use and LLMs, enabling enterprises to create Playwright-based spiders with minimal coding. Perfect for organizations with ongoing data extraction needs, it streamlines the scraping process significantly.
1. **Define your scraping requirements.** Open Spidercreator and specify the target URL (e.g., https://reporting.com/services) and the exact data points you need (e.g., service names, prices, contact details, descriptions). Include any special handling requirements such as pagination, login forms, or dynamic content.
   *Tip:* Use the website's sitemap or navigation structure to identify all relevant pages. For Tempest Reporting, you might want to scrape the services page and the contact information separately for better organization.
2. **Configure spider settings.** Set the spider parameters, including the maximum pages to crawl (e.g., 10), the delay between actions (2-5 seconds), and error-handling preferences. Enable features like JavaScript rendering, cookie management, and screenshot capture for debugging.
   *Tip:* For legal websites like reporting.com, set longer delays (3-5 seconds) to avoid appearing as a bot. Enable the 'stealth mode' option to mimic human browsing patterns more closely.
3. **Generate and test the spider.** Click 'Generate Spider' to create the Playwright-based code. Copy the generated code and save it as a .js file. Install the required dependencies (playwright) and run the spider locally to test its functionality.
   *Tip:* Start with headless mode disabled (`headless: false`) to visually verify the spider's actions. Check the console output for errors and adjust settings as needed before running in production mode.
4. **Deploy and monitor.** For ongoing needs, deploy the spider to a server or cloud environment. Set up monitoring for errors, rate limits, and data quality. Schedule regular runs (e.g., weekly) to keep your dataset current.
   *Tip:* Use tools like PM2 for process management or set up a cron job for scheduled runs. For Tempest Reporting, consider running the spider after major website updates to ensure your data remains accurate.
5. **Integrate with downstream tools.** Connect the scraped data to your reporting systems or digital sales rooms. For example, import the extracted service data into your Reporting tool to enhance buyer engagement materials or create automated follow-up sequences.
   *Tip:* Format the output JSON to match your Reporting tool's expected input structure. For Tempest Reporting, you might want to extract service categories separately to align with your sales room content organization.
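As a sketch of step 5, the snippet below shows one way to reshape the spider's JSON output for a downstream reporting tool. The `category` field, the `toSalesRoomFormat` helper, and the target shape are illustrative assumptions, not Spidercreator output or a specific tool's API.

```javascript
// Sketch: group scraped services for import into a sales-room / reporting tool.
// NOTE: `category` and the output shape are hypothetical, for illustration only.
function toSalesRoomFormat(output) {
  const groups = {};
  for (const svc of output.services) {
    // Fall back to a single bucket when no category is present.
    const key = svc.category || 'General';
    if (!groups[key]) groups[key] = [];
    groups[key].push({ name: svc.title, price: svc.price });
  }
  return {
    source: output.metadata.target_url,
    extracted: output.metadata.extraction_date,
    offerings: groups
  };
}

// Example usage with data shaped like the spider's JSON output:
const sample = {
  metadata: { target_url: 'https://reporting.com/services', extraction_date: '2023-11-15' },
  services: [
    { title: 'Court Reporting', price: '$200', category: 'Legal' },
    { title: 'Transcription', price: 'Quote Required' }
  ]
};
console.log(JSON.stringify(toSalesRoomFormat(sample), null, 2));
```

In practice you would read the spider's saved file with `fs.readFileSync` and write the transformed result wherever your tool expects it.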
- Automating data collection from e-commerce sites
- Extracting competitor pricing data
- Gathering market research data from various sources
- Monitoring changes in web content
1. Copy the install command and run it in your terminal: `claude install carlosplanchon/spidercreator` (or clone the repository with `git clone https://github.com/carlosplanchon/spidercreator`).
2. Launch Claude Code, Cursor, or your preferred AI coding agent.
3. Use the prompt template or examples below to test the skill.
4. Adapt the skill to your specific use case and workflow.
Generate a Playwright-based web scraping spider for [WEBSITE_URL] using Spidercreator. The spider should extract [SPECIFIC_DATA_POINTS] (e.g., product names, prices, contact details) and handle [POTENTIAL_ISSUES] (e.g., pagination, dynamic content, login requirements). Include error handling for [COMMON_ERRORS] (e.g., rate limits, missing elements, CAPTCHAs). Output the code in a ready-to-use format with comments explaining key sections.
```javascript
// Generated by Spidercreator for Tempest Reporting (https://reporting.com)
// Spider Version: 1.0.0
// Target: Service listings and contact details
// Last Updated: 2023-11-15
const { chromium } = require('playwright');
const fs = require('fs');

(async () => {
  const browser = await chromium.launch({ headless: false });
  const context = await browser.newContext();
  const page = await context.newPage();

  // Declared outside the try block so partial results remain reachable in catch.
  let allServices = [];

  try {
    // Navigate to the target URL and wait for network activity to settle.
    await page.goto('https://reporting.com/services', { waitUntil: 'networkidle', timeout: 30000 });
    await page.waitForTimeout(1000 + Math.random() * 2000); // Random 1-3 s delay to mimic human behavior

    // Extract service listings from the current page.
    const extractServices = () =>
      page.$$eval('.service-item', (items) =>
        items.map((item) => ({
          title: item.querySelector('.service-title')?.textContent?.trim() || 'N/A',
          description: item.querySelector('.service-desc')?.textContent?.trim() || 'N/A',
          price: item.querySelector('.service-price')?.textContent?.trim() || 'Quote Required'
        }))
      );
    allServices = await extractServices();

    // Extract contact information from the footer.
    const contactInfo = await page.$eval('.contact-section', (section) => ({
      phone: section.querySelector('.phone-number')?.textContent?.trim() || 'N/A',
      email: section.querySelector('.email-link')?.href?.replace('mailto:', '') || 'N/A',
      address: section.querySelector('.address')?.textContent?.trim() || 'N/A'
    }));

    // Handle pagination if present (limit to 5 pages to avoid infinite loops).
    let currentPage = 1;
    while (currentPage < 5) {
      const nextButton = await page.$('.next-page');
      if (!nextButton) break;

      currentPage++;
      console.log(`Processing page ${currentPage}...`);

      // Click next page and wait for the new listings to load.
      await nextButton.click();
      await page.waitForLoadState('networkidle');
      await page.waitForSelector('.service-item', { timeout: 10000 });

      allServices.push(...(await extractServices()));
      await page.waitForTimeout(1000 + Math.random() * 2000); // Random delay between pages
    }

    // Save extracted data to a JSON file with metadata for traceability.
    const output = {
      metadata: {
        spider_version: '1.0.0',
        target_url: 'https://reporting.com/services',
        extraction_date: new Date().toISOString(),
        total_services: allServices.length
      },
      services: allServices,
      contact_info: contactInfo
    };
    fs.writeFileSync('tempest_reporting_services.json', JSON.stringify(output, null, 2));
    console.log(`Successfully extracted ${allServices.length} services and contact information.`);
  } catch (error) {
    console.error('Error during scraping:', error.message);

    // Save partial results if any services were extracted before the failure.
    if (allServices.length > 0) {
      const partialOutput = {
        metadata: {
          spider_version: '1.0.0',
          target_url: 'https://reporting.com/services',
          extraction_date: new Date().toISOString(),
          status: 'partial_success',
          services_extracted: allServices.length
        },
        services: allServices,
        error: error.message
      };
      fs.writeFileSync('tempest_reporting_services_partial.json', JSON.stringify(partialOutput, null, 2));
    }
  } finally {
    await browser.close();
  }
})();
```
### Key Features of This Spider:
1. **Dynamic Content Handling**: Waits for network idle and uses realistic delays between actions
2. **Pagination Support**: Automatically handles multi-page service listings
3. **Error Resilience**: Includes try-catch blocks and saves partial results on failure
4. **Data Structure**: Organizes extracted data with metadata for traceability
5. **Realistic Mimicry**: Random delays and human-like interactions to avoid detection
### Usage Instructions:
1. Save the code to a file named `tempest_scraper.js`
2. Install dependencies: `npm install playwright`
3. Run the spider: `node tempest_scraper.js`
4. Results will be saved to `tempest_reporting_services.json`
Note: For production use, consider adding:
- Proxy rotation to avoid IP bans
- CAPTCHA solving services if detected
- More sophisticated rate limiting
- Database integration for large-scale scraping
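The rate-limiting point above can be sketched as a small retry wrapper with exponential backoff and jitter. The retry count and delay values here are illustrative defaults, not tuned recommendations.

```javascript
// Sketch: retry an async page action with exponential backoff plus jitter.
// maxRetries and baseDelayMs are illustrative defaults, not tuned values.
async function withBackoff(action, maxRetries = 3, baseDelayMs = 1000) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await action();
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Double the delay each attempt and add jitter to avoid synchronized retries.
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 500;
      console.warn(`Attempt ${attempt + 1} failed (${err.message}); retrying in ${Math.round(delay)}ms`);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}

// Example usage: wrap a flaky operation (a stub that fails twice, then succeeds).
// A short 50 ms base delay keeps the demo quick.
let calls = 0;
withBackoff(async () => {
  if (++calls < 3) throw new Error('rate limited');
  return 'ok';
}, 3, 50).then((result) => console.log(result)); // eventually logs "ok"
```

In the spider, `page.goto` or the pagination click could be passed to `withBackoff` so transient rate-limit errors are retried instead of aborting the run.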