Crawl the sitemap of a given website and export the metadata of its pages recursively into CSV format.
```shell
git clone https://github.com/meysam81/sitemap-harvester.git
```

Copy the install command above and run it in your terminal.
1. Launch Claude Code, Cursor, or your preferred AI coding agent.
2. Use the prompt template or examples below to test the skill.
3. Adapt the skill to your specific use case and workflow.
Crawl the sitemap of [WEBSITE_URL] and export the metadata of its pages recursively into a CSV file. Include the following metadata for each page: URL, title, meta description, h1, h2, and last modified date. Save the CSV file as [FILE_NAME].
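What the prompt above asks for can be approximated in plain Python. The sketch below is illustrative only (the function names `parse_sitemap`, `extract_metadata`, and `to_csv` are not part of sitemap-harvester): it parses one sitemap document, recursing into child sitemaps where a sitemap index nests them, pulls the title, meta description, and H1 from a page's HTML, and writes the rows as CSV. Fetching the XML and HTML over HTTP, plus H2 and last-modified handling, is left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Sitemaps use a default XML namespace, so element lookups must qualify tags.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) for one sitemap document.

    Sitemap index files nest <sitemap><loc> entries pointing at further
    sitemaps, which is why a full crawl has to recurse; ordinary sitemaps
    list pages as <url><loc> entries.
    """
    root = ET.fromstring(xml_text)
    pages, children = [], []
    for url in root.iter(SITEMAP_NS + "url"):
        loc = url.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            pages.append(loc.text.strip())
    for sitemap in root.iter(SITEMAP_NS + "sitemap"):
        loc = sitemap.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            children.append(loc.text.strip())
    return pages, children


class MetadataParser(HTMLParser):
    """Collect the first <title>, meta description, and <h1> of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1 = ""
        self._tag = None  # element whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")
        elif tag in ("title", "h1"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title" and not self.title:
            self.title = data.strip()
        elif self._tag == "h1" and not self.h1:
            self.h1 = data.strip()


def extract_metadata(url, html_text):
    """Return one CSV row of metadata for a fetched page."""
    parser = MetadataParser()
    parser.feed(html_text)
    return {"URL": url, "Title": parser.title,
            "Meta Description": parser.description, "H1": parser.h1}


def to_csv(rows):
    """Serialize metadata rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["URL", "Title", "Meta Description", "H1"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A real crawl would wrap this in a loop that fetches each page URL, calls `extract_metadata`, and follows `child_sitemap_urls` until none remain.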
# Sitemap Metadata Harvesting Report

## Summary

The sitemap of `example.com` was successfully crawled and analyzed. A total of 1,245 pages were indexed, with the following metadata extracted:

- **Total Pages**: 1,245
- **Pages with Missing Titles**: 42
- **Pages with Missing Meta Descriptions**: 118
- **Average Word Count**: 842

## Key Findings

### Top 5 Most Linked Pages

1. `/about-us` - 142 internal links
2. `/contact` - 98 internal links
3. `/products` - 89 internal links
4. `/blog` - 76 internal links
5. `/services` - 65 internal links

### Pages with Missing Critical Metadata

- `/products/widget-a` - Missing meta description
- `/blog/post-123` - Missing H1 tag
- `/services/consulting` - Missing meta description

## CSV Export

The complete dataset has been exported to `example_com_sitemap_metadata.csv`. The file includes the following columns:

- `URL`
- `Title`
- `Meta Description`
- `H1`
- `H2`
- `Last Modified Date`

For further analysis, you can open the CSV file in Excel or any data analysis tool.