Crawl the sitemap of a given website and export the metadata of its pages recursively into CSV format.
```shell
git clone https://github.com/meysam81/sitemap-harvester.git
```

Copy the install command above and run it in your terminal.
1. Launch Claude Code, Cursor, or your preferred AI coding agent.
2. Use the prompt template or examples below to test the skill.
3. Adapt the skill to your specific use case and workflow.
Crawl the sitemap of [WEBSITE_URL] and export the metadata of its pages recursively into a CSV file. Include the following metadata for each page: URL, title, meta description, h1, h2, and last modified date. Save the CSV file as [FILE_NAME].
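What the prompt above asks for can be approximated in plain Python. The sketch below is illustrative only (the function names `parse_sitemap`, `extract_metadata`, and `to_csv` are not part of sitemap-harvester): it parses one sitemap document, recursing into child sitemaps where a sitemap index nests them, pulls the title, meta description, and H1 from a page's HTML, and writes the rows as CSV. Fetching the XML and HTML over HTTP, plus H2 and last-modified handling, is left out for brevity.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Sitemaps use a default XML namespace, so element lookups must qualify tags.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) for one sitemap document.

    Sitemap index files nest <sitemap><loc> entries pointing at further
    sitemaps, which is why a full crawl has to recurse; ordinary sitemaps
    list pages as <url><loc> entries.
    """
    root = ET.fromstring(xml_text)
    pages, children = [], []
    for url in root.iter(SITEMAP_NS + "url"):
        loc = url.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            pages.append(loc.text.strip())
    for sitemap in root.iter(SITEMAP_NS + "sitemap"):
        loc = sitemap.find(SITEMAP_NS + "loc")
        if loc is not None and loc.text:
            children.append(loc.text.strip())
    return pages, children


class MetadataParser(HTMLParser):
    """Collect the first <title>, meta description, and <h1> of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1 = ""
        self._tag = None  # element whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")
        elif tag in ("title", "h1"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        if self._tag == "title" and not self.title:
            self.title = data.strip()
        elif self._tag == "h1" and not self.h1:
            self.h1 = data.strip()


def extract_metadata(url, html_text):
    """Return one CSV row of metadata for a fetched page."""
    parser = MetadataParser()
    parser.feed(html_text)
    return {"URL": url, "Title": parser.title,
            "Meta Description": parser.description, "H1": parser.h1}


def to_csv(rows):
    """Serialize metadata rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["URL", "Title", "Meta Description", "H1"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A real crawl would wrap this in a loop that fetches each page URL, calls `extract_metadata`, and follows `child_sitemap_urls` until none remain.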
# Sitemap Metadata Harvesting Report

## Summary

The sitemap of `example.com` was successfully crawled and analyzed. A total of 1,245 pages were indexed, with the following metadata extracted:

- **Total Pages**: 1,245
- **Pages with Missing Titles**: 42
- **Pages with Missing Meta Descriptions**: 118
- **Average Word Count**: 842

## Key Findings

### Top 5 Most Linked Pages

1. `/about-us` - 142 internal links
2. `/contact` - 98 internal links
3. `/products` - 89 internal links
4. `/blog` - 76 internal links
5. `/services` - 65 internal links

### Pages with Missing Critical Metadata

- `/products/widget-a` - Missing meta description
- `/blog/post-123` - Missing H1 tag
- `/services/consulting` - Missing meta description

## CSV Export

The complete dataset has been exported to `example_com_sitemap_metadata.csv`. The file includes the following columns:

- `URL`
- `Title`
- `Meta Description`
- `H1`
- `H2`
- `Last Modified Date`

For further analysis, you can open the CSV file in Excel or any data analysis tool.