Generate and manage datasets on HuggingFace Hub. Create chat, classification, Q&A, completion, tabular, and custom formats. Integrates with Claude Code for dataset discovery and management.
git clone https://github.com/burtenshaw/dataset-creator-skill.git
The HuggingFace Dataset Creator Skill enables you to build and manage datasets on HuggingFace Hub without leaving Claude Code. It supports six dataset types: chat for conversational AI, classification for sentiment and intent tasks, Q&A for reading comprehension, completion for text generation, tabular for structured ML data, and custom schemas for specialized needs. Each dataset type includes automatic validation, schema enforcement, and example data to ensure consistency and quality. The skill integrates with HuggingFace MCP Server for seamless dataset discovery and management, making it ideal for machine learning practitioners, data engineers, and AI teams who need to organize training data at scale.
Install the HuggingFace MCP Server, set your HF_TOKEN environment variable, and add the huggingface_hub dependency. Then use natural language prompts in Claude Code to create datasets—for example, 'Create a sentiment classification dataset at myusername/sentiment-data' or 'Add 50 examples to an existing Q&A dataset.'
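The setup above can also be exercised directly from Python. The following is a minimal sketch, not the skill's own implementation: it assumes the huggingface_hub package is installed and that HF_TOKEN holds a token with write access. The function name create_dataset_repo is hypothetical, and the repository ID is the placeholder from the example prompt.

```python
import os


def create_dataset_repo(repo_id: str, private: bool = False) -> str:
    """Create (or reuse) a dataset repository on the Hub and return its URL.

    Hypothetical helper for illustration; not part of the skill itself.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable first.")

    # Imported after the token check so a missing token fails fast.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # exist_ok=True makes the call safe to repeat for an existing repository.
    url = api.create_repo(
        repo_id, repo_type="dataset", private=private, exist_ok=True
    )
    return str(url)
```

Calling create_dataset_repo("myusername/sentiment-data") with a valid token should return the repository URL; without HF_TOKEN set it raises before any network request is made.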
Create sentiment classification datasets for fine-tuning language models
Build Q&A datasets for training retrieval-augmented generation systems
Generate conversational chat datasets for chatbot and assistant training
Organize structured tabular data for machine learning workflows
git clone https://github.com/burtenshaw/dataset-creator-skill
Copy the install command above and run it in your terminal. Check the GitHub repository for manual installation instructions.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Create a new dataset on HuggingFace Hub for [DATASET_TYPE] tasks. The dataset should include [NUMBER] examples with the following structure: [STRUCTURE_DESCRIPTION]. Include metadata fields for [METADATA_FIELDS]. Generate the dataset and provide the HuggingFace Hub link.
I've created a new dataset on HuggingFace Hub for chat tasks. The dataset includes 100 examples with the following structure: a 'prompt' field containing the user's input, a 'response' field with the AI's output, and a 'context' field providing additional background information. Metadata fields include 'topic', 'difficulty_level', and 'source'. You can access the dataset at: https://huggingface.co/datasets/yourusername/chat_dataset_v1. The dataset is ready for use in fine-tuning or evaluation tasks.
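The record structure described in the example output can be sketched as a small validation and serialization step. This is an illustrative sketch only: the field names come from the example above, but the rule that every field is a non-empty string, the helper names validate_record and to_jsonl, and the sample record are all assumptions rather than the skill's actual schema enforcement.

```python
import json

# Field names taken from the example output above.
REQUIRED_FIELDS = ("prompt", "response", "context")
METADATA_FIELDS = ("topic", "difficulty_level", "source")


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in REQUIRED_FIELDS + METADATA_FIELDS:
        value = record.get(field)
        # Assumption: every field must be a non-empty string.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as one JSON object per line; raise on the first invalid one."""
    lines = []
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            raise ValueError(f"record {index}: {'; '.join(problems)}")
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Made-up sample record, for illustration only.
example = {
    "prompt": "How do I reset my password?",
    "response": "Open Settings, choose Security, then select Reset password.",
    "context": "User is logged in on the mobile app.",
    "topic": "account",
    "difficulty_level": "easy",
    "source": "synthetic",
}
```

Passing [example] to to_jsonl yields a single JSON line that could be saved as a file and uploaded to a dataset repository; a record missing any of the six fields raises ValueError instead.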