AutoAI is a Python-based framework that automates regression and classification tasks on numerical data. It excels at model search and hyper-parameter tuning, and generates high-quality Jupyter Notebook code, making it a powerful tool for data scientists and ML practitioners.
1. **Prepare Your Data**: Ensure your dataset is in CSV format with a clear target variable (continuous for regression; categorical for classification). Clean missing values and outliers before proceeding.
2. **Install AutoAI**: Run `pip install autoai` in your Python environment. Verify the installation with `autoai --version`.
3. **Run the Analysis**: Use the prompt template above, replacing `[DATASET_NAME]` with your dataset filename (e.g., `sales_data.csv`). For regression tasks, replace 'classification' with 'regression' in the code template.
4. **Review Outputs**: The generated notebook will include EDA, preprocessing steps, model search results, and evaluation metrics. Focus on the best-performing model's metrics and feature importance.
5. **Iterate if Needed**: If results aren't satisfactory, check data quality or adjust hyperparameter ranges in the notebook. Use Mode's SQL-based data exploration to validate data assumptions before rerunning AutoAI.

*Tip*: For large datasets (>100K rows), pre-filter data in Mode to reduce processing time. Always validate the target variable distribution to avoid class-imbalance issues in classification tasks.
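The data-preparation step above (checking missing values and the target distribution) can be sketched with plain pandas. This is a minimal illustration with made-up sample data; substitute your own CSV and column names:

```python
import pandas as pd

# Hypothetical stand-in for your dataset; in practice use pd.read_csv("your_file.csv").
df = pd.DataFrame({
    "tenure": [1, 34, None, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": ["No", "No", "Yes", "No"],
})

# 1. Count missing values per column.
missing = df.isnull().sum()
print(missing)

# 2. Inspect the target distribution to spot class imbalance early.
target_share = df["Churn"].value_counts(normalize=True)
print(target_share)

# 3. Drop rows with missing values (or impute instead, depending on your data).
clean = df.dropna()
print(f"Rows kept: {len(clean)} of {len(df)}")
```

If the target share is heavily skewed (e.g., 95/5), consider stratified splitting or class weighting before running the model search.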
Automating model selection for predictive analytics
Hyper-parameter tuning for improved model performance
Generating reproducible Jupyter Notebooks for data analysis
```
claude install blobcity/autoai
```

Or clone the repository directly:

```
git clone https://github.com/blobcity/autoai
```

Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Use AutoAI to automate the regression task for [DATASET_NAME] (e.g., 'customer churn prediction dataset'). Run a full model search with hyperparameter tuning, then generate a high-quality Jupyter Notebook with the best-performing model. Include EDA, preprocessing steps, and model evaluation metrics. Output the notebook code in a code block for easy copying.
```python
# AutoAI Generated Jupyter Notebook for Customer Churn Prediction
# Dataset: customer_churn_2023.csv (10,000 samples, 20 features)
# Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, accuracy_score
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import seaborn as sns
# Load Data
churn_data = pd.read_csv('customer_churn_2023.csv')
print(f"Dataset shape: {churn_data.shape}")
print(churn_data.head())
# Exploratory Data Analysis
print("\nMissing Values:")
print(churn_data.isnull().sum())
print("\nTarget Distribution:")
print(churn_data['Churn'].value_counts(normalize=True))
# Preprocessing
numeric_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
categorical_features = ['gender', 'SeniorCitizen', 'Partner', 'Dependents',
                        'PhoneService', 'InternetService', 'Contract']
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])
# Model Search with Hyperparameter Tuning
X = churn_data.drop('Churn', axis=1)
y = churn_data['Churn']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
models = {
    'RandomForest': RandomForestClassifier(random_state=42),
    'GradientBoosting': GradientBoostingClassifier(random_state=42),
    'LogisticRegression': LogisticRegression(random_state=42, max_iter=1000)
}
param_grids = {
    'RandomForest': {'classifier__n_estimators': [100, 200], 'classifier__max_depth': [10, 20]},
    'GradientBoosting': {'classifier__n_estimators': [100, 150], 'classifier__learning_rate': [0.05, 0.1]},
    'LogisticRegression': {'classifier__C': [0.1, 1, 10]}
}
best_model = None
best_score = 0
for name, model in models.items():
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', model)])
    grid_search = GridSearchCV(pipeline, param_grids[name], cv=5, scoring='roc_auc', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    print(f"\n{name} Results:")
    print(f"Best CV Score: {grid_search.best_score_:.4f}")
    print(f"Best Params: {grid_search.best_params_}")
    if grid_search.best_score_ > best_score:
        best_score = grid_search.best_score_
        best_model = grid_search.best_estimator_
# Evaluate Best Model
# (GridSearchCV already refits best_estimator_ on the training split,
# so this fit call is redundant but harmless)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]
print("\nBest Model Evaluation:")
print(f"ROC AUC: {roc_auc_score(y_test, y_pred_proba):.4f}")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Feature Importance
if hasattr(best_model.named_steps['classifier'], 'feature_importances_'):
    importances = best_model.named_steps['classifier'].feature_importances_
    features = (numeric_features +
                list(best_model.named_steps['preprocessor']
                     .named_transformers_['cat']
                     .get_feature_names_out(categorical_features)))
    feature_importance = pd.DataFrame({'feature': features, 'importance': importances})
    feature_importance = feature_importance.sort_values('importance', ascending=False)
    plt.figure(figsize=(10, 6))
    sns.barplot(x='importance', y='feature', data=feature_importance.head(10))
    plt.title('Top 10 Feature Importances')
    plt.tight_layout()
    plt.show()
print("\nBest Model Pipeline:")
print(best_model)
```
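Once the search finishes, you will usually want to persist the winning pipeline so it can be reused outside the notebook. Below is a minimal sketch using `joblib` (shipped with scikit-learn); the small stand-in pipeline here is illustrative only, since the notebook's `best_model` depends on your data:

```python
# Sketch: save and reload a fitted scikit-learn pipeline with joblib.
# The tiny pipeline below stands in for the notebook's `best_model`.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])

best_model = Pipeline([("scale", StandardScaler()),
                       ("clf", LogisticRegression())])
best_model.fit(X, y)

# Persist to disk and reload; the reloaded pipeline predicts identically.
joblib.dump(best_model, "best_churn_model.joblib")
reloaded = joblib.load("best_churn_model.joblib")
print(reloaded.predict(X))
```

Saving the whole pipeline (preprocessor plus classifier) rather than the bare classifier ensures that scaling and one-hot encoding are applied identically at inference time.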