AutoAI is a Python-based framework that automates regression and classification tasks on numerical data. It excels at model search and hyper-parameter tuning, and generates high-quality Jupyter Notebook code, making it a powerful tool for data scientists and ML practitioners.
1. **Prepare Your Data**: Ensure your dataset is in CSV format with a clear target variable (continuous for regression; categorical for classification). Clean missing values and outliers before proceeding.
2. **Install AutoAI**: Run `pip install autoai` in your Python environment. Verify the installation with `autoai --version`.
3. **Run the Analysis**: Use the prompt template above, replacing `[DATASET_NAME]` with your dataset filename (e.g., `sales_data.csv`). For regression tasks, replace 'classification' with 'regression' in the code template.
4. **Review Outputs**: The generated notebook will include EDA, preprocessing steps, model search results, and evaluation metrics. Focus on the best-performing model's metrics and feature importance.
5. **Iterate if Needed**: If results aren't satisfactory, check data quality or adjust hyperparameter ranges in the notebook. Use Mode's SQL-based data exploration to validate data assumptions before rerunning AutoAI.

*Tip*: For large datasets (>100K rows), pre-filter data in Mode to reduce processing time. Always validate the target variable distribution to avoid class-imbalance issues in classification tasks.
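The data-preparation step above (checking missing values and the target distribution) can be sketched with plain pandas. This is a minimal illustration with made-up sample data; substitute your own CSV and column names:

```python
import pandas as pd

# Hypothetical stand-in for your dataset; in practice use pd.read_csv("your_file.csv").
df = pd.DataFrame({
    "tenure": [1, 34, None, 45],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "Churn": ["No", "No", "Yes", "No"],
})

# 1. Count missing values per column.
missing = df.isnull().sum()
print(missing)

# 2. Inspect the target distribution to spot class imbalance early.
target_share = df["Churn"].value_counts(normalize=True)
print(target_share)

# 3. Drop rows with missing values (or impute instead, depending on your data).
clean = df.dropna()
print(f"Rows kept: {len(clean)} of {len(df)}")
```

If the target share is heavily skewed (e.g., 95/5), consider stratified splitting or class weighting before running the model search.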
Automating model selection for predictive analytics
Hyper-parameter tuning for improved model performance
Generating reproducible Jupyter Notebooks for data analysis
```
claude install blobcity/autoai
```

Or clone the repository directly:

```
git clone https://github.com/blobcity/autoai
```

Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Use AutoAI to automate the regression task for [DATASET_NAME] (e.g., 'customer churn prediction dataset'). Run a full model search with hyperparameter tuning, then generate a high-quality Jupyter Notebook with the best-performing model. Include EDA, preprocessing steps, and model evaluation metrics. Output the notebook code in a code block for easy copying.
```python
# AutoAI Generated Jupyter Notebook for Customer Churn Prediction
# Dataset: customer_churn_2023.csv (10,000 samples, 20 features)
# Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score, accuracy_score
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
import seaborn as sns
# Load Data
churn_data = pd.read_csv('customer_churn_2023.csv')
print(f"Dataset shape: {churn_data.shape}")
print(churn_data.head())
# Exploratory Data Analysis
print("\nMissing Values:")
print(churn_data.isnull().sum())
print("\nTarget Distribution:")
print(churn_data['Churn'].value_counts(normalize=True))
# Preprocessing
numeric_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
categorical_features = ['gender', 'SeniorCitizen', 'Partner', 'Dependents',
                        'PhoneService', 'InternetService', 'Contract']
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numeric_features),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ])
# Model Search with Hyperparameter Tuning
X = churn_data.drop('Churn', axis=1)
y = churn_data['Churn']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
models = {
    'RandomForest': RandomForestClassifier(random_state=42),
    'GradientBoosting': GradientBoostingClassifier(random_state=42),
    'LogisticRegression': LogisticRegression(random_state=42, max_iter=1000)
}
param_grids = {
    'RandomForest': {'classifier__n_estimators': [100, 200], 'classifier__max_depth': [10, 20]},
    'GradientBoosting': {'classifier__n_estimators': [100, 150], 'classifier__learning_rate': [0.05, 0.1]},
    'LogisticRegression': {'classifier__C': [0.1, 1, 10]}
}
best_model = None
best_score = 0
for name, model in models.items():
    pipeline = Pipeline(steps=[('preprocessor', preprocessor),
                               ('classifier', model)])
    grid_search = GridSearchCV(pipeline, param_grids[name], cv=5, scoring='roc_auc', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    print(f"\n{name} Results:")
    print(f"Best CV Score: {grid_search.best_score_:.4f}")
    print(f"Best Params: {grid_search.best_params_}")
    if grid_search.best_score_ > best_score:
        best_score = grid_search.best_score_
        best_model = grid_search.best_estimator_
# Evaluate Best Model
# (GridSearchCV already refits best_estimator_ on the training split,
# so this fit call is redundant but harmless)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]
print("\nBest Model Evaluation:")
print(f"ROC AUC: {roc_auc_score(y_test, y_pred_proba):.4f}")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Feature Importance
if hasattr(best_model.named_steps['classifier'], 'feature_importances_'):
    importances = best_model.named_steps['classifier'].feature_importances_
    features = (numeric_features +
                list(best_model.named_steps['preprocessor']
                     .named_transformers_['cat']
                     .get_feature_names_out(categorical_features)))
    feature_importance = pd.DataFrame({'feature': features, 'importance': importances})
    feature_importance = feature_importance.sort_values('importance', ascending=False)
    plt.figure(figsize=(10, 6))
    sns.barplot(x='importance', y='feature', data=feature_importance.head(10))
    plt.title('Top 10 Feature Importances')
    plt.tight_layout()
    plt.show()
print("\nBest Model Pipeline:")
print(best_model)
```
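Once the search finishes, you will usually want to persist the winning pipeline so it can be reused outside the notebook. Below is a minimal sketch using `joblib` (shipped with scikit-learn); the small stand-in pipeline here is illustrative only, since the notebook's `best_model` depends on your data:

```python
# Sketch: save and reload a fitted scikit-learn pipeline with joblib.
# The tiny pipeline below stands in for the notebook's `best_model`.
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])

best_model = Pipeline([("scale", StandardScaler()),
                       ("clf", LogisticRegression())])
best_model.fit(X, y)

# Persist to disk and reload; the reloaded pipeline predicts identically.
joblib.dump(best_model, "best_churn_model.joblib")
reloaded = joblib.load("best_churn_model.joblib")
print(reloaded.predict(X))
```

Saving the whole pipeline (preprocessor plus classifier) rather than the bare classifier ensures that scaling and one-hot encoding are applied identically at inference time.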