Data engineering skills for analytics and data engineers working with dbt and Snowflake. Automate data pipelines, transform data, and generate SQL code. Integrates with Claude Code for AI-assisted data workflows.
git clone https://github.com/AltimateAI/data-engineering-skills.gitData engineering skills for analytics and data engineers working with dbt and Snowflake. Automate data pipelines, transform data, and generate SQL code. Integrates with Claude Code for AI-assisted data workflows.
1. **Prepare Your Environment**: Ensure you have dbt installed and configured with Snowflake credentials. Run `dbt init [PROJECT_NAME]` if starting a new project. 2. **Create Source Files**: In your `models/` directory, create a `staging` subdirectory. Add a `schema.yml` file to define sources and tests. 3. **Generate Models**: Use the prompt template to create new dbt models. Replace [PLACEHOLDERS] with your specific requirements. For complex transformations, break them into multiple CTEs as shown in the example. 4. **Test Locally**: Run `dbt compile` to check for syntax errors, then `dbt run --select [MODEL_NAME]` to test the model against your Snowflake warehouse. 5. **Integrate with CI/CD**: Add the generated models to your dbt Cloud or GitHub Actions pipeline. Configure the `dbt test` command to run automatically after each deployment. **Pro Tips:** - Use `dbt docs generate` to create documentation for your models automatically - For large transformations, consider breaking them into multiple models with clear dependencies - Store frequently used transformations in macros to avoid repetition - Monitor Snowflake query performance with `dbt build --profile [PROFILE_NAME] --select [MODEL_NAME]`
No install command available. Check the GitHub repository for manual installation instructions.
git clone https://github.com/AltimateAI/data-engineering-skillsCopy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Act as an expert data engineer using dbt and Snowflake. For the project [PROJECT_NAME], write a dbt model to transform [SOURCE_TABLE] into [TARGET_TABLE] with these requirements: [TRANSFORMATION_RULES]. Include error handling for null values in [COLUMN_LIST]. Generate the SQL code and explain the dbt model structure. Add a test for data quality that checks [TEST_CONDITION].
```sql
-- dbt model: stg_customers.sql
-- Transforms raw customer data into a clean staging table
{{
config(
materialized='table',
schema='staging',
tags=['customer']
)
}}
WITH source_data AS (
SELECT
customer_id,
first_name,
last_name,
email,
signup_date,
country,
CASE
WHEN email IS NULL THEN 'missing_email'
ELSE email
END AS cleaned_email,
CASE
WHEN country IN ('USA', 'United States') THEN 'US'
WHEN country IN ('UK', 'United Kingdom') THEN 'UK'
ELSE 'Other'
END AS region
FROM {{ source('raw', 'customers') }}
),
cleaned_data AS (
SELECT
customer_id,
TRIM(first_name) AS first_name,
TRIM(last_name) AS last_name,
cleaned_email,
signup_date::DATE AS signup_date,
region,
CURRENT_TIMESTAMP AS etl_loaded_at
FROM source_data
WHERE customer_id IS NOT NULL
)
SELECT * FROM cleaned_data;
-- Data quality test
{% test not_null('customer_id') %}
SELECT * FROM {{ this }}
WHERE customer_id IS NULL;
{% endtest %}
{% test accepted_values('region', ['US', 'UK', 'Other']) %}
SELECT * FROM {{ this }}
WHERE region NOT IN ('US', 'UK', 'Other');
{% endtest %}
```
**Model Structure Explanation:**
This dbt model follows a clean staging pattern by:
1. First creating a CTE (`source_data`) that handles raw data cleaning including null handling for emails and standardizing country codes
2. Then creating a `cleaned_data` CTE that applies additional transformations like trimming whitespace and proper date casting
3. Finally selecting all columns from the cleaned data CTE
**Key Features Implemented:**
- Null handling for critical fields (customer_id, email)
- Data type conversion (signup_date to DATE)
- Standardization of categorical data (country to region)
- Timestamp for tracking load times
- Two data quality tests: one for null values in customer_id and another for valid region values
**Usage Notes:**
This model should be run as part of your dbt project's staging layer. The tests will automatically run during `dbt test` and fail the build if any records violate the constraints. The `etl_loaded_at` field helps track when data was last processed.SQL-first transformation framework for analytics engineering
AI assistant built for thoughtful, nuanced conversation
Cloud ETL platform for non-technical data integration
Get more done every day with Microsoft Teams – powered by AI
Customer feedback management made simple
Design, document, and generate code for APIs with interactive tools for developers.
Take a free 3-minute scan and get personalized AI skill recommendations.
Take free scan