CodeGen is a comprehensive toolkit from Facebook AI Research for applying machine learning to code generation. It covers the full pipeline, from dataset creation through model training and evaluation, and includes pretrained models to accelerate development.
claude install facebookresearch/CodeGen
https://github.com/facebookresearch/CodeGen
1. Select the appropriate CodeGen model for your task. For Python-focused generation, use `codegen-350M-mono` or `codegen-2B-mono`; for tasks spanning several programming languages, consider `codegen-350M-multi` or `codegen-2B-multi`.
   Tip: Check the model's repository for performance benchmarks on your specific use case. Models with 'mono' in the name are fine-tuned on a single language (Python), while 'multi' variants cover multiple programming languages.
2. Prepare your input prompt with clear context. Include examples of desired outputs, relevant libraries, or code snippets that define the expected behavior.
   Tip: Use the format '### Instruction: [TASK]\n### Context: [RELEVANT_CODE]\n### Response:'. This helps the model understand the scope of the task.
3. Load the model and tokenizer using Hugging Face's Transformers library. Specify the device (CPU/GPU) based on your hardware availability.
   Tip: For production use, consider quantizing the model (e.g., 8-bit quantization) to reduce memory usage without significant performance loss.
4. Generate code using the model's `generate()` method. Adjust parameters like `max_length`, `num_beams`, and `temperature` based on your requirements for creativity vs. determinism.
   Tip: Start with conservative settings (e.g., `temperature=0.7`, `num_beams=5`) and increase creativity only if the output lacks detail.
5. Validate and refine the generated code. Use static analysis tools (e.g., Pylint, mypy) or unit tests to ensure correctness before deployment.
   Tip: For complex tasks, break the problem into smaller sub-tasks and generate code for each part iteratively. This improves reliability and makes debugging easier.
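The instruction-style prompt layout from step 2 can be sketched as a small helper. `build_prompt` is an illustrative name, not part of any CodeGen API; the section headers are just a prompting convention:

```python
def build_prompt(task: str, context: str = "") -> str:
    """Assemble an instruction-style prompt for a code generation model.

    Follows the '### Instruction / ### Context / ### Response' layout
    suggested in the steps above; the section names are a convention,
    not an API requirement.
    """
    sections = [f"### Instruction: {task}"]
    if context:
        sections.append(f"### Context: {context}")
    sections.append("### Response:")
    return "\n".join(sections)

# Example: ask for a helper, giving an existing import as context
prompt = build_prompt(
    "Write a function that retries an HTTP GET up to 3 times.",
    "import requests",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it easy to experiment with alternative layouts without touching the generation code.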
Automate the generation of boilerplate code for new software projects to save time.
Create and preprocess datasets for training custom machine learning models on code.
Evaluate the performance of machine learning models in various coding tasks, such as translation and deobfuscation.
Integrate AI-generated code snippets into existing applications to enhance functionality.
claude install facebookresearch/CodeGen
git clone https://github.com/facebookresearch/CodeGen
Copy either install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Generate a Python function that uses [MODEL_NAME] from CodeGen to implement [TASK_DESCRIPTION]. The function should handle [INPUT_TYPE] inputs and return [OUTPUT_TYPE]. Include error handling for [EDGE_CASES] and optimize for [PERFORMANCE_METRIC]. Use the following signature:

```python
def [FUNCTION_NAME]([PARAMETERS]) -> [RETURN_TYPE]:
    """
    [BRIEF_DESCRIPTION]
    """
    [IMPLEMENTATION]
```

```python
def generate_sql_query_from_nl(natural_language_query: str, database_schema: dict) -> str:
    """
    Converts a natural language query into a SQL query using a pretrained CodeGen model.
    Handles JOINs, WHERE clauses, and basic aggregations.

    Args:
        natural_language_query: User's query in English
            (e.g., "Show me all customers from New York who made purchases over $100")
        database_schema: Dictionary mapping table names to column lists
            (e.g., {"customers": ["id", "name", "city"], "orders": ["id", "customer_id", "amount"]})

    Returns:
        A SQL query string.

    Raises:
        ValueError: If the generated output does not look like a valid SQL query.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # CodeGen is a decoder-only (causal) language model, so load it with
    # AutoModelForCausalLM rather than a seq2seq class.
    model_name = "Salesforce/codegen-350M-mono"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Prepare input prompt with schema context
    schema_context = "\n".join(
        f"Table {table}: columns {', '.join(columns)}"
        for table, columns in database_schema.items()
    )
    prompt = (
        f"### Database Schema:\n{schema_context}\n"
        f"### Natural Language Query: {natural_language_query}\n"
        f"### SQL Query:"
    )

    # Generate; a causal LM echoes the prompt, so decode only the new tokens.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_beams=5,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    sql_query = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

    # Validate SQL syntax (basic check)
    if not sql_query.upper().startswith("SELECT"):
        raise ValueError(f"Generated invalid SQL: {sql_query}")
    return sql_query

# Example usage
schema = {
    "customers": ["id", "name", "city", "state"],
    "orders": ["id", "customer_id", "amount", "order_date"]
}
query = generate_sql_query_from_nl(
    "List all customers from California who spent more than $500 in 2023",
    schema
)
print(query)
# Illustrative output:
# SELECT customers.name, customers.city FROM customers JOIN orders ON customers.id = orders.customer_id
# WHERE customers.state = 'CA' AND orders.amount > 500 AND YEAR(orders.order_date) = 2023
```
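Validation (step 5 above) can start with a plain syntax check before heavier tools like Pylint or unit tests. This is a minimal sketch using Python's standard `ast` module; `is_valid_python` is an illustrative helper, not part of CodeGen:

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source, False otherwise.

    A passing parse only guarantees syntax, not correctness; follow up
    with linting (e.g., Pylint, mypy) and unit tests before deployment.
    """
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

print(is_valid_python("def add(a, b):\n    return a + b"))  # True
print(is_valid_python("def add(a, b) return a + b"))        # False
```

Rejecting syntactically invalid generations early keeps the expensive checks (type analysis, test runs) for candidates that can actually execute.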