CodeGen is a comprehensive toolkit from Facebook AI Research for applying machine learning to code generation. It covers the full pipeline, from dataset creation through model training and evaluation, and includes pretrained models to accelerate development.
claude install facebookresearch/CodeGen
https://github.com/facebookresearch/CodeGen
1. Select the appropriate CodeGen model for your task. For Python-focused generation, use `codegen-350M-mono` or `codegen-2B-mono`; for tasks spanning several programming languages, consider `codegen-350M-multi` or `codegen-2B-multi`.
   Tip: Check the model's repository for performance benchmarks on your specific use case. Models with 'mono' in the name are fine-tuned on a single language (Python), while 'multi' variants cover multiple programming languages.
2. Prepare your input prompt with clear context. Include examples of desired outputs, relevant libraries, or code snippets that define the expected behavior.
   Tip: Use the format '### Instruction: [TASK]\n### Context: [RELEVANT_CODE]\n### Response:'. This helps the model understand the scope of the task.
3. Load the model and tokenizer using Hugging Face's Transformers library. Specify the device (CPU/GPU) based on your hardware availability.
   Tip: For production use, consider quantizing the model (e.g., 8-bit quantization) to reduce memory usage without significant performance loss.
4. Generate code using the model's `generate()` method. Adjust parameters like `max_length`, `num_beams`, and `temperature` based on your requirements for creativity vs. determinism.
   Tip: Start with conservative settings (e.g., `temperature=0.7`, `num_beams=5`) and increase creativity only if the output lacks detail.
5. Validate and refine the generated code. Use static analysis tools (e.g., Pylint, mypy) or unit tests to ensure correctness before deployment.
   Tip: For complex tasks, break the problem into smaller sub-tasks and generate code for each part iteratively. This improves reliability and makes debugging easier.
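The instruction-style prompt layout from step 2 can be sketched as a small helper. `build_prompt` is an illustrative name, not part of any CodeGen API; the section headers are just a prompting convention:

```python
def build_prompt(task: str, context: str = "") -> str:
    """Assemble an instruction-style prompt for a code generation model.

    Follows the '### Instruction / ### Context / ### Response' layout
    suggested in the steps above; the section names are a convention,
    not an API requirement.
    """
    sections = [f"### Instruction: {task}"]
    if context:
        sections.append(f"### Context: {context}")
    sections.append("### Response:")
    return "\n".join(sections)

# Example: ask for a helper, giving an existing import as context
prompt = build_prompt(
    "Write a function that retries an HTTP GET up to 3 times.",
    "import requests",
)
print(prompt)
```

Keeping the prompt assembly in one place makes it easy to experiment with alternative layouts without touching the generation code.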
Automate the generation of boilerplate code for new software projects to save time.
Create and preprocess datasets for training custom machine learning models on code.
Evaluate the performance of machine learning models in various coding tasks, such as translation and deobfuscation.
Integrate AI-generated code snippets into existing applications to enhance functionality.
claude install facebookresearch/CodeGen
git clone https://github.com/facebookresearch/CodeGen
Copy either install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Generate a Python function that uses [MODEL_NAME] from CodeGen to implement [TASK_DESCRIPTION]. The function should handle [INPUT_TYPE] inputs and return [OUTPUT_TYPE]. Include error handling for [EDGE_CASES] and optimize for [PERFORMANCE_METRIC]. Use the following signature:

```python
def [FUNCTION_NAME]([PARAMETERS]) -> [RETURN_TYPE]:
    """
    [BRIEF_DESCRIPTION]
    """
    [IMPLEMENTATION]
```

```python
def generate_sql_query_from_nl(natural_language_query: str, database_schema: dict) -> str:
    """
    Converts a natural language query into a SQL query using a pretrained CodeGen model.
    Handles JOINs, WHERE clauses, and basic aggregations.

    Args:
        natural_language_query: User's query in English
            (e.g., "Show me all customers from New York who made purchases over $100")
        database_schema: Dictionary mapping table names to column lists
            (e.g., {"customers": ["id", "name", "city"], "orders": ["id", "customer_id", "amount"]})

    Returns:
        A SQL query string.

    Raises:
        ValueError: If the generated output does not look like a valid SQL query.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # CodeGen is a decoder-only (causal) language model, so load it with
    # AutoModelForCausalLM rather than a seq2seq class.
    model_name = "Salesforce/codegen-350M-mono"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    # Prepare input prompt with schema context
    schema_context = "\n".join(
        f"Table {table}: columns {', '.join(columns)}"
        for table, columns in database_schema.items()
    )
    prompt = (
        f"### Database Schema:\n{schema_context}\n"
        f"### Natural Language Query: {natural_language_query}\n"
        f"### SQL Query:"
    )

    # Generate; a causal LM echoes the prompt, so decode only the new tokens.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        num_beams=5,
        pad_token_id=tokenizer.eos_token_id,
    )
    generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    sql_query = tokenizer.decode(generated_tokens, skip_special_tokens=True).strip()

    # Validate SQL syntax (basic check)
    if not sql_query.upper().startswith("SELECT"):
        raise ValueError(f"Generated invalid SQL: {sql_query}")
    return sql_query

# Example usage
schema = {
    "customers": ["id", "name", "city", "state"],
    "orders": ["id", "customer_id", "amount", "order_date"]
}
query = generate_sql_query_from_nl(
    "List all customers from California who spent more than $500 in 2023",
    schema
)
print(query)
# Illustrative output:
# SELECT customers.name, customers.city FROM customers JOIN orders ON customers.id = orders.customer_id
# WHERE customers.state = 'CA' AND orders.amount > 500 AND YEAR(orders.order_date) = 2023
```
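Validation (step 5 above) can start with a plain syntax check before heavier tools like Pylint or unit tests. This is a minimal sketch using Python's standard `ast` module; `is_valid_python` is an illustrative helper, not part of CodeGen:

```python
import ast

def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source, False otherwise.

    A passing parse only guarantees syntax, not correctness; follow up
    with linting (e.g., Pylint, mypy) and unit tests before deployment.
    """
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

print(is_valid_python("def add(a, b):\n    return a + b"))  # True
print(is_valid_python("def add(a, b) return a + b"))        # False
```

Rejecting syntactically invalid generations early keeps the expensive checks (type analysis, test runs) for candidates that can actually execute.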