GitTaskBench is a benchmark for evaluating code agents on real-world tasks. It measures performance from repository understanding through task delivery, using a cost-aware metric. It helps engineering teams improve agent efficiency in development and bug-fixing workflows, and it integrates with Python-based code agents such as Claude.
git clone https://github.com/QuantaAlpha/GitTaskBench.git
Copy the install command above and run it in your terminal, then check the GitHub repository for any additional setup instructions.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Benchmark the performance of a Code Agent on the [REPOSITORY_NAME] repository. The agent should perform the following tasks: 1) Understand the repository structure and documentation, 2) Set up the development environment, 3) Identify and fix a bug or implement a new feature, and 4) Deliver the task with a cost-aware α metric. Provide a detailed report on the agent's performance.
# Repository Benchmark Report

## Repository: ExampleWebApp

### Task 1: Repository Understanding
- **Documentation Review**: The agent successfully reviewed the README.md and identified the key technologies used: React, Node.js, and MongoDB.
- **Structure Analysis**: The agent mapped out the repository structure, noting the presence of a frontend, backend, and database layer.

### Task 2: Environment Setup
- **Dependency Installation**: The agent installed all necessary dependencies using npm and yarn.
- **Database Configuration**: The agent set up the MongoDB database and configured the environment variables.

### Task 3: Bug Fixing
- **Bug Identification**: The agent identified a bug in the user authentication flow.
- **Bug Fix**: The agent implemented a fix by updating the authentication middleware.

### Task 4: Task Delivery
- **Cost-Aware α Metric**: The agent delivered the task with an α metric of 0.85, indicating efficient use of resources.
- **Final Report**: The agent provided a detailed report on the tasks performed and the outcomes achieved.
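For intuition, a cost-aware score like the α above can be sketched as task success discounted by resource usage. This is only an illustrative stand-in, not GitTaskBench's actual metric: the function name, the 0.7/0.3 weights, and the token-budget inputs are all assumptions made for this example.

```python
# Hypothetical cost-aware score: NOT GitTaskBench's real alpha formula,
# just an illustration of success discounted by resource consumption.

def alpha_score(task_passed: bool, tokens_used: int, token_budget: int) -> float:
    """Return 0.0 for a failed task; for a passed task, scale the score
    down linearly as token usage approaches the budget (illustrative)."""
    if not task_passed:
        return 0.0
    # Fraction of the budget left unused, clamped at 0 for over-budget runs.
    efficiency = max(0.0, 1.0 - tokens_used / token_budget)
    # Illustrative weights: 0.7 for success, up to 0.3 for efficiency.
    return 0.7 + 0.3 * efficiency

print(round(alpha_score(True, 50_000, 100_000), 2))  # → 0.85
```

Under these assumed weights, a passing run that used half its token budget lands at 0.85, matching the example report above; the real benchmark defines its own cost accounting.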