mcpbr

🥈Silver

Evaluate MCP servers with mcpbr, the Model Context Protocol Benchmark Runner. Operations teams use it to benchmark server performance against real GitHub issues. It connects to MCP servers and GitHub and produces quantifiable metrics for optimization.

1090Updated 3mo ago

Intermediate · 30 min to implement · automation

Saves ~120 min per use

Quick Install

git clone https://github.com/greynewell/mcpbr.git

Works with:

Claude

Overview

About This Skill

mcpbr (Model Context Protocol Benchmark Runner) evaluates whether your MCP server actually improves agent performance on software engineering tasks. It runs controlled experiments comparing tool-assisted agents against baseline performance using real GitHub issues from SWE-bench, eliminating guesswork with reproducible, quantifiable results. The tool supports 30+ benchmarks across software engineering, code generation, math, and knowledge domains, with Docker-isolated task environments and pinned dependencies ensuring reliable comparisons. Operations teams and MCP developers use mcpbr to validate whether tools help or hurt agent resolution rates, token usage, and cost per task before production deployment.

How to Use

1. **Prepare Your Environment:** Ensure you have MCPBR installed (`pip install mcpbr`) and authenticated with both GitHub and your MCP server. Verify the server is running and accessible via the MCP Inspector or CLI.

2. **Select a Benchmark Issue:** Choose a representative GitHub issue from your repository that exercises the server's capabilities (e.g., a complex issue with multiple comments, code snippets, and labels). Copy the issue URL for the [GITHUB_REPO_ISSUE_URL] placeholder.

3. **Run the Benchmark:** Execute the MCPBR command with your server details:

   ```bash
   mcpbr benchmark --server github-repo-analyzer --issue https://github.com/owner/repo/issues/42 --baseline local-github-bridge
   ```

   For advanced configurations, use the `--config` flag to specify custom parameters like context window size or timeout values.

4. **Analyze the Report:** Review the generated report (JSON and visualizations) to identify performance bottlenecks. Focus on metrics like latency, error rates, and context utilization that deviate from your targets.

5. **Iterate and Optimize:** Apply the recommendations from the report to your server configuration. Re-run the benchmark after each optimization to validate improvements. Use the `--compare` flag to generate delta reports between runs:

   ```bash
   mcpbr benchmark --server github-repo-analyzer --issue https://github.com/owner/repo/issues/42 --compare /tmp/mcpbr/github-repo-analyzer-20240501-100000.json
   ```

**Pro Tips:**
- Run benchmarks during off-peak hours to minimize external API rate limiting (e.g., GitHub API throttling).
- Use the `--warmup` flag to pre-load the server with common queries before benchmarking for more consistent results.
- For teams managing multiple MCP servers, maintain a shared configuration file (e.g., `mcpbr-config.json`) to standardize benchmarking parameters across servers.
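If you re-run the benchmark after every optimization, it may help to assemble the command in a small script instead of retyping it. The sketch below is only an illustration: `build_benchmark_command` is a hypothetical helper, and the flag names (`--server`, `--issue`, `--baseline`, `--compare`) are copied from the commands in this guide rather than checked against the installed mcpbr CLI, so confirm them with `mcpbr --help` before relying on it.

```python
import shlex
from typing import List, Optional


def build_benchmark_command(
    server: str,
    issue_url: str,
    baseline: Optional[str] = None,
    compare_to: Optional[str] = None,
) -> List[str]:
    """Assemble the argv list for one benchmark run (hypothetical helper).

    Flag names mirror the commands shown in this guide; they are assumptions,
    not a verified description of the mcpbr CLI.
    """
    cmd = ["mcpbr", "benchmark", "--server", server, "--issue", issue_url]
    if baseline is not None:
        cmd += ["--baseline", baseline]
    if compare_to is not None:
        cmd += ["--compare", compare_to]
    return cmd


# First run: compare the server against a baseline server.
first_run = build_benchmark_command(
    "github-repo-analyzer",
    "https://github.com/owner/repo/issues/42",
    baseline="local-github-bridge",
)

# Later run: compare against the JSON report saved by an earlier run.
rerun = build_benchmark_command(
    "github-repo-analyzer",
    "https://github.com/owner/repo/issues/42",
    compare_to="/tmp/mcpbr/github-repo-analyzer-20240501-100000.json",
)

# Print the shell-quoted commands for review before executing them.
print(shlex.join(first_run))
print(shlex.join(rerun))
```

To execute a command rather than print it, pass the list to `subprocess.run(first_run, check=True)`; passing a list avoids shell-quoting problems with URLs and paths.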

Use Cases

Validate whether a code analysis MCP server improves agent performance on real GitHub issues

Compare baseline agent behavior against tool-augmented agent behavior across 500+ SWE-bench tasks

Measure efficiency tradeoffs: resolution rate vs. token usage vs. cost per task

Identify which repositories benefit from specific MCP tools before deployment
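The tradeoffs named in these use cases can be made concrete with a few lines of arithmetic. The following is a minimal sketch, not mcpbr's own reporting code: the `TaskResult` record, the repository names, and every number in it are invented for illustration, and a real run would supply its own per-task results.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskResult:
    """One benchmark task outcome (hypothetical record shape)."""
    repo: str
    resolved: bool
    tokens: int
    cost_usd: float


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Resolution rate, average token usage, and cost figures for one arm."""
    n = len(results)
    resolved = sum(r.resolved for r in results)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "resolution_rate": resolved / n,
        "avg_tokens": sum(r.tokens for r in results) / n,
        "cost_per_task": total_cost / n,
        # Undefined when nothing was resolved; report infinity in that case.
        "cost_per_resolved": total_cost / resolved if resolved else float("inf"),
    }


def resolution_rate_by_repo(results: List[TaskResult]) -> Dict[str, float]:
    """Per-repository resolution rate, to see where a tool helps."""
    counts = defaultdict(lambda: [0, 0])  # repo -> [resolved, total]
    for r in results:
        counts[r.repo][0] += int(r.resolved)
        counts[r.repo][1] += 1
    return {repo: res / total for repo, (res, total) in counts.items()}


# Illustrative data only: the same four tasks, without and with the MCP server.
baseline = [
    TaskResult("owner/repo-a", False, 10_000, 0.25),
    TaskResult("owner/repo-a", True, 12_000, 0.25),
    TaskResult("owner/repo-b", False, 8_000, 0.25),
    TaskResult("owner/repo-b", False, 10_000, 0.25),
]
with_mcp = [
    TaskResult("owner/repo-a", True, 14_000, 0.5),
    TaskResult("owner/repo-a", True, 16_000, 0.5),
    TaskResult("owner/repo-b", False, 12_000, 0.5),
    TaskResult("owner/repo-b", False, 14_000, 0.5),
]

print(summarize(baseline))
print(summarize(with_mcp))
print(resolution_rate_by_repo(baseline))
print(resolution_rate_by_repo(with_mcp))
```

In this made-up data the tool doubles the resolution rate while the cost per resolved task stays the same, and all of the gain comes from one repository. That is the kind of result a single headline number would hide.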

Setup & Installation

Quick Install

No one-line install command is listed for this skill. Clone the repository as shown below, or check the GitHub repository for manual installation instructions.

Alternative Install (Git Clone)

git clone https://github.com/greynewell/mcpbr

Requirements

Claude Code or compatible AI agent
Works with: Claude

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Run the MCPBR (Model Context Protocol Benchmark Runner) against the [MCP_SERVER_NAME] server using the [GITHUB_REPO_ISSUE_URL] issue as a benchmark. Focus on the following metrics: response latency, context window utilization, and error rate. Generate a report comparing the results to the baseline performance of [BASELINE_SERVER_NAME]. Include specific recommendations for optimizing the server configuration based on the findings.

Example Output

### MCPBR Benchmark Report: MCP Server Performance Evaluation

**Benchmark Run ID:** 20240515-143022
**MCP Server Evaluated:** `github-repo-analyzer` (v1.2.3)
**Benchmark Issue:** https://github.com/owner/repo/issues/42
**Baseline Server:** `local-github-bridge` (v1.0.1)
**Evaluation Period:** 2024-05-15 14:30:00 - 14:45:00 UTC

#### Performance Metrics Comparison
| Metric                     | github-repo-analyzer | local-github-bridge | Delta  | Target  |
|----------------------------|----------------------|---------------------|--------|---------|
| Avg. Response Latency      | 187ms                | 245ms               | -24%   | <200ms  |
| Context Window Utilization | 89%                  | 67%                 | +33%   | >85%    |
| Error Rate                 | 0.8%                 | 2.1%                | -62%   | <1%     |
| Token Throughput           | 1,240 tokens/sec     | 980 tokens/sec      | +27%   | >1,000  |

#### Key Findings
1. **Response Latency:** The `github-repo-analyzer` server demonstrated a 24% faster response time compared to the baseline, primarily due to optimized caching of frequently accessed repository metadata. The average latency of 187ms meets the target of <200ms, with 95% of requests resolving in under 250ms.

2. **Context Window Utilization:** The server achieved 89% context window utilization, significantly outperforming the baseline (67%). This indicates more efficient handling of large GitHub issues with minimal truncation. The high utilization suggests the server is leveraging the full 128K token context window effectively.

3. **Error Rate:** The error rate of 0.8% is a 62% improvement over the baseline (2.1%). The most common errors were related to rate-limiting during peak GitHub API calls, which were mitigated by implementing exponential backoff in the latest version.

#### Optimization Recommendations
- **Caching Strategy:** Implement a Redis-based cache for repository metadata to further reduce latency. Current caching is in-memory only and resets on server restart.
- **Rate Limiting:** Add a configurable rate limiter to prevent GitHub API throttling during high-traffic periods. The current implementation lacks this safeguard.
- **Context Truncation:** While context utilization is high, consider adding a dynamic truncation policy for extremely large issues (>50K tokens) to prevent memory bloat.
- **Monitoring:** Deploy Prometheus metrics exporter to track these metrics in real-time. The current benchmarking is manual and lacks historical trend analysis.

#### Next Steps
1. Deploy the recommended caching improvements and re-run the benchmark to check whether they deliver the expected 10-15% latency reduction.
2. Monitor the error rate over the next 7 days to ensure the rate-limiting changes resolve the remaining issues.
3. Schedule a quarterly benchmarking review to track performance against evolving GitHub issue structures and MCP server updates.

**Benchmark Artifacts:**
- Raw metrics: `/tmp/mcpbr/github-repo-analyzer-20240515-143022.json`
- Performance graphs: `/tmp/mcpbr/github-repo-analyzer-20240515-143022.png`
- Configuration diff: `/tmp/mcpbr/config-changes.patch`
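
The Delta column in the example report is the relative change of the evaluated server against the baseline, rounded to a whole percent. The sketch below recomputes it from the table values and checks each metric against its target. The helper names and the `rows` structure are assumptions made for illustration, not part of any mcpbr report format.

```python
import operator
from typing import Callable, Dict, Tuple


def percent_delta(candidate: float, baseline: float) -> int:
    """Relative change versus the baseline, rounded to a whole percent."""
    return round((candidate - baseline) / baseline * 100)


# metric -> (evaluated server, baseline server, target comparison, target value)
Row = Tuple[float, float, Callable[[float, float], bool], float]
rows: Dict[str, Row] = {
    "Avg. Response Latency (ms)": (187, 245, operator.lt, 200),
    "Context Window Utilization (%)": (89, 67, operator.gt, 85),
    "Error Rate (%)": (0.8, 2.1, operator.lt, 1),
    "Token Throughput (tokens/sec)": (1240, 980, operator.gt, 1000),
}

deltas = {}
targets_met = {}
for metric, (candidate, baseline, compare, target) in rows.items():
    deltas[metric] = percent_delta(candidate, baseline)
    targets_met[metric] = compare(candidate, target)
    status = "meets target" if targets_met[metric] else "misses target"
    print(f"{metric}: {deltas[metric]:+d}% vs baseline, {status}")
```

Note that a negative delta is an improvement for latency and error rate but a regression for throughput, so the sign alone does not say whether a change is good.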
