Agent PR Replay takes merged PRs from any repository, reverse-engineers the task prompt, runs Claude Code against it, and compares what the agent did with what humans actually shipped. The result is targeted, empirical guidance.
1. **Identify Target PRs:** Select a recently merged PR from your repository that represents a non-trivial task (e.g., a feature implementation, bug fix, or optimization). Prefer PRs with clear descriptions or commit messages.
2. **Run the Analysis:** Use the prompt template below, replacing `[PR_NUMBER]` and `[REPO_URL]` with the actual values. For best results, include the PR title and description in your input so the agent can infer the original prompt accurately.
3. **Review the Comparison:** Compare the AI-generated output with the human implementation, paying attention to differences in approach, efficiency, and edge-case handling. Use tools such as `git diff` to inspect the changes the PR actually made.
4. **Extract Insights:** Document the key differences and their implications, looking for recurring patterns (e.g., 'AI agents often propose simpler solutions that don't account for edge cases' or 'Humans over-engineer solutions when real-time updates are required').
5. **Refine Workflows:** Use the insights to improve your prompt engineering practices. For example, if AI agents consistently miss real-time requirements, add explicit constraints to your prompts. Share findings with your team to standardize prompt templates.
1. Clone the repository: `git clone https://github.com/sshh12/agent-pr-replay.git`
2. Launch Claude Code, Cursor, or your preferred AI coding agent.
3. Use the prompt template or examples below to test the skill.
4. Adapt the skill to your specific use case and workflow.
I need to analyze the merged PR [PR_NUMBER] from [REPO_URL]. Extract the original task prompt that triggered the PR (if it is not stated explicitly, infer it from the PR description and code changes). Then, use Claude Code to simulate what an AI agent would have produced when given that prompt. Compare the AI-generated output with the actual human implementation in the PR. Highlight key differences in approach, quality, and efficiency. Finally, provide actionable recommendations for improving future AI agent prompts or workflows based on these insights.
Example: analyzing PR #142 from the `data-pipelines` repository (https://github.com/acme/data-pipelines/pull/142), which added a feature to optimize the ETL process for customer transaction data. The PR description stated: 'Add a new column to the transactions table that calculates the rolling 7-day average purchase amount per customer, partitioned by customer_id.'
**Inferred Original Prompt:**
'Write a SQL query to add a computed column to the transactions table that calculates the 7-day rolling average of purchase amounts, grouped by customer_id. The column should be named rolling_7day_avg and should be updated automatically as new transactions are inserted.'
**AI Agent Simulation (Claude Code Output):**
```sql
ALTER TABLE transactions
ADD COLUMN rolling_7day_avg DECIMAL(10,2);

-- Create a materialized view for the rolling average
CREATE MATERIALIZED VIEW customer_rolling_avg AS
SELECT
    customer_id,
    AVG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY transaction_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7day_avg
FROM transactions;
```
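As written, the agent's view computes averages but never writes them into the `rolling_7day_avg` column added by the `ALTER TABLE`, and the view keeps no row identifier to join back on. The following is a minimal PostgreSQL sketch of what the refresh-based approach would need, under stated assumptions: a hypothetical sample `transactions` table with an `id` primary key (the human trigger also relies on `id`), a view that keeps `id`, and a scheduled job (not shown) that runs the refresh and the `UPDATE`.

```sql
-- Run in a scratch database: the table and its rows are hypothetical.
-- The column the agent added with ALTER TABLE is folded into CREATE TABLE here.
CREATE TABLE transactions (
    id               serial PRIMARY KEY,
    customer_id      int            NOT NULL,
    transaction_date date           NOT NULL,
    amount           numeric(10, 2) NOT NULL,
    rolling_7day_avg numeric(10, 2)
);

INSERT INTO transactions (customer_id, transaction_date, amount) VALUES
    (1, DATE '2024-01-01', 100.00),
    (1, DATE '2024-01-02',  50.00),
    (2, DATE '2024-01-01',  20.00);

-- Same window as the agent's view, but keeping id so results can be joined back.
CREATE MATERIALIZED VIEW customer_rolling_avg AS
SELECT
    id,
    AVG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY transaction_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7day_avg
FROM transactions;

-- Scheduled step: recompute the view, then copy the results into the column.
REFRESH MATERIALIZED VIEW customer_rolling_avg;

UPDATE transactions AS t
SET rolling_7day_avg = v.rolling_7day_avg
FROM customer_rolling_avg AS v
WHERE v.id = t.id;
```

Rows inserted between scheduled runs keep a NULL `rolling_7day_avg` until the next refresh and update, which is the staleness behind the "periodic refreshes" point in the key differences.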
**Human Implementation (Actual PR):**
```sql
-- Added a trigger-based solution
CREATE OR REPLACE FUNCTION update_rolling_avg()
RETURNS TRIGGER AS $$
BEGIN
    -- Recompute the average for the newly inserted row over the
    -- customer's transactions in the preceding 7-day interval
    UPDATE transactions
    SET rolling_7day_avg = (
        SELECT AVG(amount)
        FROM transactions t2
        WHERE t2.customer_id = NEW.customer_id
          AND t2.transaction_date BETWEEN NEW.transaction_date - INTERVAL '7 days'
                                      AND NEW.transaction_date
    )
    WHERE id = NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Run the function once for each inserted row, after the insert
CREATE TRIGGER trg_update_rolling_avg
AFTER INSERT ON transactions
FOR EACH ROW EXECUTE FUNCTION update_rolling_avg();
```
**Key Differences:**
1. **Approach:** The AI agent proposed a materialized view, which must be refreshed periodically, while the human implemented a trigger that updates the value as each row is inserted.
2. **Performance:** The trigger keeps each row's value current at write time but adds a per-row query to every insert; the materialized view keeps inserts cheap but shifts the cost to each refresh, and its results are stale between refreshes.
3. **Complexity:** The human solution uses a PL/pgSQL trigger function, which is more code to write and maintain than the agent's declarative view, but it satisfies the prompt's requirement that the column update automatically on insert.
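The two implementations also define the window differently. The agent's `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` frame averages a customer's last seven transactions regardless of their dates, while the human trigger filters on a date interval. In PostgreSQL 11 and later, a window function can express a time-based window directly with a `RANGE` frame and an interval offset. The sketch below compares both frames on a few hypothetical rows in a temporary table; it assumes `transaction_date` is a `DATE`, so `INTERVAL '6 days' PRECEDING` plus the current day covers seven calendar days (for a `TIMESTAMP` column, `INTERVAL '7 days'` would give a trailing 168-hour window instead).

```sql
-- Hypothetical sample data in a temporary table with the example's column names.
CREATE TEMP TABLE transactions (
    id               serial PRIMARY KEY,
    customer_id      int            NOT NULL,
    transaction_date date           NOT NULL,
    amount           numeric(10, 2) NOT NULL
);

INSERT INTO transactions (customer_id, transaction_date, amount) VALUES
    (1, DATE '2024-01-01', 100.00),
    (1, DATE '2024-01-05',  50.00),
    (1, DATE '2024-01-20',  30.00);

-- Row-based frame (last 7 transactions) vs. time-based frame (last 7 days).
CREATE TEMP VIEW rolling_compare AS
SELECT
    id,
    customer_id,
    transaction_date,
    AVG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY transaction_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS avg_last_7_rows,
    AVG(amount) OVER (
        PARTITION BY customer_id
        ORDER BY transaction_date
        RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW
    ) AS avg_last_7_days
FROM transactions;

SELECT * FROM rolling_compare ORDER BY customer_id, transaction_date;
```

With this sample data, the January 20 row averages all three purchases under the row-based frame (60) but only itself under the time-based frame (30). Note also that on a `DATE` column, the human trigger's `BETWEEN NEW.transaction_date - INTERVAL '7 days' AND NEW.transaction_date` is inclusive at both ends and therefore spans eight calendar days.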
**Actionable Recommendations:**
1. **Prompt Refinement:** Future prompts should specify whether real-time updates are required (e.g., 'Ensure the rolling average is updated in real-time as new transactions are inserted').
2. **Database Constraints:** The prompt should include database-specific constraints (e.g., 'Use PostgreSQL triggers for real-time updates').
3. **Testing Requirements:** Add a requirement to test performance under high write loads, because a row-level trigger adds work to every insert.
4. **Documentation:** Include a note about the trade-offs between materialized views and triggers in the prompt to guide the agent's approach.
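One way to act on the testing recommendation in PostgreSQL is to bulk-insert synthetic rows under `EXPLAIN ANALYZE`, which executes the statement and reports the time spent in each trigger alongside the insert itself. The sketch below runs the human trigger against a temporary table filled with hypothetical data. The composite index on `(customer_id, transaction_date)` is an assumption, not part of the PR; it is included because the trigger's subquery filters on those two columns for every inserted row.

```sql
-- Run in a scratch database: the table and generated rows are hypothetical.
CREATE TEMP TABLE transactions (
    id               serial PRIMARY KEY,
    customer_id      int            NOT NULL,
    transaction_date date           NOT NULL,
    amount           numeric(10, 2) NOT NULL,
    rolling_7day_avg numeric(10, 2)
);

-- Trigger function and trigger as in the human implementation.
CREATE OR REPLACE FUNCTION update_rolling_avg()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE transactions
    SET rolling_7day_avg = (
        SELECT AVG(amount)
        FROM transactions t2
        WHERE t2.customer_id = NEW.customer_id
          AND t2.transaction_date BETWEEN NEW.transaction_date - INTERVAL '7 days'
                                      AND NEW.transaction_date
    )
    WHERE id = NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_update_rolling_avg
AFTER INSERT ON transactions
FOR EACH ROW EXECUTE FUNCTION update_rolling_avg();

-- Assumed supporting index for the trigger's per-customer date-range lookup.
CREATE INDEX transactions_customer_date_idx
    ON transactions (customer_id, transaction_date);

-- Insert 10,000 synthetic rows; the plan output includes the trigger's
-- total time and number of calls.
EXPLAIN ANALYZE
INSERT INTO transactions (customer_id, transaction_date, amount)
SELECT g % 100,
       DATE '2024-01-01' + (g % 30),
       (g % 50) + 0.99
FROM generate_series(1, 10000) AS g;
```

Repeating the measurement without the index, or with the trigger disabled, gives a rough estimate of the write overhead before committing to the trigger design.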