LLM-Agents-Papers

🥇Gold

A repository listing papers on LLM-based agents, categorized into surveys and enhancement techniques. It gives operations teams research insights for developing and improving agents.

2,2731340Updated 2mo ago

Intermediate · 30 min to implement · automation

Saves ~10 min per use

Quick Install

git clone https://github.com/AGI-Edgerunners/LLM-Agents-Papers.git

Works with:

Claude

Overview

About This Skill

LLM-Agents-Papers is a comprehensive repository that catalogs academic papers focused on language model-based agents. The collection is systematically organized into surveys and enhancement techniques including planning, memory mechanisms, feedback and reflection, RAG, and search methodologies. It also covers interaction patterns such as role-playing, conversation, game-playing, human-agent interaction, tool usage, and simulation. The repository includes domain-specific applications in areas like mathematics and chemistry, enabling AI development teams and researchers to quickly access relevant papers for building and improving LLM agent systems.

How to Use

1. Identify your focus area. Decide whether you need insights on survey papers, enhancement techniques, or benchmarking frameworks. This narrows down the papers to review.

Action: Use the prompt template and replace [REPOSITORY_LINK] with the actual GitHub link (e.g., https://github.com/AGI-Edgerunners/LLM-Agents-Papers). Replace [SPECIFIC_CATEGORY] with your chosen focus (e.g., 'enhancement techniques').

Tip: If you're unsure where to start, begin with the 'survey papers' category to get a high-level overview of the field before diving into specific techniques.

2. Extract actionable insights. For each paper summarized, ask: 'How could this technique improve our agent’s performance in [OUR_SPECIFIC_USE_CASE]?'

Action: Create a table or bullet list in your notes app (e.g., Notion, Obsidian) with columns for Paper Title, Key Contribution, Relevance to Our Work, Implementation Steps, and Potential Challenges.

Tip: Prioritize papers that align with your current agent architecture or pain points. For example, if your agents struggle with tool selection, focus on papers like ToolLLM.

3. Pilot the most promising techniques. Start with a small-scale experiment (e.g., integrating one enhancement technique or running a benchmark) to validate its impact.

Action: Use the insights to draft a 1-page proposal for your team, outlining the technique, expected benefits, and a 2-week pilot plan. Include metrics for success (e.g., '20% improvement in task completion rate').

Tip: For benchmarking, use tools like LangSmith or custom scripts to evaluate agents against the AgentBench framework or similar benchmarks.

4. Iterate based on results. After the pilot, analyze performance gaps and refine your approach. Document lessons learned for future iterations.

Action: Schedule a retrospective meeting with your team to discuss: What worked? What didn’t? What’s the next enhancement to test? Update your agent development roadmap accordingly.

Tip: Share your findings with the broader team or community (e.g., via a blog post or internal wiki) to contribute to the collective knowledge on LLM agents.
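The notes table described in the steps above can also be kept as structured data and rendered as a markdown table for pasting into a notes app. This is a minimal sketch, not a prescribed workflow: the `PaperNote` class, its field names, and `to_markdown_table` are hypothetical names chosen for illustration, and only the five column titles come from the steps.

```python
from dataclasses import dataclass, astuple

# Column titles taken from the "How to Use" steps.
COLUMNS = (
    "Paper Title",
    "Key Contribution",
    "Relevance to Our Work",
    "Implementation Steps",
    "Potential Challenges",
)


@dataclass
class PaperNote:
    """One row of the notes table (hypothetical structure)."""
    title: str
    contribution: str
    relevance: str
    steps: str
    challenges: str


def _cell(text):
    # Escape pipes and flatten newlines so a cell cannot break the table.
    return text.replace("|", "\\|").replace("\n", " ")


def to_markdown_table(notes):
    """Render a list of PaperNote rows as a markdown table string."""
    header = "| " + " | ".join(COLUMNS) + " |"
    divider = "| " + " | ".join("---" for _ in COLUMNS) + " |"
    rows = [
        "| " + " | ".join(_cell(value) for value in astuple(note)) + " |"
        for note in notes
    ]
    return "\n".join([header, divider, *rows])


if __name__ == "__main__":
    example = PaperNote(
        title="(paper title)",
        contribution="(one-sentence summary)",
        relevance="(why it matters for your agents)",
        steps="(what you would change)",
        challenges="(known limitations)",
    )
    print(to_markdown_table([example]))
```

Keeping the notes as data rather than free text makes it easier to sort or filter them later, for example by relevance, when choosing which technique to pilot.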

Use Cases

Researching the latest advancements in LLM-based agent technology for academic purposes.

Identifying effective techniques for enhancing LLM agents in automation workflows.

Exploring applications of LLM agents in specific fields like medicine, finance, and software engineering.

Staying informed about safety and ethical considerations in the deployment of LLM agents.

Setup & Installation

Quick Install

No dedicated install command is available. Clone the repository with the Git command below, and check the GitHub repository for any further manual setup instructions.

Alternative Install (Git Clone)

git clone https://github.com/AGI-Edgerunners/LLM-Agents-Papers
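Once the repository is cloned, its paper lists can be scanned programmatically. The following is a minimal sketch, assuming the lists live in a markdown file with `#` headings for categories and `[title](url)` links for papers; that layout is an assumption and is not confirmed here. The function name `papers_by_section` and the sample text are hypothetical.

```python
import re
from collections import defaultdict

# Markdown heading such as "## Planning" (1 to 6 leading '#').
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
# Inline markdown link such as "[Title](https://...)".
LINK = re.compile(r"\[([^\]]+)\]\((https?://[^)\s]+)\)")


def papers_by_section(markdown_text):
    """Group (title, url) links under the nearest preceding heading."""
    sections = defaultdict(list)
    current = "(no heading)"
    for line in markdown_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(2)
            continue
        for title, url in LINK.findall(line):
            sections[current].append((title, url))
    return dict(sections)


# Hypothetical sample in the assumed layout; not real repository content.
SAMPLE = """\
## Survey
- [A Hypothetical Survey](https://example.org/survey)
## Planning
- [A Hypothetical Planning Paper](https://example.org/plan)
- a line with no link
"""

if __name__ == "__main__":
    for section, papers in papers_by_section(SAMPLE).items():
        print(f"{section}: {len(papers)} paper(s)")
```

To try it on the clone, read the relevant markdown file into a string (for example with `pathlib.Path(...).read_text(encoding="utf-8")`) and pass it to `papers_by_section`. The exact file name depends on how the repository is organized.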

Requirements

Claude Code or compatible AI agent
Works with: Claude

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Summarize the key contributions of the most recent papers on LLM-based agents from the repository at [REPOSITORY_LINK]. Focus on [SPECIFIC_CATEGORY: e.g., 'survey papers', 'enhancement techniques', 'benchmarking frameworks']. Extract 3-5 actionable insights that could improve our agent development pipeline, and highlight any limitations or open challenges mentioned in the papers.
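If the template is reused often, the two placeholders can be filled in code rather than by hand. This is a small sketch, assuming the template text is stored as a string exactly as written above. The helper name `fill_prompt` is hypothetical, and the resulting prompt is simply text to paste into whichever agent you use.

```python
import re

# The prompt template from this page, stored verbatim.
PROMPT_TEMPLATE = (
    "Summarize the key contributions of the most recent papers on LLM-based agents "
    "from the repository at [REPOSITORY_LINK]. Focus on [SPECIFIC_CATEGORY: e.g., "
    "'survey papers', 'enhancement techniques', 'benchmarking frameworks']. "
    "Extract 3-5 actionable insights that could improve our agent development "
    "pipeline, and highlight any limitations or open challenges mentioned in the papers."
)


def fill_prompt(repository_link, category, template=PROMPT_TEMPLATE):
    """Replace both placeholders in the template and return the final prompt."""
    if not repository_link.strip() or not category.strip():
        raise ValueError("repository_link and category must be non-empty")
    prompt = template.replace("[REPOSITORY_LINK]", repository_link)
    # The category placeholder carries inline examples, so match up to the closing bracket.
    # A function is used as the replacement so backslashes in `category` stay literal.
    prompt = re.sub(r"\[SPECIFIC_CATEGORY[^\]]*\]", lambda _match: category, prompt)
    return prompt


if __name__ == "__main__":
    print(fill_prompt(
        "https://github.com/AGI-Edgerunners/LLM-Agents-Papers",
        "enhancement techniques",
    ))
```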

Example Output

Here are the key takeaways from the latest papers on LLM-based agents, focusing on enhancement techniques and benchmarking frameworks from the repository (https://github.com/AGI-Edgerunners/LLM-Agents-Papers).

**1. Tool-Augmentation for Real-World Tasks**
The paper *"ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs"* (arXiv:2307.16789) introduces a framework where LLMs dynamically select and chain API tools to solve complex tasks. The authors demonstrate a 23% improvement in task completion rates over baseline models by fine-tuning LLMs on a dataset of 16,000 APIs. A critical limitation noted is the reliance on pre-defined API schemas, which may not generalize to APIs with undocumented or evolving interfaces.

**2. Memory-Augmented Agents for Long-Horizon Tasks**
In *"MemoryBank: Enhancing Large Language Models with Long-Term Memory"* (arXiv:2305.10250), the authors propose a memory system that stores and retrieves task-specific context over extended interactions. Their experiments show a 31% reduction in hallucination rates for tasks requiring multi-step reasoning. However, the system’s performance degrades when memory retrieval latency exceeds 500ms, suggesting a need for optimization in real-time applications.

**3. Benchmarking Frameworks for Agent Evaluation**
The paper *"AgentBench: Evaluating LLMs as Agents"* (arXiv:2308.03688) introduces a benchmark suite for assessing agent capabilities across 8 environments (e.g., web browsing, database queries). The authors highlight that current models struggle with tasks requiring multi-turn coordination, with the best-performing model achieving only 62% success in the web-shopping environment. The paper emphasizes the need for standardized benchmarks to guide agent development.

**Actionable Insights for Our Pipeline:**
- **Adopt Tool-Augmentation:** Integrate a tool-selection module into our agents to dynamically call APIs for task-specific operations. Start with a curated set of 50-100 APIs relevant to our use cases (e.g., CRM, analytics tools).
- **Implement Memory Systems:** Pilot the MemoryBank approach for tasks requiring >5 steps of reasoning, but monitor retrieval latency to ensure real-time performance.
- **Benchmark Against AgentBench:** Use the AgentBench framework to evaluate our agents’ performance in web browsing and database query tasks, identifying specific failure modes to address.
- **Address Schema Limitations:** For APIs with undocumented interfaces, explore hybrid approaches (e.g., combining schema inference with human-in-the-loop validation) to improve robustness.

**Open Challenges:**
- Generalizing tool selection to APIs with evolving schemas remains an unsolved problem.
- Memory systems introduce latency trade-offs that may not suit all applications.
- Benchmarking frameworks like AgentBench are still maturing and may not cover all real-world scenarios.
