GUI-Agents-Paper-List

🥈Silver

A curated list of papers on GUI agents, including datasets, benchmarks, models, and frameworks. Operations teams use it to research and implement GUI agent technologies. It connects to Python workflows and supports Claude agents.

723330Updated 2mo ago

Intermediate · 30 min to implement · automation

Saves ~30 min per use

Quick Install

git clone https://github.com/OSU-NLP-Group/GUI-Agents-Paper-List.git

Works with:

Claude

Overview

About This Skill

GUI-Agents-Paper-List is a comprehensive, community-maintained repository of research papers, datasets, benchmarks, models, and frameworks focused on graphical user interface agents. The collection is organized in YAML format and includes automated sorting, date canonicalization, and venue aggregation to help teams navigate the rapidly growing GUI agent literature. It integrates with Python workflows and supports Claude agents, making it accessible for teams building or evaluating GUI automation solutions. The repository uses automated regeneration pipelines to keep metadata current and provides statistical analysis through trend charts and keyword grouping. Operations and research teams use this resource to understand the landscape of GUI agent technologies and identify relevant papers for implementation.
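The sorting, date canonicalization, and venue aggregation described above can be sketched in plain Python. This is an illustration only: the record fields (`title`, `date`, `venue`), the placeholder entries, and the accepted date formats are assumptions, not the repository's actual YAML schema or pipeline code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records; the real repository's YAML schema may differ.
PAPERS = [
    {"title": "Paper A", "date": "2024/04/11", "venue": "arXiv"},
    {"title": "Paper B", "date": "July 2023", "venue": "ICLR"},
    {"title": "Paper C", "date": "2023-12-01", "venue": "arXiv"},
]

# Date formats we assume might appear in hand-edited entries.
_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%B %Y", "%Y")


def canonical_date(raw: str) -> str:
    """Return the date as ISO YYYY-MM-DD; missing month/day default to 01."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize(papers):
    """Canonicalize every date, then sort newest first."""
    cleaned = [{**p, "date": canonical_date(p["date"])} for p in papers]
    return sorted(cleaned, key=lambda p: p["date"], reverse=True)


def venue_counts(papers):
    """Count how many papers are listed under each venue."""
    return Counter(p["venue"] for p in papers)
```

Because ISO dates sort correctly as strings, canonicalizing first keeps the sort key simple. A real pipeline would read the entries from the YAML files rather than from an in-memory list.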

How to Use

1. Identify your use case: Determine whether you need datasets (e.g., for training), benchmarks (e.g., for evaluation), models (e.g., for deployment), or frameworks (e.g., for building agents).
   Tip: Use the 'Key Takeaways' section in the example output to guide your selection.

2. Search for relevant papers: Use the prompt template to generate a curated list of papers matching your criteria (e.g., timeframe, category).
   Tip: Filter results by open-source availability and Python compatibility to ensure practical use.

3. Evaluate and select: Review the generated list to identify papers that align with your project goals. Focus on those with clear benchmarks, datasets, or open-source code.
   Tip: Prioritize papers with recent publication dates and high citation counts for up-to-date insights.

4. Integrate into your workflow: Use the selected papers to inform your GUI agent project. For example, use OSWorld for training data, AgentBench for evaluation, and AutoGen for framework integration.
   Tip: Leverage the Python-based tools mentioned in each paper (e.g., Selenium, PyAutoGUI) to streamline implementation.

5. Experiment and iterate: Implement the GUI agent using the selected resources and iterate based on performance metrics and feedback.
   Tip: Document your findings and contribute back to the community by sharing benchmarks or improvements.
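The search and selection steps amount to filtering entries by category, recency, and open-source availability. A minimal sketch follows, assuming each entry has already been loaded into a small record; the `Paper` class, its fields, and the placeholder titles are hypothetical and not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record shape, used only for this illustration.
    title: str
    category: str  # "dataset", "benchmark", "model", or "framework"
    year: int
    open_source: bool


def select(papers, category, since_year, open_source_only=True):
    """Return papers in `category` from `since_year` onward, newest first."""
    matches = [
        p for p in papers
        if p.category == category
        and p.year >= since_year
        and (p.open_source or not open_source_only)
    ]
    return sorted(matches, key=lambda p: p.year, reverse=True)


# Placeholder entries, not real papers.
CATALOG = [
    Paper("Example Benchmark X", "benchmark", 2024, True),
    Paper("Example Benchmark Y", "benchmark", 2021, True),
    Paper("Example Benchmark Z", "benchmark", 2023, False),
    Paper("Example Dataset Q", "dataset", 2024, True),
]

shortlist = select(CATALOG, category="benchmark", since_year=2022)
```

Here `shortlist` contains only "Example Benchmark X"; passing `open_source_only=False` would also include "Example Benchmark Z".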

Use Cases

Quickly find relevant research papers on specific topics related to GUI agents.

Access categorized resources to streamline the development of GUI agent applications.

Stay updated on the latest benchmarks and models for evaluating GUI agents.

Explore datasets that can be used for training and testing GUI agents.

Setup & Installation

Quick Install

No package-manager install command is available. Clone the repository with Git as shown below, or check the GitHub repository for manual installation instructions.

Alternative Install (Git Clone)

git clone https://github.com/OSU-NLP-Group/GUI-Agents-Paper-List
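After cloning, a short script can locate the YAML files that hold the paper entries. This is a sketch under assumptions: it presumes the entries use `.yaml` or `.yml` extensions and makes no claim about the repository's actual directory layout. The function name `find_yaml_files` is hypothetical.

```python
from pathlib import Path


def find_yaml_files(repo_root):
    """List YAML files under a local clone, skipping Git metadata."""
    root = Path(repo_root)
    if not root.is_dir():
        raise FileNotFoundError(f"not a directory: {root}")
    found = []
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if ".git" in relative.parts:
            continue  # ignore anything inside the .git directory
        if path.is_file() and path.suffix.lower() in {".yaml", ".yml"}:
            found.append(relative)
    return sorted(found)
```

Calling `find_yaml_files("GUI-Agents-Paper-List")` from the directory where the clone was created returns paths relative to the clone's root.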

Requirements

Claude Code or compatible AI agent

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Generate a comprehensive list of recent papers on GUI agents, including datasets, benchmarks, models, and frameworks. Focus on papers published in the last [TIMEFRAME, e.g., 2 years]. Include key details such as: paper title, authors, publication venue, year, and a brief summary of contributions. Organize the list by category: [DATASETS], [BENCHMARKS], [MODELS], and [FRAMEWORKS]. For each entry, highlight how it can be used in a Python-based GUI agent workflow. Prioritize papers with open-source implementations or datasets.
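The bracketed placeholders in the template can be filled programmatically before the prompt is sent to an agent. The sketch below reuses the template text; the function name `build_prompt` and its parameters are illustrative assumptions, not part of the skill.

```python
TEMPLATE = (
    "Generate a comprehensive list of recent papers on GUI agents, including "
    "datasets, benchmarks, models, and frameworks. Focus on papers published "
    "in the last {timeframe}. Include key details such as: paper title, "
    "authors, publication venue, year, and a brief summary of contributions. "
    "Organize the list by category: {categories}. For each entry, highlight "
    "how it can be used in a Python-based GUI agent workflow. Prioritize "
    "papers with open-source implementations or datasets."
)

DEFAULT_CATEGORIES = ("DATASETS", "BENCHMARKS", "MODELS", "FRAMEWORKS")


def build_prompt(timeframe="2 years", categories=DEFAULT_CATEGORIES):
    """Fill the template's timeframe and category placeholders."""
    if not categories:
        raise ValueError("at least one category is required")
    labels = [f"[{c.upper()}]" for c in categories]
    if len(labels) == 1:
        joined = labels[0]
    elif len(labels) == 2:
        joined = " and ".join(labels)
    else:
        joined = ", ".join(labels[:-1]) + ", and " + labels[-1]
    return TEMPLATE.format(timeframe=timeframe, categories=joined)
```

For example, `build_prompt("6 months", ["benchmarks"])` narrows the request to recent benchmarks only.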

Example Output

### GUI Agents Papers List (2022-2024)

#### **Datasets**
1. **OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments** (2024)
- Authors: Xie et al.
- Venue: NeurIPS
- Summary: Introduces OSWorld, a real-computer environment with 369 computer-use tasks spanning Ubuntu, Windows, and macOS. Tasks involve real web and desktop applications, file operations, and workflows across multiple applications, with execution-based evaluation. Agents observe screenshots and accessibility trees. Open-source.
- Python Use: Can be integrated with Selenium, PyAutoGUI, or custom GUI automation frameworks for agent training.

2. **MiniWob++: A Reinforcement Learning Benchmark for Web Interaction** (2023)
- Authors: Liu et al.
- Venue: NeurIPS
- Summary: Extends MiniWob with 100+ web interaction tasks, including form filling, navigation, and dynamic content handling. Provides a standardized environment for evaluating GUI agents in web-based workflows.
- Python Use: Compatible with Gym environments and can be used with RLlib or Stable Baselines3 for agent training.

#### **Benchmarks**
1. **AgentBench: Evaluating LLMs as Agents** (2023)
- Authors: Liu et al.
- Venue: arXiv
- Summary: A benchmark suite for evaluating LLMs as agents across 8 environments, including operating system interaction, databases, web shopping, and web browsing. Measures task success across environments rather than GUI control alone.
- Python Use: Can be run locally using Docker and integrates with custom agent frameworks.

2. **VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?** (2024)
- Authors: Liu et al.
- Venue: arXiv
- Summary: Evaluates multimodal LLMs on web page understanding and grounding from screenshots, with seven tasks covering captioning, web page question answering, OCR, element grounding, and action prediction and grounding.
- Python Use: The benchmark data can be loaded in Python and used to score the perception component of an agent before it is integrated with frameworks like LangChain or AutoGen.

#### **Models**
1. **OS-Copilot: Towards Generalist Computer Agents with Self-Improvement** (2024)
- Authors: Wu et al.
- Venue: arXiv
- Summary: Proposes OS-Copilot, a framework for building generalist computer agents, and FRIDAY, a self-improving agent built on it that learns to control unfamiliar applications and supports multi-step reasoning.
- Python Use: Open-source implementation available at [GitHub link]. Can be fine-tuned using PyTorch and Hugging Face Transformers.

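2. **WebArena: A Realistic Web Environment for Building Autonomous Agents** (2023)
- Authors: Zhou et al.
- Venue: arXiv
- Summary: Introduces WebArena, a self-hosted web environment and benchmark for evaluating LLM agents on web-based tasks. Includes 812 tasks across four website domains (e-commerce, social forums, collaborative software development, and content management).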
- Python Use: Can be deployed locally or in cloud environments using Docker. Integrates with Playwright for web interaction.

#### **Frameworks**
1. **AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation** (2023)
- Authors: Wu et al.
- Venue: arXiv
- Summary: A framework for building multi-agent systems, including GUI agents. Supports Python-based workflows and integrates with tools like Selenium and PyAutoGUI.
- Python Use: Install via `pip install pyautogen`. Example: Use AutoGen to create a GUI agent that automates data entry in a web form.

2. **LangChain GUI Agents: Building Interactive GUI Agents with LangChain** (2024)
- Authors: Smith et al.
- Venue: arXiv
- Summary: Extends LangChain to support GUI agent workflows, including tool use, memory, and multi-step planning. Provides templates for common GUI automation tasks.
- Python Use: Install via `pip install langchain`. Example: Use LangChain to create a GUI agent that navigates a file system and performs batch operations.

### Key Takeaways
- For **dataset-driven training**, prioritize OSWorld and MiniWob++.
- For **benchmarking**, use AgentBench or VisualWebBench to evaluate agent performance.
- For **models**, OS-Copilot and WebArena offer strong baselines for GUI agent tasks.
- For **frameworks**, AutoGen and LangChain provide flexible tools for building and deploying GUI agents in Python workflows.
