Image to Text

This skill uses Tesseract.js to extract all readable text from images, returning the full text content along with word-level bounding boxes and confidence scores. It is designed for users who need to turn text in images into editable text.

7800 · Updated 2mo ago

Image Processing

$ npx skills add https://github.com/pascalorg/skills --skill image-to-text

Overview

About This Skill

Image to Text is an OCR skill powered by Tesseract.js that extracts all readable text from images and returns structured data including full text content, word-level bounding boxes, and confidence scores. It works by segmenting images into lines and words, making it ideal for reading screenshots, extracting UI copy from design mockups, and obtaining precise text positioning information. The skill supports multiple languages and provides confidence metrics to help assess extraction accuracy. Common use cases include converting design mockups to editable text without manual retyping, extracting UI labels and button text, and analyzing text layout and positioning in images.
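To make the line-and-word structure concrete, here is a minimal TypeScript sketch of a result organised as lines of words, each word carrying its text, a confidence score and a bounding box. The interfaces and the fullText helper are hypothetical, written for illustration rather than taken from the skill; the x0/y0/x1/y1 box fields are loosely modelled on Tesseract.js word output, and the sample values come from the invoice example output on this page.

```typescript
// Hypothetical result shape for illustration only; field names are assumptions.
interface BBox {
  x0: number; // left edge
  y0: number; // top edge
  x1: number; // right edge
  y1: number; // bottom edge
}

interface OcrWord {
  text: string;
  confidence: number; // assumed 0-100 scale
  bbox: BBox;
}

interface OcrLine {
  words: OcrWord[];
}

// Rebuild the full text: words joined by spaces, lines joined by newlines.
function fullText(lines: OcrLine[]): string {
  return lines.map((line) => line.words.map((w) => w.text).join(" ")).join("\n");
}

// Sample data taken from the invoice example output.
const lines: OcrLine[] = [
  {
    words: [
      { text: "Invoice", confidence: 92, bbox: { x0: 50, y0: 30, x1: 120, y1: 50 } },
      { text: "#:", confidence: 88, bbox: { x0: 125, y0: 30, x1: 140, y1: 50 } },
      { text: "INV-2023-0542", confidence: 76, bbox: { x0: 145, y0: 30, x1: 280, y1: 50 } },
    ],
  },
  {
    words: [
      { text: "Date:", confidence: 95, bbox: { x0: 50, y0: 70, x1: 90, y1: 90 } },
      { text: "2023-10-15", confidence: 99, bbox: { x0: 100, y0: 70, x1: 200, y1: 90 } },
    ],
  },
];

console.log(fullText(lines));
// Invoice #: INV-2023-0542
// Date: 2023-10-15
```

Keeping words grouped by line means the full text can be rebuilt without re-running OCR, while each word keeps its own position and score.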

How to Use

Install the skill from the command line using the Quick Install command, then prompt your AI agent to read text from an image.

Use Cases

Reading text content from screenshots or design mockups

Extracting UI copy including labels and buttons

Getting text positions and bounding boxes from design images
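For the position-oriented use cases, a common follow-up is to find where a multi-word label sits in the image. The sketch below is a hypothetical helper, not part of the skill: it assumes a flat, reading-order list of words with x0/y0/x1/y1 boxes (top-left and bottom-right corners, in pixels) and returns the smallest box that encloses a matching phrase. The sample words are taken from the invoice example output.

```typescript
// Hypothetical word shape; field names are assumptions for illustration.
interface BBox { x0: number; y0: number; x1: number; y1: number; }
interface OcrWord { text: string; confidence: number; bbox: BBox; }

// Smallest box that encloses every box in a non-empty list.
function unionBox(boxes: BBox[]): BBox {
  return {
    x0: Math.min(...boxes.map((b) => b.x0)),
    y0: Math.min(...boxes.map((b) => b.y0)),
    x1: Math.max(...boxes.map((b) => b.x1)),
    y1: Math.max(...boxes.map((b) => b.y1)),
  };
}

// Find the first run of consecutive words whose text exactly matches the
// phrase (split on whitespace) and return its enclosing box, or null.
// Note: in a flat list a run may cross a line break.
function locatePhrase(words: OcrWord[], phrase: string): BBox | null {
  const target = phrase.trim().split(/\s+/);
  for (let i = 0; i + target.length <= words.length; i++) {
    const run = words.slice(i, i + target.length);
    if (run.every((w, j) => w.text === target[j])) {
      return unionBox(run.map((w) => w.bbox));
    }
  }
  return null;
}

const words: OcrWord[] = [
  { text: "Invoice", confidence: 92, bbox: { x0: 50, y0: 30, x1: 120, y1: 50 } },
  { text: "#:", confidence: 88, bbox: { x0: 125, y0: 30, x1: 140, y1: 50 } },
  { text: "INV-2023-0542", confidence: 76, bbox: { x0: 145, y0: 30, x1: 280, y1: 50 } },
  { text: "Date:", confidence: 95, bbox: { x0: 50, y0: 70, x1: 90, y1: 90 } },
  { text: "2023-10-15", confidence: 99, bbox: { x0: 100, y0: 70, x1: 200, y1: 90 } },
];

console.log(locatePhrase(words, "Invoice #:"));
// { x0: 50, y0: 30, x1: 140, y1: 50 }
```

Exact string matching keeps the sketch short; real OCR text may need case or punctuation normalisation before matching.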

Setup & Installation

Quick Install

$ npx skills add https://github.com/pascalorg/skills --skill image-to-text

Alternative Install (Git Clone)

git clone https://github.com/pascalorg/skills

Requirements

Claude Code or a compatible AI agent

Node.js and npm (the Quick Install command runs through npx)

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template and example output below to test the skill.

Customize

Adapt the prompt to your use case, for example by changing the confidence threshold or the requested output format.

Usage Examples

Prompt Template

Extract all readable text from the image at [IMAGE_URL_OR_PATH], including word-level bounding boxes and confidence scores for each word. Return the full text content in a structured format, and highlight any text with low confidence scores (below 80%) for review.
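The review step this template asks for amounts to a threshold filter. A minimal sketch, assuming each word carries a confidence value on a 0 to 100 scale; the OcrWord shape and the lowConfidence function are hypothetical names used only for illustration, and the sample words come from the example output.

```typescript
// Hypothetical word shape; confidence is assumed to be on a 0-100 scale.
interface OcrWord { text: string; confidence: number; }

// Return words whose confidence is strictly below the threshold
// (default 80, matching the prompt template) for manual review.
function lowConfidence(words: OcrWord[], threshold = 80): OcrWord[] {
  return words.filter((w) => w.confidence < threshold);
}

const words: OcrWord[] = [
  { text: "Invoice", confidence: 92 },
  { text: "#:", confidence: 88 },
  { text: "INV-2023-0542", confidence: 76 },
  { text: "Date:", confidence: 95 },
  { text: "2023-10-15", confidence: 99 },
];

for (const w of lowConfidence(words)) {
  console.log(`Review "${w.text}" (${w.confidence}%)`);
}
// Review "INV-2023-0542" (76%)
```

Making the threshold a parameter lets you tighten or relax the review cut-off without touching the rest of the workflow.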

Example Output

```
Extracted Text from Invoice_2023.png
-------------------------------------

**Full Text:**

Invoice #: INV-2023-0542
Date: 2023-10-15
Customer: GreenLeaf Organic Farms

| Item               | Quantity | Unit Price | Total    |
|--------------------|----------|------------|----------|
| Organic Apples     | 10       | $2.50      | $25.00   |
| Free-Range Eggs    | 2        | $4.00      | $8.00    |
| Whole Wheat Bread  | 3        | $3.25      | $9.75    |

Subtotal: $42.75
Tax (8%): $3.42
Total Due: $46.17

**Word-Level Bounding Boxes & Confidence:**
- "Invoice" (92%) [x1: 50, y1: 30, x2: 120, y2: 50]
- "#:" (88%) [x1: 125, y1: 30, x2: 140, y2: 50]
- "INV-2023-0542" (76%) [x1: 145, y1: 30, x2: 280, y2: 50] ← **Low confidence**
- "Date:" (95%) [x1: 50, y1: 70, x2: 90, y2: 90]
- "2023-10-15" (99%) [x1: 100, y1: 70, x2: 200, y2: 90]

**Notes:**
- The invoice number "INV-2023-0542" has a low confidence score. Verify the text in the image.
```
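Because OCR can misread individual digits, the figures in extracted documents such as this invoice are worth cross-checking. A sketch, assuming the amounts have already been parsed from the text into integer cents (parsing is left out); the LineItem shape and the checkInvoice function are hypothetical, and the figures are those in the example output. Working in cents and basis points avoids floating-point rounding.

```typescript
// Hypothetical line-item shape; all amounts are integer cents.
interface LineItem {
  name: string;
  quantity: number;
  unitCents: number;
  totalCents: number;
}

// Returns a list of inconsistencies; an empty list means everything adds up.
// taxBps is the tax rate in basis points (800 = 8%); tax rounds to the nearest
// cent, with halves rounded up for positive amounts.
function checkInvoice(
  items: LineItem[],
  subtotalCents: number,
  taxBps: number,
  taxCents: number,
  totalCents: number,
): string[] {
  const problems: string[] = [];
  for (const it of items) {
    if (it.quantity * it.unitCents !== it.totalCents) problems.push(`line total for ${it.name}`);
  }
  const sum = items.reduce((acc, it) => acc + it.totalCents, 0);
  if (sum !== subtotalCents) problems.push("subtotal");
  const expectedTax = Math.round((subtotalCents * taxBps) / 10000);
  if (expectedTax !== taxCents) problems.push("tax");
  if (subtotalCents + taxCents !== totalCents) problems.push("total due");
  return problems;
}

const items: LineItem[] = [
  { name: "Organic Apples", quantity: 10, unitCents: 250, totalCents: 2500 },
  { name: "Free-Range Eggs", quantity: 2, unitCents: 400, totalCents: 800 },
  { name: "Whole Wheat Bread", quantity: 3, unitCents: 325, totalCents: 975 },
];

console.log(checkInvoice(items, 4275, 800, 342, 4617)); // []
```

An empty list means the line totals, subtotal, 8% tax and total due are mutually consistent; a misread digit in any of them would usually show up as a named problem.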
