Speech-to-Text

This skill converts speech to text in 23 Indian languages and offers several output modes, including transcription and translation. It is designed for developers who need speech recognition in their applications.

Overview

About This Skill

This skill transcribes audio to text using Sarvam AI's Saaras v3 model, which supports 23 Indian languages with automatic language detection. It offers five output modes: transcribe, translate, verbatim, transliteration, and code-mixed text. Developers can choose among three interfaces: the REST API (audio up to 30 seconds), WebSocket streaming (up to 8 hours), and the Batch API, which adds speaker diarization for longer audio files. The skill suits voice-enabled applications, meeting transcription systems, and voice interfaces that need accurate speech recognition across Indian languages.
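The limits quoted above suggest a simple routing rule. The sketch below is only an illustration: `choose_interface` and its parameters are hypothetical names, not part of the Sarvam AI client, and the rule that diarization always implies the Batch API is an assumption drawn from the description. Check the limits against the current Sarvam AI documentation before relying on them.

```python
# Limits as quoted in the skill description; treat them as assumptions.
REST_MAX_SECONDS = 30                 # REST API: up to 30 seconds
STREAMING_MAX_SECONDS = 8 * 60 * 60   # WebSocket streaming: up to 8 hours


def choose_interface(duration_seconds: float,
                     live: bool = False,
                     diarization: bool = False) -> str:
    """Return "rest", "websocket", or "batch" for an audio job.

    Hypothetical helper: it encodes the limits from the description,
    not any behaviour of the Sarvam AI client itself.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: speaker diarization is only offered by the Batch API.
    if diarization:
        return "batch"
    # Live audio goes over the streaming connection, within its limit.
    if live:
        if duration_seconds > STREAMING_MAX_SECONDS:
            raise ValueError("stream exceeds the 8-hour limit")
        return "websocket"
    # Short recorded clips fit the synchronous REST call.
    if duration_seconds <= REST_MAX_SECONDS:
        return "rest"
    # Anything longer is treated as a batch job.
    return "batch"


if __name__ == "__main__":
    print(choose_interface(12))                    # short clip
    print(choose_interface(45 * 60))               # recorded meeting
    print(choose_interface(20 * 60, live=True))    # live call
```

Keeping the limits in named constants means a change in the service's quotas only touches two lines.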

How to Use

Install via command line and integrate using the Sarvam AI Python client.

Use Cases

Transcribing meetings or lectures into text

Translating spoken content for multilingual audiences

Creating subtitles or captions for videos

Building voice-enabled applications for accessibility

Setup & Installation

Quick Install

$ npx skills add https://github.com/sarvamai/skills --skill speech-to-text

Alternative Install (Git Clone)

git clone https://github.com/sarvamai/skills

Requirements

Claude Code or compatible AI agent

Quick Start Guide

Install the Skill

Copy the install command above and run it in your terminal.

Open Your AI Agent

Launch Claude Code, Cursor, or your preferred AI coding agent.

Try It Out

Use the prompt template or examples below to test the skill.

Customize

Adapt the skill to your specific use case and workflow.

Usage Examples

Prompt Template

Convert the following [LANGUAGE] audio into text. Use the [OUTPUT_MODE] mode to [TRANSCRIBE/TRANSLATE] the content. Ensure accuracy for technical terms in [INDUSTRY]. Audio file: [UPLOAD_AUDIO_FILE_LINK].
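If you generate prompts programmatically, the placeholders in the template can be filled with plain string replacement. This is a minimal sketch: `fill_prompt` and its argument names are hypothetical, and the sample values (language, industry, URL) are made up for illustration.

```python
# The prompt template from this page, with its bracketed placeholders.
PROMPT_TEMPLATE = (
    "Convert the following [LANGUAGE] audio into text. "
    "Use the [OUTPUT_MODE] mode to [TRANSCRIBE/TRANSLATE] the content. "
    "Ensure accuracy for technical terms in [INDUSTRY]. "
    "Audio file: [UPLOAD_AUDIO_FILE_LINK]."
)


def fill_prompt(language: str, output_mode: str, action: str,
                industry: str, audio_link: str) -> str:
    """Substitute each bracketed placeholder with the given value."""
    values = {
        "[LANGUAGE]": language,
        "[OUTPUT_MODE]": output_mode,
        "[TRANSCRIBE/TRANSLATE]": action,
        "[INDUSTRY]": industry,
        "[UPLOAD_AUDIO_FILE_LINK]": audio_link,
    }
    prompt = PROMPT_TEMPLATE
    for placeholder, value in values.items():
        prompt = prompt.replace(placeholder, value)
    return prompt


if __name__ == "__main__":
    # Illustrative values only; the URL is a placeholder domain.
    print(fill_prompt("Hindi", "translate", "translate",
                      "healthcare", "https://example.com/clinic-call.wav"))
```

Paste the resulting string into your agent as the prompt.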

Example Output

### Transcription & Translation Results

**Original Language:** Spanish (es-ES)

**Transcription:**
> "Hola equipo, hoy revisaremos los datos de ventas del Q2 en [COMPANY]. Necesitamos identificar las tendencias clave en el mercado europeo, especialmente en Alemania e Italia. También compararemos estos resultados con los del Q1 para evaluar nuestro progreso."

**Translated to English:**
> "Hi team, today we’ll review Q2 sales data for [COMPANY]. We need to identify key trends in the European market, particularly in Germany and Italy. We’ll also compare these results with Q1 to assess our progress."

**Confidence Score:** 98%
**Detected Industry:** Retail/E-commerce
**Speaker Count:** 1 (male, 30s)

*Notes:*
- Audio was clear with minimal background noise.
- Technical terms (e.g., "Q2", "mercado europeo") were accurately transcribed.
- Translation prioritized clarity over literal word-for-word accuracy.
