This skill converts speech to text across 23 Indian languages, with five output modes: transcribe, translate, verbatim, transliteration, and code-mixed text. It is aimed at developers who need speech recognition in their applications.
$ npx skills add https://github.com/sarvamai/skills --skill speech-to-text
This skill transcribes audio to text using Sarvam AI's Saaras v3 model, supporting 23 Indian languages with automatic language detection. It offers five output modes: transcribe, translate, verbatim, transliteration, and code-mixed text. Developers can choose between the REST API (up to 30 seconds), WebSocket streaming (up to 8 hours), or the Batch API with speaker diarization for longer audio files. The skill is well suited to building voice-enabled applications, meeting transcription systems, and voice interfaces that require accurate speech recognition across Indian languages.
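The three access paths differ mainly in how much audio they accept, so the choice can be sketched as a small helper. This is a hypothetical heuristic, not part of the skill or the Sarvam AI SDK: the function name `choose_api`, the `live` flag, and the rule of routing diarization requests to batch are assumptions; only the 30-second and 8-hour limits come from the description above.

```python
# Limits taken from the skill description above.
REST_MAX_SECONDS = 30                # REST API: up to 30 seconds
STREAMING_MAX_SECONDS = 8 * 60 * 60  # WebSocket streaming: up to 8 hours


def choose_api(duration_seconds: float, live: bool = False,
               needs_diarization: bool = False) -> str:
    """Pick an access path for an audio source (illustrative heuristic only)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: speaker diarization is only available through the Batch API.
    if needs_diarization:
        return "batch"
    # Live audio within the streaming limit goes over WebSocket.
    if live and duration_seconds <= STREAMING_MAX_SECONDS:
        return "websocket"
    # Short pre-recorded clips fit in a single REST request.
    if duration_seconds <= REST_MAX_SECONDS:
        return "rest"
    # Everything else is treated as a longer file for the Batch API.
    return "batch"
```

For example, `choose_api(600, live=True)` selects streaming for a ten-minute live call, while the same duration as a recorded file falls through to batch.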
Install via command line and integrate using the Sarvam AI Python client.
Transcribing meetings or lectures into text
Translating spoken content for multilingual audiences
Creating subtitles or captions for videos
Building voice-enabled applications for accessibility
$ npx skills add https://github.com/sarvamai/skills --skill speech-to-text
git clone https://github.com/sarvamai/skills
Copy the install command above and run it in your terminal.
Launch Claude Code, Cursor, or your preferred AI coding agent.
Use the prompt template or examples below to test the skill.
Adapt the skill to your specific use case and workflow.
Check the GitHub repository or documentation for usage examples.