Text-to-speech skills

Text-to-speech skills that make LLM voice work easier

TextToSpeechSkills packages the speech workflow into reusable instructions and tools, so an LLM app knows how to prepare scripts, use expression tags, reuse voice templates, and create audio jobs without a long prompt every time.

Who is this for?

Text-to-speech skills are reusable instructions and helper workflows that teach an LLM app how to create audio with TextToSpeechSkills. Instead of explaining the process every time, the skill tells the LLM how to prepare scripts, apply readable expression tags, choose saved voice templates, check credit use, create speech jobs through MCP, and return audio links for review. This is useful for non-technical users who want an LLM to narrate videos, generate game dialogue, create course audio, or prepare support replies, and for developers who want the same workflow to become repeatable inside a product.

Easy LLM setup

LLM-ready even for non-technical teams

Install the skill package, connect the MCP tool, and tell your LLM app which voice templates are allowed. After that it can help write, tag, validate, and generate speech.

Read setup guide
01 Create a scoped key
02 Install MCP
03 Choose a voice template
04 Generate audio from chat

Teach the workflow once

A skill gives the LLM a repeatable way to turn text into tagged, template-backed speech instead of relying on a fresh explanation in every chat.

Keep prompt instructions consistent

The same rules for expression tags, templates, credit checks, and job creation can be reused across creators, teams, and workspaces.

Pair skills with MCP tools

Skills explain what good speech work looks like. MCP tools let the LLM safely validate, preview, create, and retrieve audio.

When this helps

Creators, teams, and developers who want LLM apps to handle text-to-speech workflows usually need a repeatable path for writing, review, generation, billing, and reuse. The most important jobs here are teaching the workflow once, keeping prompt instructions consistent, and pairing skills with MCP tools. Those are the moments where voice becomes part of real work instead of a one-off export.

How the workflow works

Start with readable text, add expression tags when tone matters, choose an approved voice template, and create a speech job through the UI, API, or MCP. The same pattern applies whether a person or an LLM app starts the job, which makes it easier for humans and LLM apps to share one process without exposing internal routing or credentials.
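As a rough illustration of that sequence, the sketch below assembles a request body from plain text, an optional expression tag, and a voice template. The field names and the template ID are copied from the API playground example on this page; the helper name `build_speech_request` and its defaults are hypothetical and not part of any documented client.

```python
import json
from typing import Optional


def build_speech_request(
    text: str,
    voice_template: str,
    tag: Optional[str] = None,
    generation_mode: str = "instant",
    audio_format: str = "mp3",
) -> dict:
    """Build a speech-job body shaped like the playground example (hypothetical helper)."""
    # Expression tags are written in square brackets ahead of the text they affect.
    tagged_text = f"[{tag}] {text}" if tag else text
    return {
        "text": tagged_text,
        "voice_template": voice_template,
        "generation_mode": generation_mode,
        "format": audio_format,
    }


if __name__ == "__main__":
    body = build_speech_request("hello.", "vt_calm_narrator_v1", tag="quiet")
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent through whichever path the workspace uses (UI, API, or MCP); how it is transmitted is deliberately left out here.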

Before you roll it out

Decide which templates are approved, which expression tags are allowed, who can create workspace keys, and which usage limits are acceptable. Those choices keep automated voice generation useful and contained as usage grows from the first paid Test plan through Pro, Scale, and Business.
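One way to make those rollout decisions enforceable is a small check that runs before any job is created. The sketch below is only an assumption about how a team might do this: the allow-lists, the function name `find_policy_violations`, and the bracket-matching pattern are illustrative, with the tag syntax and template ID taken from the playground example on this page.

```python
import re

# Hypothetical rollout policy; replace with the templates and tags your team approves.
APPROVED_TEMPLATES = {"vt_calm_narrator_v1"}
ALLOWED_TAGS = {"quiet", "loud and angry"}

# Matches bracketed expression tags such as [quiet].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def find_policy_violations(text: str, voice_template: str) -> list:
    """Return readable policy problems; an empty list means the request passes."""
    problems = []
    if voice_template not in APPROVED_TEMPLATES:
        problems.append(f"template not approved: {voice_template}")
    for tag in TAG_PATTERN.findall(text):
        if tag.strip().lower() not in ALLOWED_TAGS:
            problems.append(f"tag not allowed: [{tag}]")
    return problems


if __name__ == "__main__":
    script = "[quiet] hello. [loud and angry] how are you?"
    print(find_policy_violations(script, "vt_calm_narrator_v1"))
    print(find_policy_violations("[whisper] hi", "vt_unknown"))
```

Returning a list of problems rather than raising on the first one lets an LLM app report every issue to the reviewer in a single pass.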

Common questions

What teams usually ask before starting

These are the practical details that matter before a team adds speech generation to a real workflow.

Who should use Text-to-Speech Skills for LLM Workflows?

It is for creators, teams, and developers who want LLM apps to handle text-to-speech workflows and need generated speech that is easy to review, consistent across prompts, and simple to connect to LLM tools. The core workflow combines expression tags, voice templates, credit previews, and job-based generation.

Can a non-technical user connect this to an LLM app?

Yes. Install the skill package, connect the MCP tool, and tell your LLM app which voice templates are allowed. After that it can help write, tag, validate, and generate speech. The setup guide keeps the first path short while still giving developers a clean API when the workflow moves into a product backend.

How does pricing stay predictable?

Every paid plan uses credits. Teams can add credit packs when needed, and workspaces on Pro and higher add central billing for $2 per user per month.

API playground

Plain JSON in, speech job out

{
  "text": "[quiet] hello. [loud and angry] how are you?",
  "voice_template": "vt_calm_narrator_v1",
  "generation_mode": "instant",
  "format": "mp3"
}
202 queued for polling
200 audio ready
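The two status codes suggest a simple polling loop: keep checking while the job answers 202, and stop when it answers 200. The sketch below shows that loop with the HTTP call injected as a function, so no endpoint is assumed. The name `wait_for_audio`, the retry defaults, and the `audio_url` field in the demo are hypothetical, since this page does not document the response body.

```python
import time
from typing import Callable, Tuple

# (status_code, parsed_body); stands in for a real HTTP response.
Response = Tuple[int, dict]


def wait_for_audio(
    check_job: Callable[[], Response],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
) -> dict:
    """Call check_job until it reports 200, treating 202 as still queued."""
    for attempt in range(max_attempts):
        status, body = check_job()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        # Pause between checks, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError(f"audio not ready after {max_attempts} checks")


if __name__ == "__main__":
    # Fake transport: queued twice, then ready. "audio_url" is a made-up field name.
    replies = iter([(202, {}), (202, {}), (200, {"audio_url": "https://example.com/a.mp3"})])
    print(wait_for_audio(lambda: next(replies), delay_seconds=0))
```

Passing the transport in as a callable keeps the loop testable and leaves authentication and URLs to whatever client the product backend already uses.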

MCP install

Agent tools included at launch

Claude Desktop: pnpm --package texttospeechskills dlx tts-skills-mcp
Codex: pnpm --package texttospeechskills dlx tts-skills-mcp
Cursor: pnpm --package texttospeechskills dlx tts-skills-mcp
Skills helper: pnpm --package texttospeechskills dlx tts-skills tags