Plain text stays plain
Tags sit inline with the sentence, so text remains easy to diff, approve, and store in prompts.
Readable audio direction
Use small, validated tags to make generated speech match the moment. Writers can review the same text that developers send to the API, and agents can validate tags before creating audio.
Expressive TTS markup in TextToSpeechSkills is a simple way to add voice direction directly inside normal text. A writer, developer, or LLM agent can use tags such as [quiet], [excited], or [loud and angry] to guide delivery while keeping the script readable. The same validation rules run in the UI, API, MCP server, and skills package, so unsupported tags can be caught before generation. This helps product teams keep copy reviewable, voice output consistent, and automated audio workflows easier to debug.
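The validation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the TextToSpeechSkills implementation: the ALLOWED_TAGS set contains only the three tags named in this section, and the function names extract_tags and find_unsupported_tags are hypothetical. The real tag library is presumably larger and may normalize tags differently.

```python
import re

# Assumption: only the three tags mentioned on this page; the real library is larger.
ALLOWED_TAGS = {"quiet", "excited", "loud and angry"}

# A tag is any text between square brackets, e.g. "[quiet]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(text: str) -> list[str]:
    """Return tags in order of appearance, lowercased and stripped of padding."""
    return [match.strip().lower() for match in TAG_PATTERN.findall(text)]


def find_unsupported_tags(text: str, allowed: set[str] = ALLOWED_TAGS) -> list[str]:
    """Return tags outside the allowed set so they can be fixed before generation."""
    return [tag for tag in extract_tags(text) if tag not in allowed]


script = "[quiet] hello. [loud and angry] how are you? [sarcastic] fine."
print(find_unsupported_tags(script))  # ['sarcastic']
```

Running a check like this before a job is submitted means a typo or an unsupported direction shows up as a short list of tags rather than as unexpected audio.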
Easy LLM setup
Agents use the same tags humans see in the editor, so a non-technical teammate can approve the exact words and tone before audio is created.
The API, UI, MCP server, and skill all use the same tag library to catch unsupported directions early.
Combine markup with templates to keep one product voice while changing emotion from line to line.
Teams that want expressive voice direction without brittle prompt instructions usually need a repeatable path for writing, review, generation, billing, and reuse. The three jobs that matter most here are keeping plain text plain, validating before generation, and designing for repeatable style. Those are the moments where voice becomes part of real work instead of a one-off export.
Start with readable text, add expression tags when tone matters, choose an approved voice template, and create a speech job through the UI, API, or MCP. The same pattern works for expressive TTS, speech markup, and voice tone tags, which makes it easier for humans and LLM apps to share one process without exposing internal routing or credentials.
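As a rough sketch of that flow, the Python below turns a short tagged script into one request body per line while holding the voice template fixed. The field names and the template ID vt_calm_narrator_v1 are taken from the API playground example on this page. The helper build_speech_jobs is hypothetical, and the sketch stops at building the JSON because the endpoint URL and authentication scheme are not stated here.

```python
import json


def build_speech_jobs(lines, voice_template, generation_mode="instant", audio_format="mp3"):
    """Build one request body per script line, all sharing the same voice template."""
    return [
        {
            "text": line,
            "voice_template": voice_template,
            "generation_mode": generation_mode,
            "format": audio_format,
        }
        for line in lines
        if line.strip()  # skip blank lines
    ]


script_lines = [
    "[quiet] hello.",
    "[loud and angry] how are you?",
]

jobs = build_speech_jobs(script_lines, voice_template="vt_calm_narrator_v1")
for job in jobs:
    print(json.dumps(job))
```

Keeping the template constant and varying only the inline tags is what lets the emotion change from line to line without changing the product voice.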
Decide which templates are approved, which expression tags are allowed, who can create workspace keys, and which usage limits are acceptable. Those choices keep automated voice generation useful and contained as usage grows from the first paid Test plan through Pro, Scale, and Business.
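One way to make those decisions enforceable is to write them down as a small policy object and check each request against it before a job is created. The sketch below is illustrative only: WorkspacePolicy, its fields, and the max_chars_per_job limit are hypothetical names, not part of the TextToSpeechSkills API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspacePolicy:
    """Hypothetical policy: approved templates, allowed tags, and a size limit."""
    approved_templates: frozenset
    allowed_tags: frozenset
    max_chars_per_job: int


def policy_violations(policy, text, voice_template):
    """Return human-readable problems; an empty list means the request passes."""
    problems = []
    if voice_template not in policy.approved_templates:
        problems.append(f"template not approved: {voice_template}")
    for tag in re.findall(r"\[([^\[\]]+)\]", text):
        if tag.strip().lower() not in policy.allowed_tags:
            problems.append(f"tag not allowed: [{tag}]")
    if len(text) > policy.max_chars_per_job:
        problems.append(f"text exceeds {policy.max_chars_per_job} characters")
    return problems


policy = WorkspacePolicy(
    approved_templates=frozenset({"vt_calm_narrator_v1"}),
    allowed_tags=frozenset({"quiet", "excited"}),
    max_chars_per_job=500,
)

print(policy_violations(policy, "[quiet] hello. [loud and angry] how are you?", "vt_calm_narrator_v1"))
# ['tag not allowed: [loud and angry]']
```

Returning a list of problems instead of raising on the first one lets a reviewer or an agent fix everything in a single pass.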
Common questions
These are the practical details that matter before a team adds speech generation to a real workflow.
This page is for teams that want expressive voice direction without brittle prompt instructions: generated speech that is easy to review, consistent across prompts, and simple to connect to LLM tools. The core workflow combines expression tags, voice templates, credit previews, and job-based generation.
Agents use the same tags humans see in the editor, so a non-technical teammate can approve the exact words and tone before audio is created. The setup guide keeps the first path short while still giving developers a clean API when the workflow moves into a product backend.
Every paid plan uses credits. Teams can add credit packs when needed, and workspaces on Pro and higher add central billing for $2 per user per month.
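The per-user charge is simple arithmetic, sketched below. The $2 rate and the plan names come from this page. The plan ordering (Test, then Pro, Scale, Business) and the reading that the charge applies to every user in the workspace are assumptions, and credits and credit packs are excluded because their prices are not listed here.

```python
# Plans named on this page, assumed to be in ascending order.
PLANS = ["Test", "Pro", "Scale", "Business"]
CENTRAL_BILLING_USD_PER_USER = 2


def monthly_central_billing_usd(plan: str, users: int) -> int:
    """Return the monthly central-billing charge in USD, excluding credits."""
    if plan not in PLANS:
        raise ValueError(f"unknown plan: {plan}")
    if users < 0:
        raise ValueError("users must be non-negative")
    if PLANS.index(plan) < PLANS.index("Pro"):
        return 0  # assumption: central billing is not offered below Pro
    return CENTRAL_BILLING_USD_PER_USER * users


print(monthly_central_billing_usd("Pro", 12))  # 24
```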
API playground
{
  "text": "[quiet] hello. [loud and angry] how are you?",
  "voice_template": "vt_calm_narrator_v1",
  "generation_mode": "instant",
  "format": "mp3"
}
MCP install
pnpm --package texttospeechskills dlx tts-skills-mcp
pnpm --package texttospeechskills dlx tts-skills tags