Text-to-Speech

Tone Preset

A tone preset is a named configuration that applies a specific emotional style — horror, storytelling, devotional, etc. — to TTS output.

Pooja Sharma, Co-founder, VoisLabs

Updated May 2026

A tone preset (sometimes called an emotion preset, style preset, or voice style) is a named configuration that applies a specific emotional and stylistic profile to text-to-speech output, changing pacing, pitch variation, emphasis patterns, and vocal energy to match a content category. Instead of hand-writing SSML with raw parameters like `rate="slow"` and `pitch="-2st"`, a creator picks a preset such as "Horror Podcast", "Kids Bedtime Story", or "YouTube Commentary", and the preset applies a tuned combination of those parameters automatically.

Tone presets are a distinctive feature of creator-focused TTS platforms: ElevenLabs calls them "style presets", and VoisLabs ships 48 presets across 9 categories (Horror, YouTube, Storytelling, Educational, Devotional, Kids, Podcast, Social Media, General). Presets hide TTS complexity: a new user can produce a convincing horror podcast voiceover by selecting the Horror preset without learning SSML, prosodic markup, or phoneme specification. The tradeoff versus raw SSML is less fine-grained control in exchange for faster time-to-output.
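To make the abstraction concrete, here is a minimal sketch of what a preset saves a creator from writing by hand. It assumes the simplest kind of preset, one that is only an SSML template. The `PRESETS` table, the `to_ssml` function, and every parameter value other than the `rate="slow"` and `pitch="-2st"` pair quoted above are hypothetical and do not reflect any platform's real tuning.

```python
from xml.sax.saxutils import escape

# Hypothetical preset table: names and values are illustrative only.
PRESETS = {
    "Horror Podcast": {"rate": "slow", "pitch": "-2st"},
    "YouTube Commentary": {"rate": "medium", "pitch": "+1st"},
}


def to_ssml(text: str, preset_name: str) -> str:
    """Wrap text in an SSML <prosody> element using the preset's parameters."""
    params = PRESETS[preset_name]  # raises KeyError for an unknown preset
    return (
        f'<speak><prosody rate="{params["rate"]}" pitch="{params["pitch"]}">'
        f"{escape(text)}"  # escape &, < and > so the SSML stays well-formed
        "</prosody></speak>"
    )


if __name__ == "__main__":
    print(to_ssml("The door creaked open.", "Horror Podcast"))
```

The creator only chooses the name "Horror Podcast"; the markup is generated for them. A real preset usually controls more than rate and pitch, which is why the control it offers is coarser than hand-written SSML.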

How it works

Behind the scenes, a tone preset typically sets a baseline temperature (which controls randomness in the neural model: higher for expressive content, lower for measured content), a speaking rate multiplier (applied to a baseline of about 150 words per minute), a pitch shift, a prosodic emphasis pattern, and sometimes a recommended voice pairing. Good preset systems also include recommended background music and aspect-ratio suggestions. Preset quality varies across platforms: some presets are just SSML templates with generic parameters, while others are fine-tuned model configurations trained on preset-specific content. Purpose-built preset systems (like those for Indian devotional content) can outperform general-purpose emotion tags because they encode category-specific cadence rules; devotional Hindi, for example, has different pause patterns than devotional Tamil.
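The parameters listed above can be pictured as a small configuration record. The sketch below is an assumption about how such a record might look, not a description of any platform's internals: the `TonePreset` class, its field names, and the example values are all hypothetical. Only the 150 words-per-minute baseline and the Amit and Horror pairing come from this page.

```python
from dataclasses import dataclass

BASELINE_WPM = 150  # baseline speaking rate mentioned in the text


@dataclass(frozen=True)
class TonePreset:
    """Hypothetical container for the parameters a preset bundles together."""
    name: str
    temperature: float            # model randomness: higher means more expressive
    rate_multiplier: float        # scales the baseline speaking rate
    pitch_shift_semitones: float  # negative values lower the voice
    emphasis_pattern: str         # free-form label for the emphasis style
    recommended_voices: tuple = ()

    def effective_wpm(self) -> float:
        """Speaking rate after applying the preset's multiplier."""
        return BASELINE_WPM * self.rate_multiplier

    def estimated_seconds(self, text: str) -> float:
        """Rough audio length from a whitespace word count (ignores pauses)."""
        return len(text.split()) / self.effective_wpm() * 60


# Illustrative values only; real presets would be tuned by listening tests.
horror = TonePreset(
    name="Horror Podcast",
    temperature=0.8,
    rate_multiplier=0.75,
    pitch_shift_semitones=-2.0,
    emphasis_pattern="long-pauses",
    recommended_voices=("Amit",),
)

if __name__ == "__main__":
    print(horror.effective_wpm())
    print(horror.estimated_seconds("word " * 225))
```

Modelling the rate as a multiplier rather than an absolute value keeps presets portable across voices whose natural baselines differ, which is one plausible reason to design a preset this way.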

Examples

Horror Podcast preset

Slower rate, lower pitch, longer pauses, breathy resonance — produces dread without theatrics. Used by Hindi horror-story YouTube channels.

YouTube Shorts Hook preset

Higher energy, faster pace, upward inflection on key words. Optimised for viewer retention in the first 3 seconds of a vertical video.

Devotional preset

Respectful pacing, measured cadence, subtle pitch variation — suitable for Bhagavad Gita narration, Gurbani recitation, or Tafseer readings.

Why this matters for Indian-language TTS

Tone presets are particularly valuable for Indian-language TTS because Indian content conventions differ from Western ones: a Hindi devotional preset has different cadence rules than a Tamil devotional preset, and neither matches English meditation cadence. VoisLabs' 48 presets are tuned specifically for Indian content categories (horror stories, YouTube commentary, devotional chant, Hindi kids' storytelling, etc.) rather than being generic emotion tags.

Related terms

SSML (Speech Synthesis Markup Language)

SSML is an XML-based markup standard that lets you control pronunciation, pacing, pauses, emphasis, …

Neural TTS

Neural TTS uses deep learning to generate speech waveforms directly from text, producing voices that…

Prosody

Prosody is the rhythm, stress, intonation, and pacing patterns of speech — the musical dimension of …

Text-to-Speech (TTS)

Text-to-speech (TTS) is the technology that converts written text into spoken audio using synthesise…

Learn more

48 Tone Presets

Devotional preset in action (Hanuman Chalisa)

Horror preset in action

Frequently Asked Questions

Can I create custom tone presets?

Not currently on VoisLabs — the 48 shipped presets cover most use cases. Custom presets are on the roadmap. Advanced users can apply SSML via the API for custom parameter combinations.

Do tone presets work across all voices?

Yes — each of the 13 VoisLabs voices can render any preset. Some voice-preset combinations land better than others (e.g., Amit + Horror = excellent, Kavya + Horror = odd). Each voice page lists recommended presets.

How do tone presets differ from voice cloning?

Voice cloning changes WHICH voice speaks. A tone preset changes HOW that voice speaks (faster, slower, more excited, calmer). You can combine both: clone a voice, then apply different presets for different content types.
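Because the voice and the preset are independent choices, one voice can be paired with several presets. The sketch below illustrates only that independence. The `build_jobs` function and the `voice`, `preset`, and `text` field names are hypothetical and are not taken from the VoisLabs API or any other real API.

```python
from itertools import product


def build_jobs(voices, presets, text):
    """Pair every voice with every preset; neither choice constrains the other."""
    return [
        {"voice": voice, "preset": preset, "text": text}
        for voice, preset in product(voices, presets)
    ]


if __name__ == "__main__":
    # "my_cloned_voice" is a placeholder identifier, not a real voice ID.
    for job in build_jobs(["my_cloned_voice"], ["Horror Podcast", "Devotional"], "Hello"):
        print(job)
```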

Last verified: 2026-04-21