Diacritic
A diacritic is a mark added to a base letter to modify its pronunciation, typically indicating accent, tone, length, or nasalisation.
A diacritic (also called a diacritical mark or accent) is a symbol added to a base letter to modify its pronunciation. Diacritics appear across many writing systems: French uses accents (é, è, ê), German uses the umlaut (ä, ö, ü), and Spanish uses the tilde (ñ). Indic scripts use a range of marks, including matras (vowel signs), anusvara (nasalisation), visarga (aspiration), and virama/halant (vowel cancellation). Urdu, written in the Nastaliq style, uses hamza (ء) and other Perso-Arabic marks.
Arabic and Urdu also use tashkeel, the marks that indicate short vowels and related features such as consonant doubling. Tashkeel is usually omitted in everyday writing but included in sacred texts, beginner materials, and TTS-optimised input. In TTS, diacritics carry pronunciation information: a system reading Arabic or Urdu without tashkeel must guess the short vowels, whereas with tashkeel included the pronunciation is unambiguous.
Rendering diacritics properly requires fonts that support the specific marks and positioning logic that places them on their base letters. Unicode encodes most diacritics as combining characters that follow the base letter in the text; some common letter-plus-accent combinations (such as é) also exist as single precomposed code points. Either way, the base letter and its marks together form one grapheme cluster, the unit a reader perceives as a single character.
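To make the combining-character model concrete, here is a minimal Python sketch using the standard-library `unicodedata` module. It shows that é can be stored either precomposed (U+00E9) or as e followed by U+0301, and that NFC/NFD normalisation converts between the two. The `simple_clusters` helper is hypothetical and deliberately simplified: it only attaches combining marks (categories Mn, Mc, Me) to the preceding character, whereas full UAX #29 segmentation has more rules (newer Unicode versions, for example, keep virama-joined conjuncts such as न्द in one cluster, which this sketch does not).

```python
import unicodedata


def simple_clusters(text: str) -> list[str]:
    """Attach each combining mark (category M*) to the preceding base character.

    A simplification of Unicode extended grapheme clusters (UAX #29).
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # mark joins the cluster of the previous base
        else:
            clusters.append(ch)  # base character starts a new cluster
    return clusters


precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e + U+0301 COMBINING ACUTE ACCENT

print(unicodedata.name("\u0301"))                               # COMBINING ACUTE ACCENT
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(len(decomposed), len(simple_clusters(decomposed)))        # 2 1
print(len(simple_clusters("\u0939\u093f\u0902\u0926\u0940")))   # 2 (हिं + दी)
```

Note that the decomposed é is two code points but one cluster, which is why string length in code points is a poor measure of "characters" for diacritic-heavy text.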
How it works
Diacritics fall into categories by function: vowel-indicating (French é, Indic matras, Arabic tashkeel fatha/kasra/damma), tone-indicating (Mandarin Pinyin tone marks, tone marks in some African languages), nasalisation (Indic anusvara ं, Portuguese ã), length-indicating (a macron marking a long vowel, as in ā), stress-marking (the Spanish acute accent, written on a stressed syllable that breaks the default stress rules), and cancellation (Indic virama/halant, indicating that a consonant has no following vowel). The tilde in Spanish ñ is a different case: it marks a separate consonant, the palatal nasal /ɲ/, rather than a nasalised vowel.
In Indic scripts specifically, diacritic marks include anusvara (ं, nasalisation), visarga (ः, voiceless aspiration, typically at word end), chandrabindu (ँ, nasalisation of the preceding vowel), virama/halant (्, which removes the inherent vowel and is used to form conjuncts), and the various matras. Urdu uses Arabic-derived diacritics: fatha, kasra, and damma (short vowels), shadda (consonant doubling), sukun (no vowel), and tanwin (nunation, an indefinite-noun ending).
Rendering these marks correctly requires fonts that include them and shaping rules that position them relative to their base letters. This is especially complex in Nastaliq, where the shape of a base letter depends on its neighbours.
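The functional categories above are not a Unicode property, so a pipeline that wants them has to map code points itself. The sketch below assumes a hypothetical `MARK_FUNCTION` table covering only a few of the marks named on this page; it uses `unicodedata.category` to find combining marks and `unicodedata.name` to label them, falling back to a name-based guess for matras.

```python
import unicodedata

# Hypothetical function labels for a few marks; NOT a standard Unicode property.
MARK_FUNCTION = {
    "\u0901": "nasalisation",        # DEVANAGARI SIGN CANDRABINDU
    "\u0902": "nasalisation",        # DEVANAGARI SIGN ANUSVARA
    "\u0903": "aspiration",          # DEVANAGARI SIGN VISARGA
    "\u094d": "vowel cancellation",  # DEVANAGARI SIGN VIRAMA
    "\u064b": "tanwin",              # ARABIC FATHATAN
    "\u064c": "tanwin",              # ARABIC DAMMATAN
    "\u064d": "tanwin",              # ARABIC KASRATAN
    "\u064e": "short vowel",         # ARABIC FATHA
    "\u064f": "short vowel",         # ARABIC DAMMA
    "\u0650": "short vowel",         # ARABIC KASRA
    "\u0651": "gemination",          # ARABIC SHADDA
    "\u0652": "no vowel",            # ARABIC SUKUN
    "\u0a71": "gemination",          # GURMUKHI ADDAK
}


def tag_marks(text: str) -> list[tuple[str, str]]:
    """Return (Unicode name, functional label) for every combining mark in text."""
    tags = []
    for ch in text:
        if not unicodedata.category(ch).startswith("M"):
            continue  # skip base letters, spaces, punctuation
        name = unicodedata.name(ch, f"U+{ord(ch):04X}")
        if ch in MARK_FUNCTION:
            label = MARK_FUNCTION[ch]
        elif "VOWEL SIGN" in name:
            label = "vowel sign (matra)"  # heuristic based on the character name
        else:
            label = "other"
        tags.append((name, label))
    return tags


for name, label in tag_marks("\u0939\u093f\u0902\u0926\u0940"):  # हिंदी
    print(f"{name}: {label}")
```

For हिंदी this reports the two matras and the anusvara, while the base consonants ह and द are skipped.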
Examples
Indic anusvara
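हिन्दी (Hindi) vs हिंदी. Both spellings are pronounced the same way. Here the anusvara (ं) does not nasalise the preceding vowel: before a stop consonant it stands for the nasal of the same place of articulation, so in हिंदी it represents the dental /n/ that हिन्दी writes explicitly as the conjunct न्द (न + virama + द). Both spellings appear in modern usage; the anusvara form is more common.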
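Because हिंदी and हिन्दी are different code-point sequences and are not canonically equivalent, Unicode normalisation alone will not treat them as the same word. The sketch below, with a hypothetical `expand_anusvara` helper, rewrites an anusvara that precedes a stop consonant as the explicit homorganic nasal plus virama. It assumes the simplified textbook rule and leaves other cases (anusvara before semivowels or sibilants, word-final anusvara, chandrabindu) untouched.

```python
import unicodedata

ANUSVARA = "\u0902"
VIRAMA = "\u094d"

# In the Devanagari code chart each 4-consonant stop series is followed by its nasal:
# क-घ -> ङ, च-झ -> ञ, ट-ढ -> ण, त-ध -> न, प-भ -> म
HOMORGANIC_NASAL: dict[str, str] = {}
for first, nasal in [(0x915, 0x919), (0x91A, 0x91E), (0x91F, 0x923),
                     (0x924, 0x928), (0x92A, 0x92E)]:
    for cp in range(first, first + 4):
        HOMORGANIC_NASAL[chr(cp)] = chr(nasal)


def expand_anusvara(word: str) -> str:
    """Rewrite anusvara before a stop as nasal + virama (hypothetical helper)."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == ANUSVARA and nxt in HOMORGANIC_NASAL:
            out.append(HOMORGANIC_NASAL[nxt] + VIRAMA)
        else:
            out.append(ch)  # all other characters pass through unchanged
    return "".join(out)


anusvara_form = "\u0939\u093f\u0902\u0926\u0940"        # हिंदी
conjunct_form = "\u0939\u093f\u0928\u094d\u0926\u0940"  # हिन्दी

print(unicodedata.normalize("NFC", anusvara_form) == unicodedata.normalize("NFC", conjunct_form))  # False
print(expand_anusvara(anusvara_form) == conjunct_form)  # True
```

The table is built from code-point ranges rather than listed by hand, which relies on the chart order noted in the comment.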
Arabic tashkeel
Arabic كَتَبَ (kataba, "he wrote") vs كتب (ktb, unvowelled). The tashkeel marks (three fathas) show the /a/ vowels explicitly. The unvowelled skeleton كتب can also be read as kutub ("books") or kutiba ("it was written"), so the reader has to infer the vowels from context.
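Stripping tashkeel is a simple filter over the Arabic harakat, and doing it makes the ambiguity visible. This sketch assumes the eight marks from U+064B (fathatan) to U+0652 (sukun) are the ones to remove; it ignores others such as the superscript alef (U+0670) and Quranic annotation signs. The words are written with escape sequences so the exact code points are unambiguous.

```python
# Arabic harakat U+064B..U+0652:
# fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun
TASHKEEL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_tashkeel(text: str) -> str:
    """Remove short-vowel and related marks, leaving the consonant skeleton."""
    return "".join(ch for ch in text if ch not in TASHKEEL)


kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # كَتَبَ "he wrote"
kutub = "\u0643\u064f\u062a\u064f\u0628"         # كُتُب "books"
kutiba = "\u0643\u064f\u062a\u0650\u0628\u064e"  # كُتِبَ "it was written"

skeletons = {strip_tashkeel(w) for w in (kataba, kutub, kutiba)}
print(len(skeletons))  # 1: all three collapse to كتب
```

Going the other way, from كتب back to one of the three readings, is the step a TTS system has to guess when tashkeel is absent.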
Gurmukhi addak
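ਪੱਕਾ (pakkā, "firm") uses the addak (ੱ) to mark a doubled (geminated) consonant: the addak is written over the preceding letter and doubles the consonant that follows, here ਕ. Without it, the word would read as ਪਕਾ (pakā), which has a different pronunciation and meaning.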
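The addak's effect can be shown with a toy romaniser. The `romanize` function and its letter tables are hypothetical and cover only a handful of characters; the sketch models the inherent vowel, vowel signs, virama, and the addak doubling the following consonant, but not schwa deletion or any other Punjabi phonology.

```python
# Hypothetical toy tables: a few Gurmukhi letters, not a full transliteration scheme.
CONSONANTS = {"\u0a15": "k", "\u0a24": "t", "\u0a2a": "p", "\u0a2e": "m"}  # ਕ ਤ ਪ ਮ
VOWEL_SIGNS = {"\u0a3e": "\u0101", "\u0a40": "\u012b"}  # ਾ -> ā, ੀ -> ī
ADDAK = "\u0a71"   # ੱ: doubles the following consonant
VIRAMA = "\u0a4d"  # ੍: suppresses the inherent vowel


def romanize(word: str) -> str:
    out = []
    double_next = False
    i = 0
    while i < len(word):
        ch = word[i]
        if ch == ADDAK:
            # Sits on the previous letter but geminates the next consonant.
            double_next = True
            i += 1
        elif ch in CONSONANTS:
            latin = CONSONANTS[ch]
            out.append(latin * 2 if double_next else latin)
            double_next = False
            nxt = word[i + 1] if i + 1 < len(word) else ""
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 2
            elif nxt == VIRAMA:
                i += 2
            else:
                out.append("a")  # inherent vowel; no schwa deletion in this sketch
                i += 1
        else:
            raise ValueError(f"unsupported character U+{ord(ch):04X}")
    return "".join(out)


print(romanize("\u0a2a\u0a71\u0a15\u0a3e"))  # ਪੱਕਾ -> pakkā
print(romanize("\u0a2a\u0a15\u0a3e"))        # ਪਕਾ -> pakā
```

Dropping the single addak code point is enough to change the output from pakkā to pakā, which is exactly the pronunciation difference described above.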
Why this matters for Indian-language TTS
Diacritics are central to Indian-language TTS accuracy. Hindi's distinction between anusvara and chandrabindu, Malayalam's chandrakkala (virama), Urdu's tashkeel (in religious and formal text), and Gurmukhi's addak all directly affect pronunciation. A TTS system that silently drops or mishandles diacritics produces audibly wrong output, such as a missing nasal or an undoubled consonant. VoisLabs' Indic input pipeline preserves all diacritic marks and uses them in pronunciation decisions.
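One common way diacritics get silently dropped is a generic "remove accents" step borrowed from Latin-script search code: decompose with NFD, then discard nonspacing marks (category Mn). The sketch below shows that this recipe strips the Devanagari anusvara and virama (both Mn) while keeping the matras ि and ी (category Mc). The `marks_preserved` check is a hypothetical, coarse guard based on mark counts; it would not catch a stage that swaps one mark for another.

```python
import unicodedata


def naive_strip_accents(text: str) -> str:
    """A widely copied 'remove accents' recipe: NFD, then drop nonspacing marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def mark_count(text: str) -> int:
    """Count combining marks (categories Mn, Mc, Me) after canonical decomposition."""
    return sum(1 for ch in unicodedata.normalize("NFD", text)
               if unicodedata.category(ch).startswith("M"))


def marks_preserved(before: str, after: str) -> bool:
    """Hypothetical pipeline guard: a text-cleaning stage should not reduce the mark count."""
    return mark_count(after) >= mark_count(before)


hindi = "\u0939\u093f\u0902\u0926\u0940"  # हिंदी
print(naive_strip_accents("caf\u00e9"))   # cafe
stripped = naive_strip_accents(hindi)     # anusvara (Mn) removed, matras (Mc) kept
print(marks_preserved(hindi, stripped))   # False
```

Applied to हिन्दी, the same recipe removes the virama and yields हिनदी, which would be read with an extra vowel between न and द.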
Related terms
Matra
A matra is a dependent vowel sign in Indic scripts that attaches to a consonant to indicate the vowe…
Devanagari
Devanagari (देवनागरी) is the script used to write Hindi, Marathi, Nepali, Sanskrit, and several othe…
Nastaliq
Nastaliq (نستعلیق) is the Perso-Arabic calligraphic style used to write Urdu, with flowing diagonal …
Gurmukhi
Gurmukhi (ਗੁਰਮੁਖੀ) is the script used to write Punjabi in India, developed for the Guru Granth Sahib…
Text Shaping
Text shaping is the process of converting a sequence of Unicode characters into positioned glyphs fo…
Phoneme
A phoneme is the smallest distinct sound unit in a language that can change word meaning — e.g., the…
Try VoisLabs — Indian-language TTS done right
1 minute free per day. 12 languages. Native Indian-script karaoke subtitles. No card required.
Last verified: 2026-04-21