Text Shaping
Text shaping is the process of converting a sequence of Unicode characters into positioned glyphs for display, handling ligatures and complex scripts.
For simple scripts like English, shaping is nearly trivial: each letter maps to one glyph placed left to right, with only kerning and the occasional ligature on top. For complex scripts (Devanagari, Tamil, Malayalam, Arabic Nastaliq, Thai), shaping is far more involved. The shaping engine must form conjuncts (combining consonants into ligatures), reorder visual elements (the Indic matra ि is typed after its consonant but displayed before it), substitute contextual letter forms (Arabic letters have 2 to 4 shapes depending on position), position diacritics correctly above or below base letters, and apply kerning. HarfBuzz is the most widely used open-source text shaping engine; Android, Firefox, Chrome, and most Linux desktops rely on it, while Apple platforms ship their own shaper, Core Text. Software that doesn't integrate HarfBuzz (or an equivalent shaper) renders complex scripts incorrectly: visible halants in Indic text, isolated letter forms in Arabic, mis-positioned matras. Quality text shaping is invisible when it works and obvious when it fails.
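To make the matra reordering concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how a real shaper is implemented: it only knows one pre-base matra (ि, U+093F), it moves that matra in front of the single preceding code point, and the function name visual_order is made up for illustration. A real shaper moves the matra in front of the whole syllable cluster, which can include a conjunct.

```python
# Toy illustration only: real shapers reorder per syllable cluster,
# guided by script rules and the font's OpenType tables.
PRE_BASE_MATRAS = {"\u093F"}  # DEVANAGARI VOWEL SIGN I (ि)


def visual_order(text):
    """Return code points in approximate visual order.

    Assumption: each pre-base matra follows exactly one base consonant,
    so conjunct syllables are NOT handled correctly here.
    """
    out = []
    for ch in text:
        if ch in PRE_BASE_MATRAS and out:
            # Typed after the consonant, displayed before it.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    logical = "\u0915\u093F"  # कि : क followed by ि in memory
    print([f"U+{ord(c):04X}" for c in logical])
    print([f"U+{ord(c):04X}" for c in visual_order(logical)])
```

The stored (logical) order never changes; only the order in which glyphs are laid out does, which is why copy, paste, and search keep working on reordered text.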
How it works
A shaping engine reads input as Unicode code points, maps them to glyphs through the font's character map, applies the font's OpenType layout tables (GSUB for glyph substitution, GPOS for glyph positioning), and produces a sequence of glyph IDs with advances and (x, y) offsets. For Indic scripts, the process typically runs in four steps: detect the script (Devanagari, Tamil, etc.); identify syllable clusters (akshara: roughly, consonants joined by halants plus an optional matra); apply syllable-level rules (form halant-based conjuncts, place above, below, post, and pre matras); position the glyphs. For Arabic and Nastaliq: determine whether each letter is in isolated, initial, medial, or final position; look up contextual alternates; apply ligatures (some 2-to-1, some 3-to-1 or larger); kern neighbouring letters. Modern fonts ship with extensive OpenType tables; Noto Sans Devanagari, for example, has thousands of GSUB/GPOS rules covering common conjuncts, rare conjuncts, and edge cases. Software without full OpenType shaping falls back to drawing each code point's default glyph in stored order, producing output that is technically readable but visually wrong for complex scripts.
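The syllable-cluster step for Devanagari can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm HarfBuzz or any other shaper actually uses: the code point ranges cover only the common consonants, matras, nukta, halant, and the candrabindu/anusvara/visarga marks, and the helper names (is_consonant, is_dependent, clusters) are hypothetical.

```python
# Simplified Devanagari cluster segmentation (illustrative assumptions only).
HALANT = "\u094D"  # DEVANAGARI SIGN VIRAMA (्)


def is_consonant(ch):
    # Main consonant block plus the nukta-composed consonants.
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def is_dependent(ch):
    # Nukta, dependent vowel signs and halant, or the marks U+0900..U+0903.
    return (
        ch == "\u093C"
        or "\u093E" <= ch <= "\u094D"
        or "\u0900" <= ch <= "\u0903"
    )


def clusters(text):
    """Split text into approximate aksharas.

    A character joins the previous cluster if it is a dependent sign,
    or if it is a consonant directly after a halant (a conjunct).
    """
    out = []
    for ch in text:
        joins_previous = bool(out) and (
            is_dependent(ch)
            or (is_consonant(ch) and out[-1].endswith(HALANT))
        )
        if joins_previous:
            out[-1] += ch
        else:
            out.append(ch)
    return out


if __name__ == "__main__":
    word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # हिन्दी
    print(clusters(word))
```

For हिन्दी the sketch yields two clusters, हि and न्दी. Substitution and matra placement then operate within each cluster rather than across the whole string.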
Examples
Devanagari shaping pipeline
Input "क्ष" (Unicode: क + ् + ष). Shaper detects the halant-mediated conjunct, substitutes the क्ष ligature glyph from the font, produces one positioned glyph instead of three.
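The substitution in this example can be imitated with a small lookup table. The Python sketch below is a toy stand-in for a GSUB ligature lookup, under loud assumptions: the glyph names (k_ssa, t_ra) and the table are invented for illustration, and it matches on code points, whereas a real font matches on glyph IDs after the character map has been applied.

```python
# Toy ligature substitution, loosely modelled on a GSUB ligature lookup.
# The glyph names below are hypothetical and do not come from any real font.
LIGATURES = {
    ("\u0915", "\u094D", "\u0937"): "k_ssa",  # क + ् + ष
    ("\u0924", "\u094D", "\u0930"): "t_ra",   # त + ् + र
}
MAX_LEN = max(len(key) for key in LIGATURES)


def substitute(text):
    """Replace known sequences with one ligature glyph, longest match first."""
    glyphs = []
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(MAX_LEN, len(text) - i), 1, -1):
            key = tuple(text[i:i + n])
            if key in LIGATURES:
                glyphs.append(LIGATURES[key])
                i += n
                break
        else:
            # No ligature starts here: emit a default per-code-point glyph name.
            glyphs.append(f"u{ord(text[i]):04X}")
            i += 1
    return glyphs


if __name__ == "__main__":
    print(substitute("\u0915\u094D\u0937"))  # three code points in
    print(substitute("\u0915\u0937"))        # no halant, so no conjunct
```

When the table has no entry, the sketch falls through to one glyph per code point. That fallback is what software without shaping produces, and it is why a visible halant appears between the two consonants.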
Arabic contextual shaping
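Input "اردو" (Urdu). Arabic-script letters have up to four contextual forms (isolated, initial, medial, final), but the four letters in this word, ا ر د و, are all right-joining: they connect only to a preceding letter and have just two forms, isolated and final. Because none of them joins to the letter that follows it, the shaper selects the isolated form for each one. Output: 4 glyphs, all isolated. A dual-joining letter such as ب, by contrast, takes its initial, medial, or final form depending on its neighbours.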
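How a shaper picks a contextual form can be sketched from each letter's joining type. The Python below is a minimal illustration under stated assumptions: the JOINING table covers only seven letters (a real shaper uses the joining data for the full Unicode repertoire), unknown characters are treated as non-joining, and the function name contextual_forms is hypothetical.

```python
# Joining types for a handful of letters (illustrative subset only).
# "R" = right-joining: connects only to the preceding letter.
# "D" = dual-joining: connects to both neighbours.
JOINING = {
    "\u0627": "R",  # ا alef
    "\u062F": "R",  # د dal
    "\u0631": "R",  # ر reh
    "\u0648": "R",  # و waw
    "\u0628": "D",  # ب beh
    "\u062A": "D",  # ت teh
    "\u0643": "D",  # ك kaf
}


def contextual_forms(word):
    """Return isolated/initial/medial/final for each letter in logical order."""
    types = [JOINING.get(ch, "U") for ch in word]  # "U" = non-joining
    forms = []
    for i, t in enumerate(types):
        # Joins backwards if the previous letter can connect forwards.
        joins_prev = i > 0 and types[i - 1] == "D" and t in ("R", "D")
        # Joins forwards only if this letter is dual-joining.
        joins_next = (
            t == "D" and i + 1 < len(types) and types[i + 1] in ("R", "D")
        )
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


if __name__ == "__main__":
    print(contextual_forms("\u0643\u062A\u0628"))  # كتب
    print(contextual_forms("\u0628\u0627\u0628"))  # باب
```

A word made only of dual-joining letters, such as كتب, comes out as initial, medial, final. Right-joining letters such as ا, ر, د and و never connect to the letter after them, so they can only ever be isolated or final.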
HarfBuzz in action
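Chrome, Firefox, and Android all use HarfBuzz for text shaping; Apple platforms such as iOS rely on Apple's own Core Text shaper instead. When you type Hindi or Tamil in Chrome or Firefox on desktop or Android, HarfBuzz is what selects and positions the glyphs before they are drawn.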
Why this matters for Indian-language TTS
Text shaping is where most Indian-language rendering fails. Video editors, subtitle tools, and generative AI image/video platforms that don't use HarfBuzz or equivalent produce broken Devanagari conjuncts, mis-positioned Tamil matras, and disconnected Arabic letters. VoisLabs' Remotion-based video renderer uses HarfBuzz-compatible shaping for all 10 supported Indian scripts, producing subtitle output that matches how native readers expect the text to look.
Related terms
Devanagari
Devanagari (देवनागरी) is the script used to write Hindi, Marathi, Nepali, Sanskrit, and several othe…
Tamil Script
The Tamil script (தமிழ் எழுத்து) is a Brahmi-derived abugida used to write Tamil, one of the oldest …
Malayalam Script
The Malayalam script (മലയാളം ലിപി) is a Brahmi-derived writing system used for Malayalam, the classi…
Nastaliq
Nastaliq (نستعلیق) is the Perso-Arabic calligraphic style used to write Urdu, with flowing diagonal …
Conjunct Consonant
A conjunct consonant is a single glyph formed by combining two or more consonant letters in Indic sc…
Matra
A matra is a dependent vowel sign in Indic scripts that attaches to a consonant to indicate the vowe…
Diacritic
A diacritic is a mark added to a base letter to modify its pronunciation, typically indicating accen…
Frequently Asked Questions
What is HarfBuzz?
Why does the same Hindi text look different in different apps?
Can I test a tool's text shaping quality?
Last verified: 2026-04-21