Control VICIdial TTS Pronunciation With SSML

How to control TTS pronunciation with SSML

SSML lets you tell VICIdial's text-to-speech engine how to say numbers, letters, and account IDs instead of leaving it to guess the pronunciation.

VICIfast Support·June 27, 2026·3 min read

When VICIdial reads text aloud with its text-to-speech engine, the engine has to guess how to pronounce what you wrote. Most of the time that guess is fine. But the moment you put a number or an account ID in front of it, the guesses get strange fast. SSML, the Speech Synthesis Markup Language, is how you remove the guesswork and tell the engine exactly what to say.

Why plain text is not enough

The TTS Text field of a TTS (text to speech) entry accepts SSML and passes it straight through to the speech engine. If you skip the markup, the engine treats numbers as quantities. Write 12574 and it reads back "twelve thousand five hundred and seventy four." That is correct math and useless on a call where you wanted to read an account number digit by digit.

This matters because a TTS entry can feed any Campaign audio prompt, and those prompts are heard by real people. A mispronounced confirmation number defeats the whole point of reading it back.

Spelling things out digit by digit

The fix is the say-as directive. Wrap the value and tell the engine to treat it as separate characters rather than a single number:

<say-as type='acronym'>12574</say-as>

With that wrapper the engine reads "one two five seven four" instead of a single large number. The same idea covers letters, codes, and anything else you want spoken character by character.
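If you generate TTS text from a script rather than typing it by hand, the wrapper can be produced by a small helper. This is a minimal sketch: the function name spell_out is hypothetical, and the only markup it emits is the say-as form shown above.

```python
from xml.sax.saxutils import escape


def spell_out(value: str) -> str:
    """Wrap a value in the say-as directive so it is read character by character."""
    # escape() protects &, < and > so a stray character in the value
    # cannot break the surrounding markup.
    return f"<say-as type='acronym'>{escape(value)}</say-as>"


print(spell_out("12574"))
```

Escaping the value first matters mostly for codes that contain an ampersand or angle bracket. Plain digit strings pass through unchanged.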

What else SSML controls

Pronunciation is the headline use, but the markup reaches further. You can shape how a prompt sounds without re-recording anything:

Pronunciation of numbers, acronyms, and account IDs
Volume of the spoken output
Pitch of the voice
Rate, so the engine slows down for an important detail

Because a TTS entry can pull field values from your default lead tables, you can combine dynamic data with these controls. A confirmation number from the Lead record can be wrapped in say-as so every customer hears their own number read out digit by digit.
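In standard SSML, volume, pitch, and rate are set with the prosody element. The sketch below builds that wrapper for a piece of plain text. It assumes your speech engine honors the W3C prosody attributes and the named values used here (for example rate "slow"), which you should confirm by testing. The helper name with_prosody is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def with_prosody(text, rate=None, pitch=None, volume=None):
    """Wrap plain text in an SSML prosody element with the attributes given."""
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    # Only emit the attributes the caller actually set.
    rendered = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in attrs.items()
        if value is not None
    )
    if not rendered:
        # Nothing to control, so return the escaped text without a wrapper.
        return escape(text)
    return f"<prosody{rendered}>{escape(text)}</prosody>"


print(with_prosody("Your confirmation number follows.", rate="slow"))
```

The helper escapes its input, so it is meant for plain text only. Markup that is already built, such as a say-as wrapper, would need to be concatenated alongside it rather than passed through it.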

How the markup reaches the caller

The path is short. Your campaign prompt points at a TTS entry, the entry's SSML is handed to the speech engine on the dialer, and the rendered audio is played back to the caller over the same Asterisk Dialplan that handles every other prompt.

sequenceDiagram
  participant C as Campaign Prompt
  participant T as TTS Entry SSML
  participant E as Speech Engine
  participant A as Asterisk Dialplan
  participant P as Caller
  C->>T: Request rendered audio
  T->>E: Pass SSML markup
  E->>A: Return spoken audio
  A->>P: Play prompt

TTS is unforgiving. A single typo in the SSML can make the engine behave in strange ways or skip the prompt entirely. Test every entry before it touches a live campaign.
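One cheap pre-flight check is to confirm the markup is at least well-formed XML before you paste it into the TTS Text field. The sketch below does that with Python's standard library. The function name check_ssml is hypothetical, and the check only catches structural typos such as an unclosed or mismatched tag. It cannot tell you whether the engine accepts a given tag or attribute, so a test call is still needed.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def check_ssml(fragment: str) -> Tuple[bool, str]:
    """Return (ok, message) describing whether the fragment is well-formed XML."""
    try:
        # Wrap the fragment in a throwaway root element so text with
        # several top-level tags, or none at all, still parses.
        ET.fromstring(f"<speak>{fragment}</speak>")
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


print(check_ssml("Your number is <say-as type='acronym'>12574</say-as>."))
print(check_ssml("<say-as type='acronym'>12574</say-a>"))
```

The first call reports success and the second reports a parse error for the mistyped closing tag.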

Where to go next

SSML controls how text is spoken, but the rendered output still lands in your audio store as a normal sound file. To see how that store works, read the VICIdial audio store guide. For the full picture of prompts, voicemail, and TTS together, see the audio prompts and TTS guide. The engine that interprets this SSML is Cepstral, which we cover in our article on what Cepstral is.

TTS shines in an IVR (interactive voice response) or a survey-style flow where dynamic data has to be spoken back accurately. If you would rather run a managed dialer where this is already wired up and tested, see our plans and pricing.

About VICIfast LLC

VICIfast LLC operates a managed VICIdial hosting + BYOI service for outbound and inbound call centers. We run the dialers, the carriers, the recordings pipeline, and the compliance plumbing so operators don’t have to.

Citing this article

VICIfast Engineering. “How to control TTS pronunciation with SSML”. VICIfast LLC, June 27, 2026. Retrieved from https://vicifast.com/blog/vicidial-tts-ssml-explained