What Text-to-Speech Is in VICIdial

Text-to-Speech turns typed text into spoken audio in VICIdial, but it depends on Cepstral being installed and a system setting being enabled.

VICIfast Support · June 27, 2026

Text-to-Speech (TTS) is the feature that takes typed text and turns it into spoken audio a caller can hear. Instead of recording every prompt yourself, you type the words and the system generates the audio file. It is handy when wording changes often or when you want a message to read live data aloud. This post explains what TTS is in VICIdial and what has to be in place before it works.

What a TTS entry is

A TTS entry is a named record that holds the text you want spoken and the settings for how it is spoken. You manage entries from the Text To Speech section of the admin interface. The list there shows each entry's TTS ID, name, active status, and the beginning of its prompt text, with a link to modify each one. On its own, a TTS entry is just a definition; it produces audio only when something in your call flow references it.
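To make the shape of an entry concrete, here is a minimal Python sketch of the fields the admin list shows. The class name, field names, the hypothetical voice field, the Y/N display, and the 20-character preview length are all assumptions for illustration, not VICIdial's actual schema. The object holds only text and settings, never audio, which matches the point that an entry is just a definition.

```python
from dataclasses import dataclass


@dataclass
class TTSEntry:
    """Hypothetical model of one TTS entry (not VICIdial's real schema)."""
    tts_id: str                      # the TTS ID shown in the admin list
    name: str                        # human-readable name
    active: bool                     # active status
    prompt_text: str                 # full text to be spoken
    voice: str = "hypothetical-voice"  # assumption: a setting for how it is spoken

    def list_row(self, preview_len: int = 20) -> str:
        """Summarize the entry the way the admin list does: ID, name,
        active flag, and only the beginning of the prompt text."""
        preview = self.prompt_text[:preview_len]
        if len(self.prompt_text) > preview_len:
            preview += "..."
        status = "Y" if self.active else "N"
        return f"{self.tts_id} | {self.name} | {status} | {preview}"


entry = TTSEntry("holiday01", "Holiday hours", True,
                 "Our office is closed on Monday for the holiday.")
print(entry.list_row())
```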

What TTS depends on

TTS is not built into a plain VICIdial install. Two things must be true before any entry produces sound.

Cepstral must be installed and configured on the system by an administrator. Cepstral is the engine that actually synthesizes the speech.
The System Settings option for TTS must be enabled. Without that flag the entries exist but never generate audio.
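The two prerequisites can be expressed as a simple readiness check. This is a hypothetical Python sketch: ServerState and tts_blockers are illustrative names, and on a real install you confirm both conditions through the system and the admin interface, not through any such function.

```python
from dataclasses import dataclass


@dataclass
class ServerState:
    """Hypothetical snapshot of one server's TTS prerequisites."""
    cepstral_installed: bool    # engine installed and configured by an admin
    tts_setting_enabled: bool   # the TTS option in System Settings


def tts_blockers(state: ServerState) -> list[str]:
    """Return the reasons TTS entries would stay silent; empty means ready."""
    blockers = []
    if not state.cepstral_installed:
        blockers.append("Cepstral is not installed or configured")
    if not state.tts_setting_enabled:
        blockers.append("TTS is not enabled in System Settings")
    return blockers


print(tts_blockers(ServerState(cepstral_installed=True, tts_setting_enabled=False)))
```

Both conditions are checked independently, so a server missing both reports both problems at once instead of revealing them one fix at a time.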

Cepstral requires licenses. Each server you run TTS on needs at least one channel license, one voice, and the save-to-file license, so plan licensing per server, not per cluster.
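Because licensing is counted per server, the arithmetic is a straight multiplication. The sketch below uses hypothetical names and scales the stated per-server minimum (one channel license, one voice, one save-to-file license) by the number of servers running TTS. It covers only that minimum; it does not try to size extra channels for concurrent calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CepstralMinimum:
    """Minimum Cepstral licensing per TTS server (hypothetical model)."""
    channel_licenses: int = 1
    voices: int = 1
    save_to_file: int = 1


def licenses_needed(tts_servers: int,
                    per_box: CepstralMinimum = CepstralMinimum()) -> dict[str, int]:
    """Scale the per-server minimum by the number of servers running TTS.
    Licenses are per box, so servers in a cluster do not share one set."""
    if tts_servers < 0:
        raise ValueError("server count cannot be negative")
    return {
        "channel_licenses": per_box.channel_licenses * tts_servers,
        "voices": per_box.voices * tts_servers,
        "save_to_file": per_box.save_to_file * tts_servers,
    }


print(licenses_needed(3))
```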

How a TTS prompt gets rendered

When a call flow reaches a TTS entry, VICIdial sends the text to Cepstral, Cepstral creates an audio file, and that file is played to the caller. Because the result is an ordinary audio file, TTS can sit anywhere audio plays: an IVR (interactive voice response), a Call Menu, or any dialplan step that points at the entry.

sequenceDiagram
  participant C as Call flow
  participant V as VICIdial
  participant E as Cepstral
  participant K as Caller
  C->>V: Reach TTS entry
  V->>E: Send TTS text
  E->>V: Return audio file
  V->>K: Play audio to caller
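The sequence above can be sketched as a small function that takes the synthesis and playback steps as parameters. This is a minimal Python illustration, not VICIdial's code: render_tts, fake_synthesize, and the byte strings standing in for audio files are hypothetical stand-ins so the flow runs without Cepstral or a live call.

```python
from typing import Callable


def render_tts(text: str,
               synthesize: Callable[[str], bytes],
               play: Callable[[bytes], None]) -> bytes:
    """Mirror the sequence: send the text to the engine, get an
    audio file back, and play it to the caller."""
    audio = synthesize(text)  # VICIdial -> Cepstral: send TTS text; audio comes back
    play(audio)               # VICIdial -> caller: play the returned audio
    return audio


# Hypothetical stand-in for the engine: wraps the text in fake "audio" bytes.
def fake_synthesize(text: str) -> bytes:
    return f"<audio:{text}>".encode()


played: list[bytes] = []
render_tts("Thanks for calling.", fake_synthesize, played.append)
print(played)
```

Passing the engine and the playback step in as functions keeps the flow itself independent of where the audio comes from, which is the same reason TTS can sit anywhere audio plays.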

When to use it

Reach for TTS when the wording changes often, when you need to read a lead's details back, or when recording a human voice for every variant is not worth it. For fixed messages a recorded prompt usually sounds better, so many teams mix the two. A holiday-hours notice or a name read back to the caller is a natural fit for TTS, while your main brand greeting is often worth recording properly once.
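Conceptually, reading a lead's details back means filling lead fields into the prompt text before it is synthesized. VICIdial has its own variable syntax, which this sketch does not reproduce; the {field} placeholders, fill_prompt, and the sample lead are hypothetical and only show the idea.

```python
import re

# Hypothetical {field} placeholder syntax, used only for illustration.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_prompt(template: str, lead: dict[str, str]) -> str:
    """Replace each {field} with the lead's value so the text read aloud
    matches the current caller. Unknown fields are left untouched."""
    def swap(match: re.Match) -> str:
        return lead.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(swap, template)


lead = {"first_name": "Dana", "city": "Austin"}
print(fill_prompt("Hello {first_name}, thanks for calling.", lead))
```

Leaving unknown placeholders visible, rather than blanking them, makes a missing field easy to spot when testing a prompt.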

It is also worth knowing what TTS is not. It is not a replacement for the audio store, and it does not change how calls are routed; it simply produces an audio file that a step in your call flow plays. Treat it as one more source of audio sitting next to your recorded prompts rather than a separate subsystem.

If you are choosing between recorded audio and synthesized speech, the audio store overview covers how recorded prompts are stored.

Once Cepstral is installed and the TTS system setting is enabled, you can create your first entry; that step is covered in adding a TTS entry. For the wider picture of prompts, Music on hold, and synthesized audio, see the audio and TTS guide. If you would rather not manage Cepstral licensing yourself, VICIfast ships a ready dialer in under 40 seconds; see the plans.

About VICIfast LLC

VICIfast LLC operates a managed VICIdial hosting + BYOI service for outbound and inbound call centers. We run the dialers, the carriers, the recordings pipeline, and the compliance plumbing so operators don’t have to.


Citing this article

VICIfast Engineering. “What Text-to-Speech is in VICIdial”. VICIfast LLC, June 27, 2026. Retrieved from https://vicifast.com/blog/what-is-vicidial-text-to-speech