What Cepstral is and why VICIdial TTS needs it
Cepstral is the text-to-speech engine that turns VICIdial prompt text into spoken audio, with SSML control over pronunciation, pitch, and volume.
When VICIdial speaks a prompt from text instead of a recording, something has to do the actual speaking. That something is Cepstral, the text-to-speech engine that integrates with VICIdial. If you have ever wondered why your dialer needs a separate piece of software just to read a name aloud, this is the answer.
What Cepstral does
Cepstral provides the text-to-speech (TTS) layer that VICIdial calls on. Without it there is no engine to render your written prompts into audio. With it, TTS features open up across the platform. Once installed, it enables the following:
- Any campaign-related audio prompt can use a TTS script instead of a recorded file
- Speech scripts can pull from your default lead tables for personalized prompts
- SSML markup controls pronunciation, volume, pitch, and rate
Why SSML matters
Cepstral takes direction through SSML, the Speech Synthesis Markup Language. Plain text gives the engine no guidance, so it makes its own choices about how to say a number or a code. Markup lets you override those choices, which is how you get an account number read digit by digit instead of as one large figure. For a campaign that reads back reference numbers, this control is the difference between a clear prompt and a confusing one.
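The digit-by-digit case can be sketched in Python as follows. This is a hedged example: `say-as` with an `interpret-as` attribute is part of the SSML specification, but the spec does not fix the set of accepted values, so treating `"digits"` as supported by your Cepstral voice is an assumption to verify. The function names `digits_ssml` and `spaced_digits` are hypothetical. The second helper shows a markup-free fallback that simply spaces the digits out.

```python
from xml.sax.saxutils import escape

ASCII_DIGITS = "0123456789"


def digits_ssml(prefix: str, number: str) -> str:
    """Build SSML that asks the engine to read a number one digit at a time.

    Assumes the engine honors interpret-as="digits"; verify on your install.
    """
    digits = "".join(ch for ch in number if ch in ASCII_DIGITS)
    if not digits:
        raise ValueError("number contains no digits")
    return (
        f"<speak>{escape(prefix)} "
        f'<say-as interpret-as="digits">{digits}</say-as>'
        "</speak>"
    )


def spaced_digits(number: str) -> str:
    """Fallback without markup: separate the digits with spaces."""
    return " ".join(ch for ch in number if ch in ASCII_DIGITS)


print(digits_ssml("Your account number is", "4021-88"))
print(spaced_digits("4021-88"))
```

Stripping separators such as hyphens before building the markup keeps the engine from trying to pronounce them.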
It lives on the dialer
Cepstral is installed on each dialer that will use the service, not on a single central server that the rest of the cluster shares. The exact behavior you get depends on how your system administrator set up the integration, but the rendering itself happens locally on the box placing the calls.
That local install is why TTS feels instant in a flow: the engine renders the text and hands it straight to the call path on the same machine. From there it plays like any other prompt.
How a TTS prompt actually plays
The chain is short and worth picturing. A campaign prompt references a TTS entry, the entry text plus SSML goes to Cepstral, Cepstral returns spoken audio, and the Asterisk Dialplan plays it to the caller.
sequenceDiagram
participant V as VICIdial Campaign
participant T as TTS Entry
participant C as Cepstral Engine
participant A as Asterisk Dialplan
participant P as Caller
V->>T: Reference entry
T->>C: Send text and SSML
C->>A: Return spoken audio
A->>P: Play prompt
Where to go from here
If you are planning a deployment, the next thing to understand is the licensing model, because Cepstral needs three separate licenses to integrate. That is covered in "The three Cepstral licenses explained". To learn the markup that drives pronunciation, see "Controlling TTS pronunciation with SSML"; the full audio prompts and TTS guide sets the wider context. Rendered TTS audio lands in your audio store, described in the audio store overview.
TTS pays off most in an IVR (interactive voice response) or survey flow that speaks data back to the lead. If you would rather skip the install and licensing work, a managed dialer can come with this configured. See our plans and pricing.
About VICIfast LLC
VICIfast LLC operates a managed VICIdial hosting + BYOI service for outbound and inbound call centers. We run the dialers, the carriers, the recordings pipeline, and the compliance plumbing so operators don’t have to.
Citing this article
VICIfast Engineering. “What Cepstral is and why VICIdial TTS needs it”. VICIfast LLC, June 27, 2026. Retrieved from https://vicifast.com/blog/what-is-cepstral-vicidial