What Cepstral Is and Why VICIdial TTS Needs It

Cepstral is the text-to-speech engine that turns VICIdial prompt text into spoken audio, with SSML control over pronunciation, pitch, and volume.

VICIfast Support·June 27, 2026·2 min read

When VICIdial speaks a prompt from text instead of a recording, something has to do the actual speaking. That something is Cepstral, the text-to-speech engine that integrates with VICIdial. If you have ever wondered why your dialer needs a separate piece of software just to read a name aloud, this is the answer.

What Cepstral does

Cepstral provides the text-to-speech (TTS) layer that VICIdial calls on. Without it, there is no engine to render your written prompts into audio. Once it is installed, TTS features become available across the platform:

Any campaign-related audio prompt can use a TTS script instead of a recorded file
Speech scripts can pull fields from your default lead tables to personalize prompts
SSML markup controls pronunciation, volume, pitch, and rate
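
To make the personalization idea concrete, here is a minimal sketch of filling lead fields into prompt text before it is spoken. It is an illustration only, not VICIdial's own code. The `--A--field--B--` placeholder style is borrowed from VICIdial agent scripts, and whether your TTS entries use exactly that syntax is an assumption to confirm on your system. The function name `fill_prompt` and the sample lead record are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# Assumed placeholder style: --A--field_name--B-- (verify against your install).
PLACEHOLDER = re.compile(r"--A--(\w+)--B--")


def fill_prompt(template: str, lead: dict) -> str:
    """Replace each placeholder with the matching lead field value."""

    def substitute(match: re.Match) -> str:
        field = match.group(1)
        # Missing fields become empty text instead of being read out literally.
        # escape() keeps characters such as & and < safe if the result is
        # later embedded in SSML markup.
        return escape(str(lead.get(field, "")))

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical lead record for illustration.
lead = {"first_name": "Dana", "city": "Tampa"}
print(fill_prompt("Hello --A--first_name--B--, we are calling about your order in --A--city--B--.", lead))
# prints: Hello Dana, we are calling about your order in Tampa.
```

Escaping the substituted values is a defensive choice: lead data is free text, and an unescaped ampersand or angle bracket could break markup that wraps the prompt.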

Why SSML matters

Cepstral takes direction through SSML, the Speech Synthesis Markup Language. Plain text gives the engine no guidance, so it makes its own choices about how to say a number or a code. Markup lets you override those choices, which is how you get an account number read digit by digit instead of as one large figure. For a campaign that reads back reference numbers, this control is the difference between a clear prompt and a confusing one.
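
As a rough illustration of that override, the sketch below builds a small SSML fragment that slows the voice down and asks for a reference number to be read one character at a time. The `speak`, `prosody`, and `say-as` elements come from the W3C SSML specification. Which `interpret-as` values a given engine honors varies, so treat `"characters"` as an assumption to check against the Cepstral documentation. The function name `build_prompt` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_prompt(intro: str, reference: str, rate: str = "slow") -> str:
    """Return SSML that speaks an intro, then spells out a reference number."""
    speak = ET.Element("speak")
    speak.text = intro + " "

    # prosody adjusts delivery; rate, pitch, and volume are standard attributes.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})

    # say-as hints how the enclosed text should be read. The "characters"
    # value is an assumption here; engines differ in what they support.
    say_as = ET.SubElement(prosody, "say-as", {"interpret-as": "characters"})
    say_as.text = reference

    # ElementTree escapes special characters in the text for us.
    return ET.tostring(speak, encoding="unicode")


print(build_prompt("Your reference number is", "48213"))
```

Building the markup with an XML library instead of string concatenation means the output stays well-formed even when the prompt text contains characters that need escaping.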

It lives on the dialer

Cepstral is installed on each dialer that will use the service, not on a single central server that the rest of the cluster shares. The exact behavior you get depends on how your system administrator set up the integration, but the rendering itself happens locally on the box placing the calls.

That local install is why TTS feels instant in a flow: the engine renders the text and hands it straight to the call path on the same machine. From there it plays like any other prompt.

How a TTS prompt actually plays

The chain is short and worth picturing. A campaign prompt references a TTS entry, the entry's text and SSML go to Cepstral, Cepstral returns spoken audio, and the Asterisk dialplan plays it to the caller.

sequenceDiagram
  participant V as VICIdial Campaign
  participant T as TTS Entry
  participant C as Cepstral Engine
  participant A as Asterisk Dialplan
  participant P as Caller
  V->>T: Reference entry
  T->>C: Send text and SSML
  C->>A: Return spoken audio
  A->>P: Play prompt

Cepstral is third-party software with its own licensing. You buy it from Cepstral, not as part of VICIdial itself, and the licenses you need depend on your channel count.

Where to go from here

If you are planning a deployment, the next thing to understand is the licensing model, because Cepstral needs three separate licenses to integrate. That is covered in "The three Cepstral licenses explained". To learn the markup that drives pronunciation, see "Controlling TTS pronunciation with SSML"; the full audio prompts and TTS guide sets the wider context. Rendered TTS audio lands in your audio store, described in the audio store overview.

TTS pays off most in an IVR (interactive voice response) or survey flow that speaks data back to the lead. If you would rather skip the install and licensing work, a managed dialer can come with this configured. See our plans and pricing.

About VICIfast LLC

VICIfast LLC operates a managed VICIdial hosting + BYOI service for outbound and inbound call centers. We run the dialers, the carriers, the recordings pipeline, and the compliance plumbing so operators don’t have to.


Citing this article

VICIfast Engineering. “What Cepstral is and why VICIdial TTS needs it”. VICIfast LLC, June 27, 2026. Retrieved from https://vicifast.com/blog/what-is-cepstral-vicidial