Pick a Voice for VICIdial Text-to-Speech

How to pick a voice for Text-to-Speech

The TTS Voice setting decides which Cepstral voice reads your text aloud. Here is what it controls and why the default is Allison-8kHz.

VICIfast Support·June 27, 2026·3 min read

When you create a TTS (text-to-speech) entry in VICIdial, the TTS Voice setting decides which voice actually reads your text aloud. It is a single field on the modify screen, but what you can put in it depends on what Cepstral has installed on the server, so it is worth understanding before you assume a voice is available. This post covers what the setting controls and how to choose.

What the TTS Voice setting does

The TTS Voice field defines the voice used when the entry is generated. When the call flow reaches the entry, the text is sent to Cepstral, the chosen voice synthesizes it, and the resulting audio file plays to the caller. The default is Allison-8kHz, which is the voice you get if you never change the field.
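To make the fallback concrete, here is a minimal sketch of how an entry's voice resolves to the default. `TTSEntry`, `effective_voice`, and `swift_command` are hypothetical names, not VICIdial code, and the sketch assumes rendering shells out to Cepstral's `swift` command-line tool with `-n` (voice) and `-o` (output file). VICIdial's own scripts may call Cepstral differently.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_VOICE = "Allison-8kHz"  # the TTS Voice default described above


@dataclass
class TTSEntry:
    """Hypothetical stand-in for a VICIdial TTS entry."""
    tts_id: str
    text: str
    voice: Optional[str] = None  # None (or blank) means the field was never changed


def effective_voice(entry: TTSEntry) -> str:
    """Return the voice Cepstral should use, falling back to the default."""
    return entry.voice or DEFAULT_VOICE


def swift_command(entry: TTSEntry, out_path: str) -> list:
    """Build (but do not run) an assumed Cepstral `swift` invocation."""
    return ["swift", "-n", effective_voice(entry), "-o", out_path, entry.text]


entry = TTSEntry(tts_id="welcome", text="Thanks for calling.")
print(swift_command(entry, "/tmp/welcome.wav"))
```

Building the command as a list rather than a shell string keeps the prompt text from being interpreted by a shell if you later pass it to `subprocess.run`.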

Voices come from Cepstral

You cannot pick a voice that is not installed. Cepstral has to be installed and configured on the server, and each server that runs TTS needs at least one channel license, one voice license, and the save-to-file license. The voice you reference in the setting must be one of the voices licensed and installed on that box.

Voices are per server. If you run TTS across more than one box, each box needs its own voice license, and the voice name you set must exactly match what is installed there; otherwise that box cannot generate the entry's audio correctly.
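Because the check is per box, it helps to compare the configured voice name against each server before relying on it. The sketch below assumes you have already collected a hypothetical inventory of installed voices per server (the server names and `ExampleVoice-8kHz` are made up) and uses an exact, case-sensitive match, mirroring the "must match" rule above.

```python
# Hypothetical inventory of licensed and installed voices per TTS server.
# Server names and voice names other than Allison-8kHz are examples only.
INSTALLED_VOICES = {
    "dialer1": {"Allison-8kHz"},
    "dialer2": {"Allison-8kHz", "ExampleVoice-8kHz"},
    "dialer3": set(),  # Cepstral present but no voice installed yet
}


def servers_missing_voice(voice: str, inventory: dict) -> list:
    """Return, in sorted order, the servers where `voice` is not installed."""
    return sorted(server for server, voices in inventory.items() if voice not in voices)


print(servers_missing_voice("Allison-8kHz", INSTALLED_VOICES))
print(servers_missing_voice("ExampleVoice-8kHz", INSTALLED_VOICES))
```

Any server this returns is one where the entry would not render with the voice you set.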

How the voice setting resolves at call time

The voice is not baked in until the audio is generated. The entry stores the voice name, and Cepstral applies it when it renders the text. That means changing the TTS Voice on an entry changes how every future render of that entry sounds.

```mermaid
sequenceDiagram
  participant V as VICIdial
  participant S as TTS Voice setting
  participant E as Cepstral
  participant K as Caller
  V->>S: Read voice name
  S->>E: Use this voice
  E->>E: Synthesize with voice
  E->>V: Return audio file
  V->>K: Play to caller
```
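The same sequence can be sketched in code to show why editing the field affects the next render. `synthesize` is a hypothetical placeholder for Cepstral that just tags fake audio with the voice name, and this sketch deliberately ignores any caching a real deployment might do.

```python
from dataclasses import dataclass


@dataclass
class TTSEntry:
    """Hypothetical TTS entry: only the voice *name* is stored, never audio."""
    text: str
    voice: str = "Allison-8kHz"


def synthesize(text: str, voice: str) -> bytes:
    """Placeholder for Cepstral: returns fake 'audio' labelled with the voice."""
    return f"[{voice}] {text}".encode()


def play_entry(entry: TTSEntry) -> bytes:
    # The voice is read from the entry each time the call flow reaches it,
    # so the setting is resolved at render time, not when the entry is saved.
    return synthesize(entry.text, entry.voice)


greeting = TTSEntry("Please hold.")
first = play_entry(greeting)
greeting.voice = "ExampleVoice-8kHz"  # hypothetical voice; simulates editing the field
second = play_entry(greeting)
print(first)
print(second)
```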

Choosing a voice

Keep it simple. Use the same voice across related prompts so a caller does not hear the voice change mid-call. Match the sample rate to your audio chain; the default Allison-8kHz pairs with the 8kHz audio common in telephony. The voice setting works alongside any Recording format (WAV/MP3) choices and the way an IVR (interactive voice response) sequences prompts, so consistency across the call flow matters more than the specific voice.
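Both rules are easy to check mechanically across a call flow. This sketch assumes voice names carry their sample rate as a `-<N>kHz` suffix, as `Allison-8kHz` does; that naming pattern is an assumption, and the prompt names and `ExampleVoice-16kHz` are hypothetical.

```python
import re
from typing import Optional

TELEPHONY_RATE_KHZ = 8  # narrowband telephony audio


def sample_rate_khz(voice: str) -> Optional[int]:
    """Parse the rate from names like 'Allison-8kHz'; None if there is no suffix."""
    match = re.search(r"-(\d+)kHz$", voice)
    return int(match.group(1)) if match else None


def check_prompts(voices_by_prompt: dict) -> list:
    """Return readable problems for a group of related prompts."""
    problems = []
    distinct = sorted(set(voices_by_prompt.values()))
    if len(distinct) > 1:
        problems.append(f"mixed voices: {', '.join(distinct)}")
    for prompt, voice in sorted(voices_by_prompt.items()):
        if sample_rate_khz(voice) != TELEPHONY_RATE_KHZ:
            problems.append(f"{prompt}: {voice} is not an {TELEPHONY_RATE_KHZ}kHz voice")
    return problems


ivr = {"greeting": "Allison-8kHz", "menu": "Allison-8kHz", "goodbye": "ExampleVoice-16kHz"}
for problem in check_prompts(ivr):
    print(problem)
```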

Test the voice with real text before you rely on it. The same voice can sound clean reading a short confirmation line and rough reading a long paragraph or a string of digits, so generate the entry, listen, and adjust. SSML pauses help break up dense text and give the voice a more natural rhythm, which matters more when the entry reads back lead details than when it plays a fixed greeting.
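As an example of breaking up dense text, the sketch below builds a lead-detail readback that speaks a phone number digit by digit with pauses. It assumes your Cepstral install honors the standard SSML `<break time="..."/>` element; the helper names and pause lengths are hypothetical starting points to tune by ear.

```python
from xml.sax.saxutils import escape


def digits_with_pauses(digits: str, pause_ms: int = 300) -> str:
    """Keep only 0-9 and put an SSML <break/> between each digit."""
    spoken = [d for d in digits if d in "0123456789"]
    return f'<break time="{pause_ms}ms"/>'.join(spoken)


def readback(name: str, phone: str) -> str:
    """Build a readback; `name` is XML-escaped so '&' or '<' cannot break the SSML."""
    return (
        f"We have your name as {escape(name)}."
        '<break time="500ms"/>'
        f"Your number is {digits_with_pauses(phone)}."
    )


print(readback("O'Brien & Co", "555-0100"))
```

Escaping matters here because lead data is free text, and a stray `&` would otherwise make the markup invalid.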

Remember that the voice is only a setting on the entry; the heavy lifting is Cepstral's. If a voice you expect does not appear or the audio sounds wrong, the problem is almost always on the server side rather than in the entry, so confirm the voice is installed and licensed on that box before you keep editing the field.
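A quick server-side check can save time before you keep editing the entry. This is a sketch under several assumptions: that Cepstral lives at `/opt/swift/bin/swift` (adjust for your box), that your build supports a `--voices` listing option, and that the listing contains installed voice names as whole words. The full path also avoids picking up an unrelated `swift` binary such as Apple's compiler.

```python
import os
import re
import subprocess

# Assumed Cepstral install location; change this to match your server.
SWIFT_BIN = "/opt/swift/bin/swift"


def installed_voice_listing(swift_bin: str = SWIFT_BIN) -> str:
    """Run the assumed `swift --voices` listing; return '' if the binary is missing."""
    if not os.access(swift_bin, os.X_OK):
        return ""
    result = subprocess.run(
        [swift_bin, "--voices"], capture_output=True, text=True, check=False
    )
    return result.stdout


def voice_listed(listing: str, voice: str) -> bool:
    """True if `voice` appears as a whole name, so 'Allison' won't match 'Allison-8kHz'."""
    pattern = r"(?<![\w-])" + re.escape(voice) + r"(?![\w-])"
    return re.search(pattern, listing) is not None


if __name__ == "__main__":
    wanted = "Allison-8kHz"
    listing = installed_voice_listing()
    if not listing:
        print("Cepstral swift not found at", SWIFT_BIN)
    elif voice_listed(listing, wanted):
        print(wanted, "is installed on this box")
    else:
        print(wanted, "is NOT installed; fix the server before editing the entry")
```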

If you still need to create the entry the voice belongs to, start with adding a TTS entry, and for where synthesized speech sits among recorded prompts see the audio and TTS guide. Recorded prompts you upload yourself live in the audio store. If you would rather skip Cepstral licensing across servers, VICIfast hands you a wired dialer in under 40 seconds; see the plans.

About VICIfast LLC

VICIfast LLC operates a managed VICIdial hosting + BYOI service for outbound and inbound call centers. We run the dialers, the carriers, the recordings pipeline, and the compliance plumbing so operators don’t have to.


Citing this article

VICIfast Engineering. “How to pick a voice for Text-to-Speech”. VICIfast LLC, June 27, 2026. Retrieved from https://vicifast.com/blog/vicidial-tts-voice-setting