How to pick a voice for Text-to-Speech
The TTS Voice setting decides which Cepstral voice reads your text aloud. Here is what it controls and why the default is Allison-8kHz.
When you create a TTS (text to speech) entry in VICIdial, the TTS Voice setting decides which voice actually reads your text aloud. It is one field on the modify screen, but it depends on what Cepstral has installed on the server, so it is worth understanding before you assume a voice is available. This post covers what the setting controls and how to choose.
What the TTS Voice setting does
The TTS Voice field defines the voice used when the entry is generated. When the call flow reaches the entry, the text is sent to Cepstral, the chosen voice synthesizes it, and the resulting audio file plays to the caller. The default is Allison-8kHz, which is the voice you get if you never change the field.
Voices come from Cepstral
You cannot pick a voice that is not installed. Cepstral has to be installed and configured on the server, and each server that runs TTS needs at least one channel license, one voice, and the save-to-file license. The voice you reference in the setting must be one of the voices licensed and installed on that box.
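One way to catch a missing voice before a caller does is to compare the name on the entry against the voices actually installed on that server. The sketch below is illustrative only: `resolve_voice` is a hypothetical helper, not part of VICIdial or Cepstral, and it assumes you already have the installed voice names as a list (for example, copied from Cepstral's own voice listing on that box). It also assumes names must match exactly, including case.

```python
from typing import Iterable

# Default voice named in this post.
DEFAULT_VOICE = "Allison-8kHz"


def resolve_voice(requested: str, installed: Iterable[str]) -> str:
    """Return `requested` if it is in the installed list, else raise ValueError.

    Hypothetical helper: `installed` is whatever list of voice names you
    collected from the server; this function does not query Cepstral itself.
    """
    names = {name.strip() for name in installed if name.strip()}
    if not names:
        raise ValueError("no voices are installed on this server")
    if requested not in names:
        raise ValueError(
            f"voice {requested!r} is not installed; available: {sorted(names)}"
        )
    return requested
```

Calling `resolve_voice(DEFAULT_VOICE, ["Allison-8kHz"])` returns the name unchanged, while a voice that is not in the list raises an error you can surface before the entry is ever used on a call.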
How the voice setting resolves at call time
The voice is not baked in until the audio is generated. The entry stores the voice name, and Cepstral applies it when it renders the text. That means changing the TTS Voice on an entry changes how every future render of that entry sounds.
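To make the "stored name, applied at render time" idea concrete, here is a minimal model of it. Everything in it is an assumption for illustration: `TTSEntry`, `render`, and `fake_synthesize` are hypothetical names, and the fake synthesizer only tags the text with the voice instead of producing audio. It is not how VICIdial or Cepstral is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TTSEntry:
    """Hypothetical stand-in for a TTS entry: it stores text and a voice name."""
    text: str
    voice: str = "Allison-8kHz"  # default named in this post


def render(entry: TTSEntry, synthesize: Callable[[str, str], bytes]) -> bytes:
    # The voice is looked up here, at render time, not when the entry was created.
    return synthesize(entry.voice, entry.text)


def fake_synthesize(voice: str, text: str) -> bytes:
    # Placeholder for the real engine: returns tagged bytes instead of audio.
    return f"[{voice}] {text}".encode("utf-8")


entry = TTSEntry("Thank you for calling.")
first = render(entry, fake_synthesize)

entry.voice = "David-8kHz"  # illustrative voice name; must be installed in practice
second = render(entry, fake_synthesize)
```

The same entry produces different output before and after the voice change, which is why editing the field affects every future render rather than only new entries.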
sequenceDiagram
participant V as VICIdial
participant S as TTS Voice setting
participant E as Cepstral
participant K as Caller
V->>S: Read voice name
S->>E: Use this voice
E->>E: Synthesize with voice
E->>V: Return audio file
V->>K: Play to caller
Choosing a voice
Keep it simple. Use the same voice across related prompts so a caller does not hear the voice change mid-call. Match the voice's sample rate to your audio chain; the default Allison-8kHz pairs with the 8kHz audio common in telephony. The voice setting works alongside any Recording format (WAV/MP3) choices and the way an IVR (interactive voice response) sequences prompts, so consistency across the call flow matters more than which specific voice you pick.
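Both checks in this paragraph, one voice per flow and a sample rate that matches the audio chain, are easy to automate. The sketch below assumes the sample rate can be read from a `-<N>kHz` suffix on the voice name, as in Allison-8kHz; that naming pattern is an assumption, and voices without the suffix are simply skipped. The function names and the `David-16kHz` name in the usage note are hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumed naming pattern: a trailing "-<N>kHz", as in "Allison-8kHz".
_RATE_SUFFIX = re.compile(r"-(\d+)kHz$", re.IGNORECASE)


def voice_sample_rate(voice: str) -> Optional[int]:
    """Return the rate in Hz implied by the voice name, or None if there is no suffix."""
    match = _RATE_SUFFIX.search(voice)
    return int(match.group(1)) * 1000 if match else None


def check_prompt_voices(voices: Iterable[str], chain_rate_hz: int = 8000) -> List[str]:
    """Return a list of human-readable problems for the prompts in one call flow."""
    unique = sorted(set(voices))
    problems: List[str] = []
    if len(unique) > 1:
        problems.append("mixed voices in one flow: " + ", ".join(unique))
    for voice in unique:
        rate = voice_sample_rate(voice)
        # Voices with no recognizable suffix are not flagged; their rate is unknown.
        if rate is not None and rate != chain_rate_hz:
            problems.append(f"{voice} is {rate} Hz but the chain is {chain_rate_hz} Hz")
    return problems
```

A flow that uses only Allison-8kHz on an 8kHz chain returns an empty list. A flow that mixes Allison-8kHz with a hypothetical David-16kHz returns two problems: one for the mixed voices and one for the rate mismatch.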
Test the voice with real text before you rely on it. The same voice can sound clean reading a short confirmation line and rough reading a long paragraph or a string of digits, so generate the entry, listen, and adjust. SSML pauses help break up dense text and give the voice a more natural rhythm, which matters more when the entry reads back lead details than when it plays a fixed greeting.
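As a rough illustration of the SSML pauses mentioned above, the helpers below insert the standard SSML `<break>` element between digits or between sentences. This is a sketch under assumptions: the function names are hypothetical, the pause lengths are arbitrary starting points, and whether your entry needs the `<speak>` wrapper depends on how the text reaches the engine, so listen to the result before relying on it.

```python
from typing import Iterable
from xml.sax.saxutils import escape


def digits_with_pauses(digits: str, pause_ms: int = 250) -> str:
    """Read a digit string one digit at a time, with a pause between digits."""
    spoken = [ch for ch in digits if ch.isdigit()]  # drop dashes, spaces, etc.
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(spoken) + "</speak>"


def sentences_with_pauses(sentences: Iterable[str], pause_ms: int = 400) -> str:
    """Join sentences with a pause between them, escaping XML special characters."""
    pause = f'<break time="{pause_ms}ms"/>'
    cleaned = [escape(s.strip()) for s in sentences if s.strip()]
    return "<speak>" + pause.join(cleaned) + "</speak>"
```

For example, `digits_with_pauses("555-01", 200)` yields five digits separated by 200 ms breaks, which usually reads more clearly than letting the voice run the digits together.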
Remember that the voice is only a setting on the entry; the heavy lifting is Cepstral's. If a voice you expect does not appear or the audio sounds wrong, the problem is almost always on the server side rather than in the entry, so confirm the voice is installed and licensed on that box before you keep editing the field.
If you still need to create the entry the voice belongs to, start with adding a TTS entry. For where synthesized speech sits among recorded prompts, see the audio and TTS guide. Recorded prompts you upload yourself live in the audio store. To skip Cepstral licensing across servers, VICIfast hands you a wired dialer in under 40 seconds; see the plans.
About VICIfast LLC
VICIfast LLC operates a managed VICIdial hosting + BYOI service for outbound and inbound call centers. We run the dialers, the carriers, the recordings pipeline, and the compliance plumbing so operators don’t have to.
Citing this article
VICIfast Engineering. “How to pick a voice for Text-to-Speech”. VICIfast LLC, June 27, 2026. Retrieved from https://vicifast.com/blog/vicidial-tts-voice-setting