The VICIdial troubleshooting playbook

A symptom-first guide that routes you from what's broken — dead dials, false answers, slow agent screens, drops on hold — to the exact VICIdial report that holds the answer.

Something is wrong on your dialer. Calls aren't going out, or agents keep getting kicked, or every other lead answers to dead air. The instinct is to open a log file over SSH and start scrolling. Don't. VICIdial already collected the evidence for you and sorted it into purpose-built reports — the trick is knowing which report answers which symptom. This playbook is a router: you bring the symptom, and it points you at the one screen that holds the answer. Bookmark it, and the next outage becomes a five-minute lookup instead of an afternoon of guesswork.

We'll work symptom-first, because that's how problems actually arrive. Nobody opens a ticket saying "the Dial Log Report shows clustered 503s." They say "my calls stopped." Each section below starts with a sentence a real operator would say, then walks you to the report that proves what happened. Every linked deep-dive explains how to read that one report in detail — this page is the map, those are the streets.

First, confirm the box is healthy

Before you blame a campaign or a carrier, rule out the machine itself. The Server Versions screen lists every server with its load average, free channels, disk space, system time, software version, and whether it's currently marked active. Two columns there catch more outages than anything else: disk space and system time. A full disk silently stops recordings and can wedge the database; a clock that has drifted out of sync makes calls hang up early and breaks anything time-stamped. If a server shows red here, fix that before you read another report — the rest of your diagnosis will be noise.
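As a rough illustration of that triage order, here is a minimal Python sketch. The field names (`disk_used_pct`, `clock_offset_s`), the thresholds, and the sample values are all hypothetical: they are not a VICIdial export format or VICIdial's own limits. The sketch only shows the idea of checking disk and clock before anything else.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; tune them to your environment.
MAX_DISK_USED_PCT = 90
MAX_CLOCK_OFFSET_S = 2.0


@dataclass
class ServerRow:
    # Hypothetical fields, hand-copied from the Server Versions screen.
    name: str
    disk_used_pct: int
    clock_offset_s: float  # seconds of difference from a trusted time source
    active: bool


def health_problems(row: ServerRow) -> list[str]:
    """Return the reasons a server should be fixed before reading other reports."""
    problems = []
    if not row.active:
        problems.append("not marked active")
    if row.disk_used_pct >= MAX_DISK_USED_PCT:
        problems.append("disk nearly full")
    if abs(row.clock_offset_s) > MAX_CLOCK_OFFSET_S:
        problems.append("clock out of sync")
    return problems


servers = [
    ServerRow("dialer-1", disk_used_pct=42, clock_offset_s=0.1, active=True),
    ServerRow("dialer-2", disk_used_pct=97, clock_offset_s=-35.0, active=True),
]
for row in servers:
    print(row.name, health_problems(row) or "ok")
```

An empty list means the box passes this basic check and you can move on to the campaign and carrier reports.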

The Internal Process Logs screen sits right next to it and shows the back-end Perl processes — the ones that fill the hopper, dial, and reset lists — including how many times each launched, when, and how long it ran over the last seven days. If a critical process keeps restarting or never launched today, that's your root cause and you've found it in under a minute. Together these two screens answer the most basic question: is this a VICIdial problem, or an infrastructure problem?

On VICIfast every box is single-tenant on dedicated hardware, so a load spike is always your traffic, never a noisy neighbour. That makes the Server Versions load column trustworthy as a first signal — if it's pinned, your own campaign pacing or report load did it.

The decision flow

Here's the whole playbook on one screen. Pick the branch that matches what your users are complaining about, then jump to that section below.

flowchart TD
  A[Something is broken] --> B{What's the symptom?}
  B -->|Calls not dialing| C[Dial Log + Carrier Log]
  B -->|Dead air / false answers| D[SIP Event Report]
  B -->|Wrong caller IDs burned| E[Caller ID Log Report]
  B -->|Carrier dropping calls| F[Hangup Cause Report]
  B -->|Machines marked live| G[AMD Log Report]
  B -->|Agent screen slow or frozen| H[Latency + LAGGED Reports]
  B -->|Drops while on hold| I[Agent Parked Call Report]
  B -->|Integration broken| J[URL Log + API Log]
  C --> K[Asterisk Debug: SIP peers + registry]
  F --> K
  D --> K

"My calls aren't dialing"

Start at the Dial Log Report. It lists the calls placed by your servers, grouped by their SIP response code, which is the numeric reply your carrier sent for each dial attempt. For every call it shows the lead ID, server IP, call date, extension, channel, context, timeout, outbound caller ID, the hangup cause code, the Asterisk uniqueid, and the SIP hangup reason. If your dials are failing, the response-code distribution tells you why at a glance: a wall of 503s means the carrier is refusing, a flood of timeouts means nothing is answering on the far end. Learn to read it in our walkthrough of the Dial Log Report.
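If you copy rows out of the report, the "one glance" reading can be sketched as a simple tally. This is only an illustration: the `(lead_id, sip_response_code)` row shape, the sample data, and the 50% threshold are assumptions, not the report's export format.

```python
from collections import Counter

# Hypothetical sample: (lead_id, sip_response_code) pairs hand-copied from the report.
dial_rows = [
    (1001, 200), (1002, 503), (1003, 503), (1004, 486),
    (1005, 503), (1006, 408), (1007, 503),
]


def response_distribution(rows):
    """Count dial attempts per SIP response code, most common first."""
    return Counter(code for _lead_id, code in rows).most_common()


def dominant_code(rows, share=0.5):
    """Return the code behind at least `share` of all attempts, or None."""
    if not rows:
        return None
    code, count = response_distribution(rows)[0]
    return code if count / len(rows) >= share else None


print(response_distribution(dial_rows))
print(dominant_code(dial_rows))
```

A single dominant code is the "wall" described above; an even spread suggests the problem is not one carrier response.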

The Carrier Log Report is the companion view. It shows every dial attempt and the response code for calls leaving your system, and you can download the raw logs for a deeper look or to forward to your carrier when you open a ticket with them. If the Dial Log says "carrier refused," the Carrier Log Report is the evidence you hand over. When the failure is registration rather than per-call rejection (your SIP trunk isn't even connected), jump straight to the Asterisk Debug Page covered below.

A single SIP response code rarely tells the whole story. Cross-check the Dial Log against the Carrier Log before you escalate — a 503 in one and a clean attempt in the other usually means the problem is between your box and the carrier, not the carrier itself.

"Agents keep hitting dead air"

Dead air on a connected call is almost always false answer supervision: the carrier signalled that the call was answered when it actually wasn't, so VICIdial bridged a live agent to a ringing or dead line. The fix starts at the SIP Event Report, which lets you sort the raw SIP messages for each call by several criteria. The key move is sorting by ring time: calls flagged as answered after an impossibly short ring are textbook answer supervision failures. This report was built to hunt these down. Our guide to the SIP Event Report shows the exact sort, and how to diagnose false answer supervision walks the full case end to end.

One caveat: the SIP Event Report only has data if SIP Event Logging is enabled on the system. If yours is silent, turn that on first and let it gather a few hundred calls — then the patterns become obvious.
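The ring-time sort can be sketched offline once you have a few hundred calls. Everything here is an assumption for illustration: the row fields, the sample values, and especially the one-second cutoff, which you should set from what a plausible ring looks like on your own routes.

```python
# Hypothetical cutoff: answers faster than this are treated as suspicious.
MIN_PLAUSIBLE_RING_S = 1.0

# Hypothetical rows, one per call, hand-copied from the SIP Event Report.
calls = [
    {"uniqueid": "1700000001.10", "ring_seconds": 0.4, "answered": True},
    {"uniqueid": "1700000002.11", "ring_seconds": 12.0, "answered": True},
    {"uniqueid": "1700000003.12", "ring_seconds": 0.2, "answered": True},
    {"uniqueid": "1700000004.13", "ring_seconds": 0.3, "answered": False},
]


def suspect_false_answers(rows, min_ring=MIN_PLAUSIBLE_RING_S):
    """Return answered calls with an implausibly short ring, shortest first."""
    suspects = [r for r in rows if r["answered"] and r["ring_seconds"] < min_ring]
    return sorted(suspects, key=lambda r: r["ring_seconds"])


for call in suspect_false_answers(calls):
    print(call["uniqueid"], call["ring_seconds"])
```

If the suspects cluster on one carrier or route, that is the pattern to take to the carrier.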

"We're burning the wrong caller IDs"

When answer rates fall off a cliff and you suspect your numbers are being flagged as spam, the Caller ID Log Report shows the breakdown by caller ID of calls from chosen campaigns and statuses across a date range. Its whole purpose is to show which CID (caller ID) values a campaign has actually been using, so you can rotate or retire numbers before they get burned. Watch the Log Second Diff field — it controls how many seconds apart two records can be and still count as the same call when VICIdial matches the main log against the dial log. Read the Caller ID Log Report guide before your next number-rotation review.
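To make the Log Second Diff idea concrete, here is a minimal sketch of a time-tolerance match. It is not VICIdial's matching code: the function name, the inclusive comparison, and the sample timestamps are assumptions used only to show how a tolerance in seconds decides whether two records count as one call.

```python
from datetime import datetime


def same_call(main_log_time, dial_log_time, log_second_diff):
    """Treat two records as one call if their timestamps are close enough."""
    delta = abs((main_log_time - dial_log_time).total_seconds())
    return delta <= log_second_diff  # inclusive bound is an assumption


main_time = datetime(2024, 1, 15, 10, 0, 0)
dial_time = datetime(2024, 1, 15, 10, 0, 3)

print(same_call(main_time, dial_time, log_second_diff=5))
print(same_call(main_time, dial_time, log_second_diff=2))
```

A tolerance that is too tight leaves real calls unmatched; one that is too loose can pair records from different calls.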

"The carrier is dropping my calls"

If calls connect and then die mid-conversation, or never connect for a reason the Dial Log doesn't make obvious, the Hangup Cause Report breaks down the carrier hangup causes for all outbound calls. You can restrict the view to specific causes so the noise drops away and the real pattern stands out: a sudden spike in one cause code almost always points at a single route or a carrier-side change. Pair it with the Hangup Cause Report walkthrough and our explainer on diagnosing carrier rejects to turn a cause code into an action. A climbing drop rate with no agent-side cause is the classic signature to chase here.
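One way to spot that "sudden spike" is to compare today's cause counts against a normal day. The sketch below assumes you have two plain lists of hangup cause codes copied from the report. The factor of 3 and the minimum count of 10 are arbitrary illustrative thresholds, and the sample data is made up.

```python
from collections import Counter


def spiking_causes(baseline, today, factor=3.0, min_count=10):
    """Return cause codes whose count today is far above the baseline day."""
    base = Counter(baseline)
    now = Counter(today)
    return sorted(
        cause
        for cause, count in now.items()
        if count >= min_count and count > factor * max(base[cause], 1)
    )


# Hypothetical samples: one hangup cause code per call.
baseline_day = [16] * 80 + [17] * 15 + [34] * 2
problem_day = [16] * 75 + [17] * 14 + [34] * 40

print(spiking_causes(baseline_day, problem_day))
```

Restricting the report to the codes this returns is the programmatic version of filtering out the noise.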

"Machines are marked as live answers"

If agents complain they keep getting answering machines, or your live-answer numbers look too good to be true, audit answering machine detection (AMD) with the AMD Log Report. It shows the results of the built-in AMD function for calls placed from your dialers, so you can measure how often the detector decided "machine" versus "human" and spot it leaning the wrong way. Detection that's too aggressive hangs up on real people; detection that's too loose floods agents with voicemail. Tune it from the data, not from a hunch, and start with the AMD Log Report.

"The agent screen is slow or freezing"

This is the most common agent complaint and it has a dedicated stack of reports. Start with the Agent Latency Report, which shows the web-connection latency between an agent's browser and the server for anyone currently or recently logged in, with a chart of latency over the day. Climbing latency means the agent screen is waiting on the round trip, usually because of the agent's own network or workstation, sometimes the database server. Our Agent Latency Report guide and the broader diagnosing high agent latency piece show how to read the chart.

When an agent's session keeps cutting out entirely, look at the Latency Gaps Report. A latency gap is a missing stretch of latency log entries while an agent was supposed to be logged in — those gaps are one of the most reliable signals of a flaky session, and the report charts them across all agents for a day. See the Latency Gaps Report. For the related problem of agents being thrown out of the dialer mid-call, the Agent LAGGED Report tracks lag events — which dialer IP saw the lag and the full per-event detail of agent log ID, user, lead, campaign, status, and dial method — and the Agent LAGGED Report guide explains each column.
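The definition of a latency gap translates directly into a few lines of code. This sketch assumes you have one agent's latency log timestamps as a list. The 30-second threshold is a placeholder, not a VICIdial setting, so set it from how often entries normally appear on your system.

```python
from datetime import datetime, timedelta


def latency_gaps(timestamps, max_gap_s=30):
    """Return (last_seen, next_seen) pairs where log entries went missing."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if (later - earlier).total_seconds() > max_gap_s
    ]


# Hypothetical session: entries every 10 seconds, then two minutes of silence.
start = datetime(2024, 1, 15, 9, 0, 0)
entries = [start + timedelta(seconds=s) for s in (0, 10, 20, 30, 150, 160)]

for earlier, later in latency_gaps(entries):
    print("gap from", earlier.time(), "to", later.time())
```

Each pair marks a stretch where the agent was nominally logged in but the server heard nothing from the browser.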

To see whether lag is an isolated incident or a creeping trend, the LAGGED Summary Report charts lag events on a timeline so clusters jump out visually — read the LAGGED Summary Report. And when you need to reconstruct exactly what one agent did — every button click and back-end AJAX call in their session — the Agent Debug Log Report is the microscope. It produces hundreds of entries per minute, so use it only after the higher-level reports point you at a single agent; the Agent Debug Log Report guide shows how to read that firehose. The usual root causes it confirms are the agent's network, their workstation, or the database server.

Work the agent reports top-down: Latency for the trend, Latency Gaps and LAGGED for the failure events, and the Agent Debug Log only as a last resort on the one session that's still unexplained. Opening the Debug Log first just buries you.

"Calls drop while customers are on hold"

Losing a parked customer is expensive, and the Agent Parked Call Report is the one place that counts it. Run it for a single day to see how many calls each agent parked, the average time on hold, and, most importantly, how many calls dropped while the customer was parked. A high drop-on-hold count for one agent points at their handling; a high count across everyone points at hold music or queue configuration. Read the Agent Parked Call Report.
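The one-agent-versus-everyone reading can be sketched as below. The per-agent dictionaries, the agent names, and the 20% drop-rate threshold are hypothetical; the point is only the shape of the decision.

```python
def drop_on_hold_pattern(parked_by_agent, drops_by_agent, high_rate=0.2):
    """Classify drop-on-hold as agent-specific, system-wide, or no pattern."""
    rates = {
        agent: drops_by_agent.get(agent, 0) / parked
        for agent, parked in parked_by_agent.items()
        if parked  # skip agents who parked nothing
    }
    high = sorted(agent for agent, rate in rates.items() if rate >= high_rate)
    if not high:
        return "no pattern", high
    if len(rates) > 1 and len(high) == len(rates):
        return "system-wide", high
    return "agent-specific", high


# Hypothetical one-day figures copied from the report.
parked = {"alice": 20, "bob": 25, "carol": 18}
drops = {"alice": 1, "bob": 9, "carol": 0}

print(drop_on_hold_pattern(parked, drops))
```

An agent-specific result is a coaching conversation; a system-wide one sends you to hold music and queue settings.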

"My CRM integration stopped working"

When a dispo URL stops firing into your CRM, or a web-form button doesn't pass data, the URL Log Report records every URL the system requested — Dispo Call URLs, agent-screen web-form button clicks, the lot — for any time frame, and it can include the URL scripting fields so you can confirm exactly what value VICIdial sent. If the call into your system is malformed or never happened, this report shows it. Start with the URL Log Report.

For traffic coming the other way — a partner or middleware calling the VICIdial API (application programming interface) — the API Log Report lists the API entries for a date range and lets you filter by API user, agent user, function, and result. Filter by a failing result code and you'll see precisely which integration call is breaking. Our API Log Report guide and the broader diagnosing API integration failures piece cover the common breakages.
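The same filter can be applied to rows you have copied out of the report. The row shape, user names, and result values below are illustrative assumptions, not a documented export format.

```python
from collections import Counter

# Hypothetical rows hand-copied from the API Log Report.
api_rows = [
    {"api_user": "crm_bridge", "function": "add_lead", "result": "SUCCESS"},
    {"api_user": "crm_bridge", "function": "update_lead", "result": "ERROR"},
    {"api_user": "crm_bridge", "function": "update_lead", "result": "ERROR"},
    {"api_user": "dialer_bot", "function": "external_dial", "result": "SUCCESS"},
]


def failing_calls(rows, bad_result="ERROR"):
    """Count failing entries per (api_user, function), most frequent first."""
    failures = (r for r in rows if r["result"] == bad_result)
    return Counter((r["api_user"], r["function"]) for r in failures).most_common()


for (user, function), count in failing_calls(api_rows):
    print(user, function, count)
```

The top entry names both the integration and the function to look at first.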

The low-level fallback: Asterisk Debug

Some problems live below the application layer. When a phone won't register, a SIP peer shows offline, or a trunk simply isn't connected, the Asterisk Debug Page is where you look. It shows the live Asterisk output of the SIP Peers and Registry, the IAX Peers and Registry, and the last 1000 lines of Asterisk CLI output, all without you needing a shell. If your SIP trunk registration is the problem the Dial Log couldn't explain, this page confirms it in seconds. Learn the layout in how to use the Asterisk Debug Page.

Two more debug screens are worth knowing. The Campaign Debug Page lets you pick an auto-dial campaign and watch its back-end logging, which is the right tool when one campaign behaves while another doesn't, because it isolates pacing and predictive dialing issues to a single campaign. And the Real-Time Monitoring Log Report records which manager monitored which agent and when, so when someone asks "who barged into my call," the Real-Time Monitoring Log Report has the answer.

The human-factor reports

Not every "outage" is technical. Sometimes the real story is a conversation. The Agent-Manager Chat Log lets you search the internal chats between managers and agents and between agents themselves — useful when a productivity dip turns out to be a coaching issue or a confused handover rather than a broken dialer. It's a quick read in the Agent-Manager Chat Log guide. Keeping these reports in your rotation stops you from chasing a phantom carrier bug when the answer was sitting in a chat window.

Tracing one bad call end to end

The most powerful skill in troubleshooting isn't reading any single report; it's stitching them together for one specific call. The call-level reports above share a join key: the call's Asterisk uniqueid. Grab it from the Dial Log, then follow that same ID into the SIP Event Report, the Hangup Cause Report, and the Carrier Log to see the full lifecycle of one call from dial to teardown. We wrote a dedicated walkthrough on how to trace one bad call end to end, and it's the single most useful habit you can build.
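Following one ID across reports amounts to a lookup by key. The sketch below assumes each report has been exported to a list of dictionaries that carry a `uniqueid` field. The report names, field names, and values are hypothetical.

```python
def trace_call(uniqueid, reports):
    """Collect every row for one call from each report, keyed by report name."""
    return {
        name: [row for row in rows if row.get("uniqueid") == uniqueid]
        for name, rows in reports.items()
    }


# Hypothetical exports: report name -> rows that include the Asterisk uniqueid.
reports = {
    "dial_log": [
        {"uniqueid": "1700000001.10", "sip_response": 200},
        {"uniqueid": "1700000002.11", "sip_response": 503},
    ],
    "sip_event": [{"uniqueid": "1700000001.10", "ring_seconds": 0.4}],
    "hangup_cause": [{"uniqueid": "1700000001.10", "cause": 16}],
    "carrier_log": [{"uniqueid": "1700000002.11", "sip_response": 503}],
}

for name, rows in trace_call("1700000001.10", reports).items():
    print(name, rows)
```

An empty list for one report is itself a clue: it shows where the call's trail stops.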

And before you escalate to anyone, whether your carrier, your software vendor, or your host, run through what to check before calling support. A ticket that opens with an Asterisk uniqueid, a SIP response code, and a Carrier Log export gets resolved in minutes; a ticket that opens with "calls are broken" gets a day of back-and-forth.

Where this leaves you

The pattern is always the same: name the symptom, open the one report built for it, find the join key, and follow it. Server Versions and Internal Process Logs rule out the box. The Dial, Carrier, and Hangup Cause reports own the carrier side. The SIP Event and AMD reports own answer quality. The Latency, LAGGED, and Debug reports own the agent experience. The URL and API logs own your integrations. You don't need to memorise all of them. You need to know which door to open, and this map tells you which.

Of course, the fastest way to spend less time in these reports is to run on infrastructure that doesn't manufacture problems for you — properly synced clocks, headroom on disk and CPU, and a clean carrier path. That's exactly what we provision in under 40 seconds: a dedicated, single-tenant VICIdial box where the only variables left to debug are your campaigns and your carrier. See what's included on our pricing page.

If you're still running a hand-rolled box and spending your week inside the LAGGED report, that's a signal worth acting on. A managed, single-tenant dialer takes the infrastructure half of this playbook off your plate entirely. Compare the plans and spin one up — then the only reports you'll ever open are the ones about your own campaigns.