POST /v1/calls
curl --request POST \
  --url https://api.topcalls.ai/v1/calls \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "phone_number": "+14155551234",
  "task": "Call to confirm John's appointment tomorrow at 3 PM"
}
EOF
Example response:

{
  "call_id": "564d4fd4-03bc-400a-abe0-05540fbeff88",
  "provider_call_id": "64e9bf0e-7c2f-4443-a759-7eb1731cd583",
  "status": "queued"
}

Authorizations

Authorization
string
header
required

Use Authorization: Bearer tc_live_xxxxx

Body

application/json
phone_number
string
required

Phone number in E.164 format (e.g., +14155551234)

  • Must start with +
  • Country code must be 1-9 (not 0)
  • Total length: 1-15 digits after the +
Example:

"+14155551234"
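The E.164 rules above can be pre-checked on the client before issuing a request. A minimal sketch (the `is_e164` helper is hypothetical, not part of the API; the server performs its own validation):

```shell
# Check the documented E.164 rules: leading +, first digit 1-9,
# at most 15 digits total after the +.
is_e164() {
  printf '%s' "$1" | grep -Eq '^\+[1-9][0-9]{0,14}$'
}

is_e164 "+14155551234" && echo valid    # prints "valid"
is_e164 "0415555"      || echo invalid  # prints "invalid" (missing +)
```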

task
string
required

Simple prompt describing what the AI should do. Use this OR instructions (not both).

Minimum string length: 10
Example:

"Call to confirm John's appointment tomorrow at 3 PM"

from_number
string

Caller ID in E.164 format (optional, falls back to FROM_NUMBER env var)

  • Must be in E.164 format if provided
  • Either provide this or set FROM_NUMBER environment variable
Example:

"+18005551234"

first_sentence
string

The AI's opening line

Minimum string length: 1
Example:

"Hi, this is Sarah from TopView Dental calling about your appointment."

instructions
string

Full system instructions for the AI. Use this OR task (not both).

Minimum string length: 10
Example:

"You are Sarah, a friendly appointment coordinator..."

mode
enum<string>
default:realtime

Conversation mode

  • realtime: OpenAI Realtime API (speech-to-speech, low latency)
  • legacy: Separate STT → LLM → TTS pipeline (custom voices, voice cloning)
Available options:
realtime,
legacy
voice
string
default:alloy

Voice to use for AI responses.

Realtime mode:

  • OpenAI voices: alloy, echo, shimmer, ash, ballad, coral, sage, verse

Legacy mode (ElevenLabs):

  • Voice names: rachel, domi, bella, antoni, elli, josh, arnold, sam, adam, nicole, matilda
  • Or voice_id directly: 21m00Tcm4TlvDq8ikWAM (24-char alphanumeric)
  • Custom/cloned voices: Use the voice_id from your ElevenLabs account

Legacy mode (Deepgram):

  • Aura-2 voices: aura-2-thalia-en, aura-2-orion-en, etc.
Example:

"alloy"
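Combining these options, a legacy-mode request using a built-in ElevenLabs voice might look like this (a sketch built only from fields documented on this page; values are illustrative):

```json
{
  "phone_number": "+14155551234",
  "task": "Call to confirm John's appointment tomorrow at 3 PM",
  "mode": "legacy",
  "tts_provider": "elevenlabs",
  "voice": "rachel"
}
```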

model
string

AI model to use for the call. Default is selected based on mode.

See GET /v1/models for the complete list of available models and their capabilities.

Example:

"gemini-2.5-flash"

temperature
number
default:0.7

LLM creativity/temperature (0-1). Higher values = more creative responses.

  • Most models: Full range 0-1 supported
  • Some reasoning models only support default temperature
Required range: 0 <= x <= 1
stt_provider
enum<string>
default:deepgram

STT provider (legacy mode only)

  • deepgram: Deepgram (default, 36+ languages)
  • gladia: Gladia (100+ languages, automatic language detection, multilingual support)
  • Provider name must match telephony provider configuration
  • Use stt_language: "multi" for automatic multilingual detection (Gladia only)
  • Only used when mode=legacy
Available options:
deepgram,
gladia
Example:

"deepgram"

stt_model
string
default:nova-3

STT model (legacy mode only). See GET /v1/models for complete list of available STT models and their capabilities. Only used when mode=legacy.

Minimum string length: 1
Example:

"nova-3"

stt_language
string

STT language/dialect code (legacy mode only)

  • Examples: en-US, en-GB, en-AU, es-ES, nl-BE
  • Use multi for automatic multilingual detection (supported by Gladia only)
  • Controls speech recognition accent/dialect
  • Only used when mode=legacy
  • For restricted multi-language detection, use stt_languages array instead
Minimum string length: 2
Example:

"en-GB"

stt_languages
string[]

Array of language codes for restricted multi-language detection (Gladia only).

When multiple languages are provided:

  • Enables code_switching mode automatically
  • Restricts detection to ONLY these specified languages
  • Dramatically improves accuracy for short phrases

This is preferred over stt_language: "multi" when you know which languages your callers will speak, as it narrows the detection space from 100+ languages to just the ones you specify.

Examples:

  • ["en", "ro"] - Detect English and Romanian only
  • ["en", "es", "fr"] - Detect English, Spanish, and French
  • Use ISO 639-1 language codes (e.g., en, es, fr, de, ro)

Only used when mode=legacy and stt_provider=gladia.

Required array length: 1 - 10 elements
Minimum string length: 2
Example:
["en", "ro"]
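For restricted multi-language detection, the relevant fields combine in the request body like this (a sketch; values are illustrative):

```json
{
  "phone_number": "+14155551234",
  "task": "Confirm the delivery address with the customer",
  "mode": "legacy",
  "stt_provider": "gladia",
  "stt_languages": ["en", "ro"]
}
```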
stt_vocabulary
(string | object)[]

Custom vocabulary for STT (Gladia only). Boost recognition of domain-specific words and phrases in real time.

Formats supported:

  • Simple strings: ["Capex", "TopCalls"]
  • Objects with language: [{"value": "Capex", "language": "en"}]
  • Mixed: ["Capex", {"value": "مرحبا", "language": "ar"}]

Use cases:

  • Company/product names
  • Industry-specific terminology
  • Names that may be mispronounced
  • Technical terms

Only used when mode=legacy and stt_provider=gladia.

Required array length: 1 - 100 elements
Minimum string length: 1
Example:
[
  "Capex",
  { "value": "TopCalls", "language": "en" }
]
stt_endpoint_sensitivity
number
default:0.01

STT endpoint sensitivity (Gladia only). Controls how long to wait after silence before considering speech complete.

  • Range: 0.01 - 2.0 seconds
  • Default: 0.01 seconds (per Gladia recommendation for telephony audio)
  • Lower values (0.01-0.1): Recommended for telephony-quality audio, accented speech
  • Higher values (0.8-2.0): Better for thoughtful speakers, elderly users

Only used when mode=legacy and stt_provider=gladia.

Required range: 0.01 <= x <= 2
Example:

0.01

stt_interrupt_sensitivity
number
default:0.8

STT interrupt/speech detection sensitivity (Gladia only). Controls the speech detection threshold for distinguishing speech from noise.

  • Range: 0.0 - 1.0
  • Default: 0.8 (per Gladia recommendation for telephony audio)
  • Higher values (0.7-0.9): Recommended for telephony audio, background noise
  • Lower values (0.0-0.4): More sensitive to speech, may pick up more noise

Only used when mode=legacy and stt_provider=gladia.

Required range: 0 <= x <= 1
Example:

0.8

transcript_correction_vocabulary
(string | object)[]

Transcript correction vocabulary for LLM-based STT error correction (legacy mode only). Provides domain-specific terms that STT often mishears, allowing the LLM to use context to mentally correct transcription errors.

Formats supported:

  • Simple strings: ["Weaviate", "Kubernetes", "TopCalls"]
  • Objects with sounds_like hints:
    [
      { "correct": "Weaviate", "sounds_like": ["we activate", "web VT"] },
      { "correct": "NVIDIA", "sounds_like": ["in video"], "context": "hardware" }
    ]
  • Mixed: ["TopCalls", { "correct": "Kubernetes", "sounds_like": ["cube net ease"] }]

How it works:

  • The vocabulary is added to the LLM system prompt
  • When STT mishears a domain term, the LLM uses context to interpret correctly
  • No additional latency (processed in the main LLM call)
  • LLM responds naturally without mentioning the correction

Use cases:

  • Company/product names (Weaviate, Kubernetes, NVIDIA)
  • Industry-specific terminology (medical, legal, financial terms)
  • Technical terms that sound like common words
  • Names that may be mispronounced

Only used when mode=legacy.

Required array length: 1 - 100 elements
Minimum string length: 1
Example:
[
  "TopCalls",
  {
    "correct": "Weaviate",
    "sounds_like": ["we activate", "web VT"]
  },
  {
    "correct": "Kubernetes",
    "sounds_like": ["cube net ease", "cooper nettie"],
    "context": "technology"
  }
]
tts_provider
enum<string>
default:deepgram

TTS provider (legacy mode only). See GET /v1/voices/builtin for available voices per provider. Only used when mode=legacy.

Available options:
deepgram,
elevenlabs
Example:

"deepgram"

tts_model
string

TTS model (legacy mode only). See GET /v1/models for complete list of available TTS models. Only used when mode=legacy.

Minimum string length: 1
Example:

"eleven_flash_v2_5"

tts_stability
number

ElevenLabs voice stability (legacy mode, tts_provider=elevenlabs only). Controls the consistency of the voice output.

  • Lower values (0): More variable, emotional, expressive
  • Higher values (1): More consistent, stable, less expressive
  • Default: 0.75 (optimized for voice agents)
  • Only used when mode=legacy and tts_provider=elevenlabs
Required range: 0 <= x <= 1
Example:

0.75

tts_similarity_boost
number

ElevenLabs voice similarity boost (legacy mode, tts_provider=elevenlabs only). Controls how closely the generated voice matches the original.

  • Lower values (0): Less similar to original voice
  • Higher values (1): More similar to original voice
  • Default: 0.5 (balanced for voice agents)
  • Only used when mode=legacy and tts_provider=elevenlabs
Required range: 0 <= x <= 1
Example:

0.5

tts_speed
number

ElevenLabs speech speed (legacy mode, tts_provider=elevenlabs only). Controls the rate of speech.

  • Lower values (0.7): Slower speech
  • Higher values (1.2): Faster speech
  • Default: 0.78 (slightly slower for clarity)
  • Only used when mode=legacy and tts_provider=elevenlabs
Required range: 0.7 <= x <= 1.2
Example:

0.78
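The three ElevenLabs tuning knobs travel together in the request body. A fragment using the documented defaults (a sketch, not required settings):

```json
{
  "mode": "legacy",
  "tts_provider": "elevenlabs",
  "tts_model": "eleven_flash_v2_5",
  "voice": "rachel",
  "tts_stability": 0.75,
  "tts_similarity_boost": 0.5,
  "tts_speed": 0.78
}
```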

filler_enabled
boolean
default:false

Enable filler acknowledgments (legacy mode only). When enabled, the AI will generate brief acknowledgments (e.g., "Got it...", "Sure...") before the main response to reduce perceived latency.

  • false (default): No filler - AI responds directly
  • true: AI generates contextual filler before main response

Only used when mode=legacy.

Example:

false

block_interruption
boolean
default:false

Block interruption mode (legacy mode only). When enabled, the AI continues speaking even if the user talks over it.

  • User speech during TTS is buffered (not processed immediately)
  • When TTS ends, buffered speech is merged and checked:
    • If ≥5 words: processed through LLM (single call)
    • If <5 words: discarded (fillers like "uh huh", "okay")

Use cases:

  • Delivering critical information that shouldn't be interrupted
  • Users who provide active listening cues during AI speech
  • Noisy environments with background speech/noise

Only used when mode=legacy.

Example:

false

max_duration
number
default:5

Maximum call duration in minutes (enforced by telephony provider)

Required range: 1 <= x <= 60
background_audio
enum<string>
default:office

Background audio preset to play during the call.

  • office: Office ambiance (default) - subtle office sounds
  • none: No background audio

Background audio plays continuously under the conversation and helps create a professional atmosphere.

Available options:
office,
none
Example:

"office"

background_audio_gain
enum<string>
default:medium

Volume level for background audio relative to speech.

  • low: Subtle (-10 dB) - quieter background
  • medium: Balanced (-4 dB) - noticeable but balanced (default)
  • high: Full volume (0 dB) - background at same level as speech

Only used when background_audio is not none.

Available options:
low,
medium,
high
Example:

"medium"

webhook_url
string<uri>

Webhook URL to receive call completion/failure notifications. Webhook is sent after call finishes (includes recording_url and call_summary when available).

Example:

"https://your-app.com/webhooks/call-complete"
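The exact webhook payload schema is not shown on this page; based on the fields it mentions (`recording_url`, `call_summary`, `analysis`, `metadata`), a completion notification might look roughly like the following sketch (shape and values are assumptions):

```json
{
  "call_id": "564d4fd4-03bc-400a-abe0-05540fbeff88",
  "status": "completed",
  "recording_url": "https://...",
  "call_summary": "...",
  "analysis": { "converted": true },
  "metadata": { "patient_id": "pat_123" }
}
```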

analysis_schema
object

Schema for post-call AI analysis. Defines what information to extract from the transcript. After the call, AI analyzes the transcript and extracts structured data matching this schema. Results are included in the webhook payload under the analysis field.

Supported types:

  • boolean: true/false values (e.g., "converted", "appointment_confirmed")
  • string/text: Free-form text (e.g., "objections", "questions")
  • number: Numeric values (e.g., "rating", "call_count")
  • date: Date/time in ISO 8601 format (e.g., "appointment_time")

Simple format: Just specify the type

{ "converted": "boolean", "objections": "string" }

Rich format: Include description for better AI understanding

{
  "converted": {
    "type": "boolean",
    "description": "Whether the lead agreed to schedule an appointment"
  },
  "appointment_time": {
    "type": "date",
    "description": "The scheduled appointment date/time if booked"
  }
}
Example:
{
  "converted": {
    "type": "boolean",
    "description": "Whether the lead agreed to schedule an appointment or expressed buying interest"
  },
  "objections": {
    "type": "string",
    "description": "Any concerns or objections the lead raised during the call"
  },
  "appointment_time": {
    "type": "date",
    "description": "The scheduled appointment date and time if one was booked"
  }
}
metadata
object

Custom metadata to include in webhook payload. System fields (task, voice, model, etc.) are filtered out automatically.

Example:
{
  "patient_id": "pat_123",
  "source": "reminder_system"
}

Response

Call created successfully

call_id
string<uuid>

Call UUID

Example:

"564d4fd4-03bc-400a-abe0-05540fbeff88"

provider_call_id
string | null

Provider call ID (may be null if call creation failed)

Example:

"64e9bf0e-7c2f-4443-a759-7eb1731cd583"

status
enum<string>

Current call status

Available options:
queued,
pending,
in_progress,
completed,
failed,
cancelled
Example:

"queued"