POST /v1/calls
curl --request POST \
  --url https://api.topcalls.ai/v1/calls \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data @- <<EOF
{
  "phone_number": "+14155551234",
  "task": "Call to confirm John's appointment tomorrow at 3 PM"
}
EOF
{
  "call_id": "564d4fd4-03bc-400a-abe0-05540fbeff88",
  "provider_call_id": "64e9bf0e-7c2f-4443-a759-7eb1731cd583",
  "status": "queued"
}

Authorizations

Authorization
string
header
required

Use Authorization: Bearer tc_live_xxxxx

Headers

Idempotency-Key
string

Optional client-supplied idempotency key. When present, the gateway caches the response for 24 hours and returns the same response on retried requests with the same key (account-scoped). Safe for retries on network blips. Format: 8-255 ASCII characters from [A-Za-z0-9_-].
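A minimal Python sketch of generating a key and checking it against the documented length and pattern before sending. The helper names are illustrative and not part of any TopCalls SDK.

```python
import re
import uuid

# Character set and length limits copied from the header description above.
IDEMPOTENCY_KEY_RE = re.compile(r"^[A-Za-z0-9_-]+$")


def is_valid_idempotency_key(key: str) -> bool:
    """Return True if key is 8-255 characters drawn from [A-Za-z0-9_-]."""
    return 8 <= len(key) <= 255 and IDEMPOTENCY_KEY_RE.fullmatch(key) is not None


def new_idempotency_key() -> str:
    """Generate a fresh key; a UUID4 string (36 hex digits and hyphens) fits the pattern."""
    return str(uuid.uuid4())


def build_headers(token: str, key: str) -> dict:
    """Assemble request headers; pass the same key again when retrying the same call."""
    if not is_valid_idempotency_key(key):
        raise ValueError("Idempotency-Key must be 8-255 characters of [A-Za-z0-9_-]")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Idempotency-Key": key,
    }
```

Generate the key once per logical call and store it alongside the pending request, so that a retry after a network error reuses it rather than creating a new one.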

Required string length: 8 - 255
Pattern: ^[A-Za-z0-9_-]+$
Example:

"a1b2c3d4-e5f6-7890-abcd-ef0123456789"

Body

application/json
phone_number
string
required

Phone number in E.164 format (e.g., +14155551234)

  • Must start with +
  • First digit of the country code must be 1-9 (not 0)
  • Total length: 2-15 digits after the + (the pattern requires at least two)
  • Also validated as dialable by libphonenumber-js — country codes that don't exist are rejected
Pattern: ^\+[1-9]\d{1,14}$
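As a rough client-side pre-check, the pattern above can be applied in Python before submitting a request. This sketch covers only the pattern; it does not reproduce the libphonenumber-js dialability check, so a number that passes here can still be rejected by the API.

```python
import re

# Pattern copied from the field description; re.ASCII restricts \d to 0-9.
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$", re.ASCII)


def matches_e164_pattern(number: str) -> bool:
    """Return True if number is '+' followed by 2-15 digits, the first being 1-9."""
    return E164_RE.fullmatch(number) is not None
```

For example, `matches_e164_pattern("+14155551234")` is true, while `"14155551234"` (no leading +) and `"+04155551234"` (leading 0) both fail.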
Example:

"+14155551234"

task
string
required

Simple prompt describing what the AI should do. Use this OR instructions (not both).

Minimum string length: 10
Example:

"Call to confirm John's appointment tomorrow at 3 PM"

from_number
string

Caller ID in E.164 format (optional, falls back to FROM_NUMBER env var)

  • Must be in E.164 format if provided
  • Either provide this or set FROM_NUMBER environment variable
Pattern: ^\+[1-9]\d{1,14}$
Example:

"+18005551234"

first_sentence
string

The AI's opening line

Minimum string length: 1
Example:

"Hi, this is Rachel from TopView Dental calling about your appointment."

instructions
string

Full system instructions for the AI. Use this OR task (not both).
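The either/or rule between task and instructions can be checked before sending. The sketch below assumes the rule means exactly one of the two must be present; since task is also marked required, confirm the exact behaviour against the API. The function name is hypothetical.

```python
MIN_PROMPT_LENGTH = 10  # minimum string length documented for both fields


def prompt_field_errors(payload: dict) -> list[str]:
    """Return problems with the task/instructions pair (empty list if none)."""
    present = [name for name in ("task", "instructions") if name in payload]
    if len(present) != 1:
        return ["provide exactly one of 'task' or 'instructions'"]
    field = present[0]
    value = payload[field]
    if not isinstance(value, str) or len(value) < MIN_PROMPT_LENGTH:
        return [f"'{field}' must be a string of at least {MIN_PROMPT_LENGTH} characters"]
    return []
```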

Minimum string length: 10
Example:

"You are Rachel, a friendly appointment coordinator..."

mode
enum<string>
default:realtime

Conversation mode

  • realtime: Speech-to-speech mode (low latency)
  • legacy: Separate STT → LLM → TTS pipeline (custom voices, voice cloning)
Available options: realtime, legacy

voice
string
default:alloy

Voice to use for AI responses.

Realtime mode:

  • Available voices: alloy, echo, shimmer, ash, ballad, coral, sage, verse

Legacy mode (custom voices):

  • Voice names: rachel, domi, bella, antoni, elli, josh, arnold, sam, adam, nicole, matilda
  • Or voice_id directly: 21m00Tcm4TlvDq8ikWAM (24-char alphanumeric)
  • Custom/cloned voices: Use the voice_id from your account

Legacy mode (fast voices):

  • Aura-2 voices: aura-2-thalia-en, aura-2-orion-en, etc.
Example:

"alloy"

model
string

AI model to use for the call. If omitted, a default is selected automatically based on mode.

See GET /v1/models for the complete list of available models and their capabilities.

temperature
number
default:0.7

LLM creativity/temperature (0-1). Higher values = more creative responses.

  • Most models: Full range 0-1 supported
  • Some reasoning models only support default temperature
Required range: 0 <= x <= 1

stt_provider
enum<string>
default:deepgram

STT provider (legacy mode only)

  • deepgram: Default provider (36+ languages)
  • gladia: Multi-language provider (100+ languages, automatic detection)
  • speechmatics: High-accuracy multilingual provider with native Arabic+English code-switching (use stt_language: "ar_en")
  • soniox: Real-time multilingual provider with native code-switching via stt_languages array (e.g. ["en", "ar"])
  • Provider name must match telephony provider configuration
  • Use stt_language: "multi" for multilingual code-switching (deepgram only); with gladia, omit stt_languages for automatic multilingual detection
  • Only used when mode=legacy
Available options: deepgram, gladia, speechmatics, soniox
Example:

"deepgram"

stt_model
string
default:nova-3

STT model (legacy mode only). See GET /v1/models for complete list of available STT models and their capabilities. Only used when mode=legacy.

Minimum string length: 1
Example:

"nova-3"

stt_language
string

STT language code (legacy mode only). See GET /v1/config for the exact list per model.

  • For deepgram: accepts ISO 639-1 base codes plus provider-supported regional variants (e.g. en, en-US, en-GB, en-IN, zh-CN, pt-BR)
  • For gladia: accepts only ISO 639-1 base codes (e.g. en, ar, hi). Regional variants are not supported - omit stt_languages (or send an empty array) for automatic multilingual detection
  • multi is a deepgram-only sentinel for multilingual code-switching (supported by nova-3, nova-2, flux-general-multi)
  • Only used when mode=legacy
  • For restricted multi-language detection (gladia), use stt_languages array instead
Minimum string length: 2
Example:

"en-US"

stt_languages
string[]

Array of language codes for restricted multi-language detection (gladia only).

When multiple languages are provided:

  • Enables code_switching mode automatically
  • Restricts detection to ONLY these specified languages
  • Dramatically improves accuracy for short phrases

Narrows the detection space from 99 languages to just the ones you specify. Omit this field (or send an empty array) for unrestricted multilingual auto-detection.

Must be ISO 639-1 base codes only - gladia does not accept regional variants like en-US or zh-CN. See GET /v1/config for the full list.

Examples:

  • ["en", "ro"] - Detect English and Romanian only
  • ["en", "es", "fr"] - Detect English, Spanish, and French
  • ["en", "ar", "hi"] - Detect English, Arabic, and Hindi

Only used when mode=legacy and stt_provider=gladia.
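A small Python sketch of assembling the fields for restricted detection. It checks only the shape of each code (two lowercase letters, no region suffix) and the documented array length; it does not verify that a code is in the list returned by GET /v1/config. The function name is hypothetical.

```python
import re

ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")  # two-letter base code, no region suffix


def gladia_language_fields(languages: list[str]) -> dict:
    """Build the request fields for restricted multi-language detection with gladia."""
    if not 1 <= len(languages) <= 10:
        raise ValueError("stt_languages must contain 1-10 codes")
    for code in languages:
        if ISO_639_1_SHAPE.fullmatch(code) is None:
            raise ValueError(f"not an ISO 639-1 base code: {code!r}")
    return {
        "mode": "legacy",
        "stt_provider": "gladia",
        "stt_languages": list(languages),
    }
```

The returned dictionary can be merged into the request body alongside phone_number and task; passing `["en-US"]` raises because regional variants are not accepted.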

Required array length: 1 - 10 elements
Minimum string length: 2
Example:
["en", "ro"]

stt_vocabulary
(string | object)[]

Custom vocabulary for STT (multi-language provider only). Boost recognition of domain-specific words and phrases in real time.

Formats supported:

  • Simple strings: ["Capex", "TopCalls"]
  • Objects with language: [{"value": "Capex", "language": "en"}]
  • Mixed: ["Capex", {"value": "مرحبا", "language": "ar"}]

Use cases:

  • Company/product names
  • Industry-specific terminology
  • Names that may be mispronounced
  • Technical terms

Only used in legacy mode with the multi-language STT provider.

Required array length: 1 - 100 elements
Minimum string length: 1
Example:
[
  "Capex",
  { "value": "TopCalls", "language": "en" }
]

stt_endpoint_sensitivity
number
default:0.01

STT endpoint sensitivity (seconds to wait after silence before finalizing a transcript).

Effective range is provider-dependent:

  • gladia: 0.01 - 2.0 seconds (default 0.01; 0.01-0.1 recommended for telephony)
  • soniox: 0.5 - 3.0 seconds (default 2.0)

Lower values = snappier turn-ends; higher values = more patience for slow speakers. Values outside the active provider effective range may be clamped or rejected by the upstream provider. Only used in legacy mode and only honoured by providers that expose this knob.

Required range: 0.01 <= x <= 3
Example:

0.01
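Because out-of-range values may be clamped or rejected upstream, one option is to clamp on the client first. The sketch below uses the effective ranges listed above and falls back to the schema range (0.01 - 3) for providers with no listed range; clamping rather than rejecting is a design choice of this example, not documented platform behaviour.

```python
# Effective ranges in seconds, copied from the provider list above.
EFFECTIVE_RANGE = {
    "gladia": (0.01, 2.0),
    "soniox": (0.5, 3.0),
}
SCHEMA_RANGE = (0.01, 3.0)  # overall range accepted by the request schema


def clamp_endpoint_sensitivity(provider: str, seconds: float) -> float:
    """Clamp seconds into the provider's effective range (schema range if unlisted)."""
    low, high = EFFECTIVE_RANGE.get(provider, SCHEMA_RANGE)
    return min(max(seconds, low), high)
```

For example, a value of 2.5 becomes 2.0 for gladia, and 0.01 becomes 0.5 for soniox.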

stt_interrupt_sensitivity
number
default:0.8

STT interrupt/speech detection sensitivity (multi-language provider only). Controls the speech detection threshold for distinguishing speech from noise.

  • Range: 0.0 - 1.0
  • Default: 0.8 (recommended for telephony audio)
  • Higher values (0.7-0.9): Recommended for telephony audio, background noise
  • Lower values (0.0-0.4): More sensitive to speech, may pick up more noise

Only used in legacy mode with the multi-language STT provider.

Required range: 0 <= x <= 1
Example:

0.8

stt_max_delay
number
default:1.5

Max wait in seconds before finalizing a transcript (speechmatics only). Lower = snappier turn-ends, higher = more patience for slow speakers.

  • Range: 0.7 - 4.0 seconds
  • Default: 1.5 seconds
  • Fixed platform-wide for other providers.

Only used in legacy mode with stt_provider=speechmatics.

Required range: 0.7 <= x <= 4
Example:

1.5

transcript_correction_vocabulary
(string | object)[]

Transcript correction vocabulary for LLM-based STT error correction (legacy mode only). Provides domain-specific terms that STT often mishears so the LLM can use conversational context to interpret them correctly.

Formats supported:

  • Simple strings: ["Weaviate", "Kubernetes", "TopCalls"]
  • Objects with sounds_like hints:
    [
    { "correct": "Weaviate", "sounds_like": ["we activate", "web VT"] },
    { "correct": "NVIDIA", "sounds_like": ["in video"], "context": "hardware" }
    ]
  • Mixed: ["TopCalls", { "correct": "Kubernetes", "sounds_like": ["cube net ease"] }]

How it works:

  • The vocabulary is added to the LLM system prompt
  • When STT mishears a domain term, the LLM uses context to interpret correctly
  • No additional latency (processed in the main LLM call)
  • LLM responds naturally without mentioning the correction

Use cases:

  • Company/product names (Weaviate, Kubernetes, NVIDIA)
  • Industry-specific terminology (medical, legal, financial terms)
  • Technical terms that sound like common words
  • Names that may be mispronounced

Only used when mode=legacy.
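To illustrate how a mixed vocabulary could be turned into system-prompt text, here is a hypothetical Python rendering. The actual prompt wording used by the gateway is not documented; the output format below is an assumption made purely for illustration.

```python
def render_correction_hints(vocabulary: list) -> str:
    """Render string and object entries as one hint line each (illustrative format)."""
    lines = []
    for entry in vocabulary:
        if isinstance(entry, str):
            lines.append(f"- {entry}")
            continue
        line = f"- {entry['correct']}"
        sounds_like = entry.get("sounds_like", [])
        if sounds_like:
            quoted = ", ".join(f'"{s}"' for s in sounds_like)
            line += f" (may be transcribed as {quoted})"
        if "context" in entry:
            line += f" [context: {entry['context']}]"
        lines.append(line)
    return "\n".join(lines)
```

Rendering entries as plain hint lines keeps the cost to a few prompt tokens per term, which is consistent with the correction happening inside the main LLM call.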

Required array length: 1 - 100 elements
Minimum string length: 1
Example:
[
  "TopCalls",
  {
    "correct": "Weaviate",
    "sounds_like": ["we activate", "web VT"]
  },
  {
    "correct": "Kubernetes",
    "sounds_like": ["cube net ease", "cooper nettie"],
    "context": "technology"
  }
]

tts_provider
enum<string>
default:deepgram

TTS provider (legacy mode only). See GET /v1/voices/builtin for available voices per provider. Only used when mode=legacy.

Available options: deepgram, elevenlabs
Example:

"deepgram"

tts_model
string

TTS model (legacy mode only). See GET /v1/models for complete list of available TTS models. Only used when mode=legacy.

Minimum string length: 1
Example:

"eleven_flash_v2_5"

tts_stability
number

Voice stability (legacy mode). Controls the consistency of the voice output.

  • Lower values (0): More variable, emotional, expressive
  • Higher values (1): More consistent, stable, less expressive
  • Default: 0.75 (optimized for voice agents)
  • Only used in legacy mode with the corresponding TTS provider
Required range: 0 <= x <= 1
Example:

0.75

tts_similarity_boost
number

Voice similarity boost (legacy mode). Controls how closely the generated voice matches the original.

  • Lower values (0): Less similar to original voice
  • Higher values (1): More similar to original voice
  • Default: 0.5 (balanced for voice agents)
  • Only used in legacy mode with the corresponding TTS provider
Required range: 0 <= x <= 1
Example:

0.5

tts_speed
number

Speech speed (legacy mode). Controls the rate of speech.

  • Lower values (0.7): Slower speech
  • Higher values (1.2): Faster speech
  • Default: 0.78 (slightly slower for clarity)
  • Only used in legacy mode with the corresponding TTS provider
Required range: 0.7 <= x <= 1.2
Example:

0.78

filler_enabled
boolean
default:false

Enable filler acknowledgments (legacy mode only). When enabled, the AI will generate brief acknowledgments (e.g., "Got it...", "Sure...") before the main response to reduce perceived latency.

  • false (default): No filler - AI responds directly
  • true: AI generates contextual filler before main response

Only used when mode=legacy.

Example:

false

block_interruption
boolean
default:false

Block interruption mode (legacy mode only). When enabled, the AI continues speaking even if the user talks over it.

  • User speech during TTS is buffered (not processed immediately)
  • When TTS ends, buffered speech is merged and checked:
    • If ≥5 words: processed through LLM (single call)
    • If <5 words: discarded (fillers like "uh huh", "okay")

Use cases:

  • Delivering critical information that shouldn't be interrupted
  • Users who provide active listening cues during AI speech
  • Noisy environments with background speech/noise

Only used when mode=legacy.

Example:

false
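The buffering rule described above can be sketched in Python as follows. Counting words by splitting on whitespace is an assumption; the platform's own tokenization may differ.

```python
from typing import Optional

MIN_WORDS = 5  # threshold from the description above


def resolve_buffered_speech(fragments: list[str]) -> Optional[str]:
    """Merge speech buffered during TTS; return text to process, or None to discard."""
    merged = " ".join(f.strip() for f in fragments if f.strip())
    if len(merged.split()) >= MIN_WORDS:
        return merged  # long enough: send to the LLM in a single call
    return None  # short acknowledgment such as "uh huh": drop it
```

With this rule, `["uh huh", "okay"]` is discarded (three words), while `["wait, I need", "to change the time"]` is merged and processed.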

max_duration
number
default:5

Maximum call duration in minutes (enforced by telephony provider)

Required range: 1 <= x <= 60

background_audio
enum<string>
default:office

Background audio preset to play during the call.

  • office: Office ambiance (default) - subtle office sounds
  • none: No background audio

Background audio plays continuously under the conversation and helps create a professional atmosphere.

Available options: office, none
Example:

"office"

background_audio_gain
enum<string>
default:medium

Volume level for background audio relative to speech.

  • low: Subtle (-10 dB) - quieter background
  • medium: Balanced (-4 dB) - noticeable but balanced (default)
  • high: Full volume (0 dB) - background at same level as speech

Only used when background_audio is not none.
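For intuition, a level in dB corresponds to a linear amplitude multiplier of 10^(dB/20). The Python sketch below applies that standard relation to the three presets; it illustrates the arithmetic only and makes no claim about how the platform mixes audio internally.

```python
# Preset levels in dB, copied from the list above.
GAIN_DB = {"low": -10.0, "medium": -4.0, "high": 0.0}


def linear_gain(preset: str) -> float:
    """Convert a preset's dB level to a linear amplitude multiplier."""
    return 10 ** (GAIN_DB[preset] / 20)
```

Rounded to three places, the multipliers are about 0.316 for low, 0.631 for medium, and 1.0 for high.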

Available options: low, medium, high
Example:

"medium"

webhook_url
string<uri>

Webhook URL to receive call completion/failure notifications. Webhook is sent after call finishes (includes recording_url and call_summary when available).

Example:

"https://your-app.com/webhooks/call-complete"

analysis_schema
object

Schema for post-call AI analysis. Defines what information to extract from the transcript. After the call, AI analyzes the transcript and extracts structured data matching this schema. Results are included in the webhook payload under the analysis field.

Supported types:

  • boolean: true/false values (e.g., "converted", "appointment_confirmed")
  • string/text: Free-form text (e.g., "objections", "questions")
  • number: Numeric values (e.g., "rating", "call_count")
  • date: Date/time in ISO 8601 format (e.g., "appointment_time")

Simple format: Just specify the type

{ "converted": "boolean", "objections": "string" }

Rich format: Include description for better AI understanding

{
  "converted": {
    "type": "boolean",
    "description": "Whether the lead agreed to schedule an appointment"
  },
  "appointment_time": {
    "type": "date",
    "description": "The scheduled appointment date/time if booked"
  }
}
Example:
{
  "converted": {
    "type": "boolean",
    "description": "Whether the lead agreed to schedule an appointment or expressed buying interest"
  },
  "objections": {
    "type": "string",
    "description": "Any concerns or objections the lead raised during the call"
  },
  "appointment_time": {
    "type": "date",
    "description": "The scheduled appointment date and time if one was booked"
  }
}
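Since a schema may mix the simple and rich formats, a client can normalize everything to the rich form and catch unsupported types before sending. This is a sketch with a hypothetical helper name; the API accepts both formats as-is, so normalization is optional.

```python
# Type names listed under "Supported types" for analysis_schema.
SUPPORTED_TYPES = {"boolean", "string", "text", "number", "date"}


def normalize_analysis_schema(schema: dict) -> dict:
    """Expand simple entries ("field": "type") into rich entries and validate types."""
    normalized = {}
    for name, spec in schema.items():
        if isinstance(spec, str):
            spec = {"type": spec}
        if spec.get("type") not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported type for {name!r}: {spec.get('type')!r}")
        normalized[name] = dict(spec)  # copy so the caller's schema is not mutated
    return normalized
```

For example, `{"converted": "boolean"}` becomes `{"converted": {"type": "boolean"}}`, and an entry typed `"integer"` raises a ValueError.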
mcp_url
string<uri>

Optional MCP server URL for remote tool dispatch (Activepieces). When set, the gateway opens an SSE MCP client at call start and merges the remote tools into the LLM tool list.

Example:

"https://integrations.example.com/mcp/abc123"

mcp_token
string

Bearer token for the MCP server. Redacted from logs by suffix rule (any field ending in _token/_secret/_key/_password).
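The same suffix rule is easy to mirror in client-side logging. The Python sketch below handles a flat dictionary only, and the "[REDACTED]" placeholder is an assumption rather than the gateway's actual output.

```python
# Suffixes named in the redaction rule above.
REDACTED_SUFFIXES = ("_token", "_secret", "_key", "_password")


def redact(fields: dict) -> dict:
    """Return a copy of fields with values of sensitive-looking keys replaced."""
    return {
        name: "[REDACTED]" if name.endswith(REDACTED_SUFFIXES) else value
        for name, value in fields.items()
    }
```

Note that a purely suffix-based rule also matches fields such as idempotency_key, which is harmless for logging but worth knowing when searching logs.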

mcp_tool_allowlist
string[]

Names of MCP tools (as returned by listTools()) that the AI is allowed to invoke during this call. When absent or empty, zero remote MCP tools are attached — only platform tools like end_call remain. Explicit opt-in to prevent prompt bloat from auto-attaching every flow in the connected workspace.

Example:
["book_callback", "send_sms_confirmation"]

tool_call_timeout_ms
integer
default:3000

Per-call timeout for MCP tool invocation in milliseconds. On timeout, the gateway feeds {error: "tool_timeout"} into the second LLM hop so the model can recover conversationally.
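The timeout behaviour can be approximated with asyncio. In this sketch `tool` is any hypothetical async callable standing in for an MCP tool invocation; the gateway's real implementation is not documented here.

```python
import asyncio


async def call_tool_with_timeout(tool, args: dict, timeout_ms: int = 3000) -> dict:
    """Await tool(args); on timeout return the error object described above."""
    if not 500 <= timeout_ms <= 10000:
        raise ValueError("tool_call_timeout_ms must be between 500 and 10000")
    try:
        return await asyncio.wait_for(tool(args), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # Returned as a normal result so the model can respond to the failure.
        return {"error": "tool_timeout"}
```

Returning the error as data instead of raising lets the caller pass it straight into the next LLM turn, which matches the recovery behaviour described above.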

Required range: 500 <= x <= 10000

metadata
object

Custom metadata to include in webhook payload. System fields (task, voice, model, etc.) are filtered out automatically.

Example:
{
  "patient_id": "pat_123",
  "source": "reminder_system"
}

lead_id
string<uuid>

Optional lead reference. When provided, the gateway loads the lead record (name, email, notes, status, plus any custom fields stored on the lead) and exposes them to the AI via lead_context. Caller-supplied lead_context takes precedence on key collision. Returns 404 if the lead does not exist in the calling account.

campaign_id
string<uuid>

Optional campaign reference. When provided, the gateway loads the campaign's attached knowledge base and includes it in the call's runtime context. Returns 404 if the campaign does not exist in the calling account.

When used as a campaign execution path (without phone_number), campaign_id, lead_id, idempotency_key, and attempt_number are all required.

attempt_number
integer

Attempt number for campaign execution mode (required when campaign_id is provided without phone_number).

Required range: x >= 1

idempotency_key
string

Idempotency key for campaign execution mode (required when campaign_id is provided without phone_number).

Minimum string length: 8

scheduled_at
string<date-time>

Optional scheduled time for campaign execution mode.
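The campaign execution rules (a request without phone_number) can be checked on the client as sketched below. The function name is hypothetical, and UUID formatting of campaign_id and lead_id is not checked.

```python
# Fields required when a request omits phone_number (campaign execution mode).
CAMPAIGN_REQUIRED = ("campaign_id", "lead_id", "idempotency_key", "attempt_number")


def campaign_mode_errors(payload: dict) -> list[str]:
    """Return campaign-mode problems; empty when the rules pass or do not apply."""
    if "phone_number" in payload:
        return []  # direct call: campaign execution rules do not apply
    errors = [f"missing required field: {name}" for name in CAMPAIGN_REQUIRED if name not in payload]
    key = payload.get("idempotency_key")
    if key is not None and (not isinstance(key, str) or len(key) < 8):
        errors.append("idempotency_key must be a string of at least 8 characters")
    attempt = payload.get("attempt_number")
    if attempt is not None and (not isinstance(attempt, int) or attempt < 1):
        errors.append("attempt_number must be an integer >= 1")
    return errors
```

scheduled_at is optional in this mode, so the check ignores it.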

lead_context
object

Free-form key/value context surfaced to the AI during the call. When lead_id is set, the gateway auto-builds a base lead_context from the lead record; any keys passed here shallow-merge on top of the auto-built base and win on collision.
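In Python terms, the shallow merge described above behaves like dictionary unpacking with the caller's keys applied last. The function name is illustrative only.

```python
from typing import Optional


def merge_lead_context(auto_built: dict, supplied: Optional[dict]) -> dict:
    """Shallow-merge caller-supplied context over the auto-built base; supplied keys win."""
    return {**auto_built, **(supplied or {})}
```

Because the merge is shallow, a nested object passed in lead_context replaces the base value for that key wholesale instead of being merged field by field.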

Response

Call created successfully

call_id
string<uuid>

Call UUID

Example:

"564d4fd4-03bc-400a-abe0-05540fbeff88"

provider_call_id
string | null

Provider call ID (may be null if call creation failed)

Example:

"64e9bf0e-7c2f-4443-a759-7eb1731cd583"

status
enum<string>

Current call status

Available options: queued, pending, in_progress, completed, failed, cancelled
Example:

"queued"