Voice Agents JS Example (Gemini Attended Transfer)

This example demonstrates how to use Google Gemini as an intelligent screening assistant for attended transfers. Instead of manually calling colleagues to check availability, any employee can use the AI Voice Agents to automatically screen and connect calls.

The Problem

When someone answers a call and the caller asks for a specific person, they typically must:

  1. Put the caller on hold
  2. Call the destination person to check if they're available
  3. Relay information back and forth
  4. Manually connect the call or take a message

This is time-consuming and inefficient for everyone involved.

The Solution: One-Button Screening

This script enables attended transfer with AI screening. With a programmable button on your desk phone:

  1. Caller asks for someone: "I need to speak with Bob"
  2. Press the programmed button: Automatically puts the caller on hold and dials the Voice Agents extension
  3. Tell the AI: "Find Bob" or "Check if Bob is available"
  4. AI handles screening: Calls Bob, asks if he wants to accept, gets response
  5. Automatic result: Call connects to Bob, or you get notified he's busy

Desk Phone Button Setup

Most VoIP desk phones (Yealink, Polycom, Snom, etc.) support programmable buttons that can:

  • Put the active call on hold
  • Dial the Voice Agents extension (e.g., 5000)

Example button configuration:

Button Label: "Screen Call"
Action: Transfer
Destination: 5000

When pressed, your current call goes on hold and the Voice Agents answers immediately, ready for your instruction.

How It Works:

Step 1: Incoming Call

  • Employee answers a call
  • Caller: "I'd like to speak with Bob"
  • Employee presses the "Screen Call" button (call automatically goes on hold)

Step 2: AI Voice Agents Activation

  • Button press dials the Voice Agents extension (e.g., 5000)
  • AI: "How may I help you today?"
  • Employee: "Find Bob" or "Transfer to Bob"
  • AI recognizes "Bob" and looks up extension 503

Step 3: AI Screens the Destination

  • AI automatically calls Bob at extension 503
  • Bob answers the AI call
  • AI: "There is a call for you from [Caller Name]. Would you like to accept it?"
  • Bob responds naturally: "Yes, put them through" OR "No, I'm busy right now"
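
The announcement in Step 3 depends on a caller name taken from the original call's From value. The following is a minimal sketch of that extraction, assuming the display name arrives in double quotes (which is what the script's split on the quote character implies). The helper names extractCallerName and buildGreeting are hypothetical and are not part of the Vodia API.

```javascript
// Hypothetical helper: pull the quoted display name out of a From value.
// Falls back to "someone" when the value is missing or has no quoted name.
function extractCallerName(origFrom) {
  if (typeof origFrom !== 'string') return 'someone';
  var parts = origFrom.split('"');
  var name = (parts[1] || '').trim();
  return name || 'someone';
}

// Hypothetical helper: build the sentence the AI is asked to speak to the destination.
function buildGreeting(origFrom) {
  return 'There is a call for you from ' + extractCallerName(origFrom) +
    '. Would you like to accept it?';
}
```

The fallback keeps the announcement usable for calls that arrive without a display name.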

Step 4: Automatic Connection

  • If Bob accepts: The original caller is automatically connected to Bob
  • If Bob rejects: Employee hears "Attendant transfer failed" and can inform caller: "I'm sorry, Bob is unavailable. Can I take a message?"

Benefits:

  • One-Button Operation: Single button press handles everything
  • Time Savings: No manual dialing or waiting for colleagues to answer
  • Professional Screening: AI provides context (caller name) to help with decisions
  • Better Experience: Employees can decline calls when busy or in meetings
  • Reduced Hold Time: Faster resolution for callers
  • Universal: Works for receptionists, assistants, team members, or anyone with a desk phone

Alternative Use Case: Direct Caller Interaction

The same script also works when callers interact directly with the AI Voice Agents (without any employee answering first):

Direct Caller Workflow:

  1. Caller dials the company and reaches the AI Voice Agents directly
  2. AI: "How may I help you today?"
  3. Caller: "I need to speak with Bob"
  4. AI calls Bob with attended transfer (same screening process)
  5. If accepted → caller connected; if rejected → AI informs caller

This mode is useful for:

  • After-hours auto-attendant
  • Department direct lines
  • Self-service call routing
  • Reducing front-desk call volume

Gemini Setup

Refer to the Gemini setup documentation for instructions on obtaining your API key and configuring the necessary permissions.


Technical Overview

This script uses a dual-mode approach: the same Voice Agents script behaves differently depending on the ivraction parameter of the call.

Mode 1: Reception/Concierge Mode (initial call)

  • Receptionist or caller speaks to AI
  • AI recognizes names/extensions from the directory map
  • When instructed to "find" or "transfer to" someone, AI initiates screening call

Mode 2: Attendant Mode (destination call)

  • The employee receives a call with ivraction: 'attendant'
  • AI announces the caller and asks if they want to accept
  • Employee responds naturally: "Yes", "Sure", "No", "I'm busy", etc.
  • AI calls call_accept() or call_reject() based on response
  • Result is sent back to the reception/concierge call
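
The two modes can be pictured as a simple switch on the ivraction value. This is a sketch only: selectMode and toolsByMode are hypothetical names, and the script itself declares all three functions in both modes and relies on the system instructions to steer the model toward the right ones.

```javascript
// Hypothetical helper: choose the operating mode from the ivraction value.
// Anything other than 'attendant' is treated as the reception/concierge mode.
function selectMode(ivraction) {
  return ivraction === 'attendant' ? 'attendant' : 'concierge';
}

// Functions the system instructions ask the model to call in each mode.
var toolsByMode = {
  concierge: ['transfer_call'],
  attendant: ['call_accept', 'call_reject']
};
```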

Key Technical Components

  • call.dial(): Initiates the outbound call to the employee with custom parameters
  • ivraction: 'attendant': Signals that this call should use attendant mode instructions
  • cobj: Passes the original call object to be connected after acceptance
  • call.find(): Retrieves active calls to get the original caller's information
  • call.transfer({ action: 'accept' }) or call.transfer({ action: 'reject' }): Completes or cancels the attended transfer
  • HTTP callback with body.type == 'att_transfer': Receives the result back in the concierge Voice Agents
  • client_content messaging: Sends the greeting prompt to Gemini after connection
  • Audio streaming: Bidirectional PCM audio between caller and Gemini
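
As an illustration of the HTTP callback component, the sketch below classifies a callback body without throwing on malformed input. The function name classifyTransferCallback is hypothetical. The script compares result against the string 'true'; accepting the boolean true as well is an assumption added here for robustness.

```javascript
// Hypothetical helper: classify the HTTP callback body received by the concierge call.
// Returns 'succeeded', 'failed', or 'ignored' (not an attended-transfer result,
// or the body could not be parsed as JSON).
function classifyTransferCallback(rawBody) {
  var body;
  try {
    body = JSON.parse(rawBody);
  } catch (e) {
    return 'ignored';
  }
  if (!body || body.type !== 'att_transfer') return 'ignored';
  // Assumption: treat both the string 'true' and the boolean true as success.
  return (body.result === 'true' || body.result === true) ? 'succeeded' : 'failed';
}
```

In the script, 'succeeded' and 'failed' would map to the two call.say() announcements followed by call.hangup().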

Directory Mapping

In the script, you configure the name-to-extension mapping:

"Here is a map of words to numbers: { Sales: 502, Support: 501, Marketing: 504, Bob: 503 }"

When the receptionist or caller says "Bob", the AI automatically resolves it to extension 503.
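
If the directory changes often, it may be easier to keep it as an object and generate the instruction sentence from it. This is a minimal sketch; the directory variable and the directoryPrompt helper are hypothetical and do not appear in the script.

```javascript
// Example directory; replace with your own names and extensions.
var directory = { Sales: 502, Support: 501, Marketing: 504, Bob: 503 };

// Hypothetical helper: render the directory as the sentence used in the system instructions.
function directoryPrompt(dir) {
  var entries = Object.keys(dir).map(function (name) {
    return name + ': ' + dir[name];
  });
  return 'Here is a map of words to numbers: { ' + entries.join(', ') + ' }';
}
```

The generated string can then be concatenated with the remaining instructions in place of the hard-coded sentence.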


Call Flow Diagrams

Desk Phone Button Workflow:

[Incoming Call] → [Employee Answers]
        ↓
[Caller: "I need Bob"]
        ↓
[Employee Presses "Screen Call" Button]
        ↓
[Call Automatically Put on Hold + Voice Agents Dialed]
        ↓
[AI Voice Agents: "How may I help you today?"]
        ↓
[Employee: "Find Bob" or "Transfer to Bob"]
        ↓
[AI recognizes "Bob" = 503]
        ↓
[AI calls Bob at 503]
        ↓
[AI to Bob: "Call from John Smith. Accept?"]
        ↓
[Bob: "Yes" or "No"]
        ↓
   ┌────┴──────────────────────┐
   ↓                           ↓
call_accept()              call_reject()
   ↓                           ↓
[Caller Connected to Bob]  [Employee Notified: "Attendant transfer failed"]
                               ↓
                           [Employee: "Sorry, Bob is unavailable"]

Direct Caller Workflow (Alternative):

[Caller] → [AI Voice Agents Auto-Answers: "How may I help you today?"]
        ↓
[Caller: "I need to speak with Bob"]
        ↓
[AI recognizes "Bob" = 503]
        ↓
[AI calls Bob at 503]
        ↓
[AI to Bob: "Call from John Smith. Accept?"]
        ↓
[Bob: "Yes" or "No"]
        ↓
call_accept() or call_reject()
        ↓
[Connected or "Bob is unavailable"]
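
Both workflows end in exactly one of three function calls, and the script ignores any repeat once an action has started. The sketch below shows that once-only dispatch in isolation, assuming stand-in actions that only record what would happen. createDispatcher is a hypothetical name, not a Vodia or Gemini API.

```javascript
// Hypothetical sketch: map a Gemini function call to one action, at most once per call.
function createDispatcher(actions) {
  var handled = false;
  return function dispatch(functionCall) {
    if (handled) return 'duplicate';
    var known = Object.prototype.hasOwnProperty.call(actions, functionCall.name);
    if (!known || typeof actions[functionCall.name] !== 'function') return 'unknown';
    handled = true;
    actions[functionCall.name](functionCall.args || {});
    return 'handled';
  };
}

// Usage with stand-in actions; a real script would call call.dial() or call.transfer().
var actionLog = [];
var dispatch = createDispatcher({
  transfer_call: function (args) { actionLog.push('dial ' + args.destination); },
  call_accept: function () { actionLog.push('accept'); },
  call_reject: function () { actionLog.push('reject'); }
});
```

Keeping the guard inside the dispatcher means a late or repeated function call from the model cannot trigger a second transfer.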

The Complete Script

Note: The script specifies the model gemini-2.0-flash-exp. Verify that this model is currently available under your Google Cloud plan, check the rate limits and costs that apply, and identify a suitable alternative model in case this version is not accessible.

//
// Gemini integration with attended transfer
//
// (C) Vodia Networks 2025
//
// This file is property of Vodia Networks Inc. All rights reserved.
// For more information mail Vodia Networks Inc., info@vodia.com.
//
'use strict';
var apiKey = "YOUR_GEMINI_API_KEY"
var codec = "pcm16"

var texts = {
  initial: {
    en: "How may I help you today?"
  }
}

function text(name) {
  var prompt = texts[name]
  if (call.lang in prompt) return prompt[call.lang];
  return prompt["en"]
}

// Handle HTTP callbacks for attended transfer results
call.http(onhttp)
function onhttp(args) {
  console.log('=== Gemini HTTP Callback ===')
  console.log(JSON.stringify(args))
  const body = JSON.parse(args.body)
  console.log('Body: ')
  console.log(JSON.stringify(body))

  if (body.type == 'att_transfer') {
    if (body.result == 'true') {
      console.log('>>> Attended transfer succeeded')
      call.say('Attendant transfer succeeded')
      setTimeout(function() { call.hangup() }, 1000)
    }
    else {
      console.log('>>> Attended transfer failed')
      call.say('Attendant transfer failed')
      setTimeout(function() { call.hangup() }, 1000)
    }
  }
}

// Check if this is an attended transfer scenario
var isAttendedTransfer = (call.ivraction && call.ivraction == 'attendant')
console.log('=== Script Start - ivraction: ' + (call.ivraction || 'none') + ' ===')

var attendedTransferGreeting = ""
if (isAttendedTransfer) {
  console.log('Original from: ' + call.orig_from)
  // Guard against a missing orig_from so the split cannot throw
  var origname = (call.orig_from || "").split('"')[1] || "someone"
  console.log('Extracted caller name: ' + origname)
  attendedTransferGreeting = "There is a call for you from " + origname + ". Would you like to accept it?"
  console.log('Will play attended transfer greeting after Bob answers: ' + attendedTransferGreeting)
  // Don't play greeting yet - wait for Bob to answer
} else {
  call.say(text("initial"));
}

// Safety net: if nothing happens within 120 seconds, send the call to 700
var timer = setTimeout(function() {
  console.log("TIMEOUT: No transfer in 120 seconds, transferring to 700")
  call.transfer('700')
}, 120000);

var ws = new Websocket(
  "wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1alpha.GenerativeService.BidiGenerateContent?key=" + apiKey
)

ws.header([
  { name: "Content-Type", value: "application/json" }
])

ws.on('open', function() {
  console.log("=== Gemini WebSocket OPENED at " + new Date().toISOString() + " ===")

  // Set system instructions based on whether this is attended transfer
  var systemInstructions = ""

  if (isAttendedTransfer) {
    console.log('=== ATTENDED TRANSFER MODE ===')

    systemInstructions = "You will receive a message to speak out loud EXACTLY ONCE. " +
      "After speaking it ONE TIME, stop and listen for the response. DO NOT repeat the message. " +
      "If they explicitly accept the call (like 'ok', 'I will take it', 'yes', 'connect me', 'sure', 'go ahead' etc.), then call the function call_accept. " +
      "If they explicitly reject the call (like 'no', 'don't connect me', 'I can't take it', 'not now', 'I'm busy' etc.) then call the function call_reject. " +
      "If they don't respond or say something unclear, ask 'Would you like to accept the call?' but only once more. " +
      "Do not engage in conversation or respond with normal text."
  }
  else {
    console.log('=== NORMAL TRANSFER MODE ===')
    systemInstructions = "Here is a map of words to numbers: { Sales: 502, Support: 501, Marketing: 504, Bob: 503 }. " +
      "Whenever someone says call, transfer to, attendant transfer, you MUST call the function 'transfer_call'. " +
      "The function argument 'destination' must be the resolved number from the map and the function argument 'name' must be the name the person said. " +
      "If the user provides a number directly, use it as-is, with destination and name the same. " +
      "Do not respond with normal text for these intents."
  }

  var setup = {
    "setup": {
      "model": "models/gemini-2.0-flash-exp",
      "generation_config": {
        "response_modalities": ["AUDIO"],
        "speech_config": {
          "voice_config": {
            "prebuilt_voice_config": {
              "voice_name": "Puck"
            }
          }
        }
      },
      "system_instruction": {
        "parts": [{
          "text": systemInstructions
        }]
      },
      "tools": [{
        "function_declarations": [
          {
            "name": "transfer_call",
            "description": "Transfers the active SIP call to a destination number",
            "parameters": {
              "type": "object",
              "properties": {
                "destination": {
                  "type": "string",
                  "description": "Phone number to transfer to"
                },
                "name": {
                  "type": "string",
                  "description": "Name to transfer to"
                }
              },
              "required": ["destination", "name"]
            }
          },
          {
            "name": "call_accept",
            "description": "Accepts the call for attended transfer"
          },
          {
            "name": "call_reject",
            "description": "Rejects the call for attended transfer"
          }
        ]
      }]
    }
  }

  console.log("Sending setup to Gemini...")
  ws.send(JSON.stringify(setup))

  // Store greeting for attended transfer mode
  ws.attendedTransferGreeting = attendedTransferGreeting
})

ws.on('error', function(error) {
  console.log("!!! WebSocket ERROR: " + error)
  call.say("I'm experiencing technical difficulties. Transferring you now.")
  setTimeout(function() { call.transfer('700') }, 2000)
})

ws.on('close', function(code, reason) {
  console.log("=== Gemini WebSocket CLOSED: code=" + code + ", reason=" + reason + " ===")
  call.stream()
})

var messageCount = 0
var audioReceived = false
var transferInitiated = false
var greetingSent = false

ws.on('message', function(message) {
  messageCount++
  var msg = JSON.parse(message)

  var msgType = msg.setupComplete ? "SETUP_COMPLETE" :
    msg.serverContent ? "SERVER_CONTENT" :
    msg.toolCall ? "TOOL_CALL" :
    msg.error ? "ERROR" : "UNKNOWN"

  console.log("[MSG #" + messageCount + "] Type: " + msgType)

  if (msg.error) {
    console.log("!!! GEMINI ERROR: " + JSON.stringify(msg.error))
    call.say("Sorry, I'm having trouble. Let me transfer you.")
    setTimeout(function() { call.transfer('700') }, 2000)
    return
  }

  if (msg.setupComplete) {
    console.log(">>> Setup complete, starting audio stream")

    call.stream({
      codec: codec,
      interval: 0.5,
      samplerate: 16000,
      callback: stream
    })

    // For attended transfer, send greeting after a short delay (once Bob answers and streaming starts)
    if (isAttendedTransfer && ws.attendedTransferGreeting && !greetingSent) {
      console.log(">>> Scheduling attended transfer greeting to be sent")
      greetingSent = true // Mark immediately to prevent duplicate scheduling
      setTimeout(function() {
        console.log(">>> Sending attended transfer greeting to Gemini: " + ws.attendedTransferGreeting)
        ws.send(JSON.stringify({
          "client_content": {
            "turns": [{
              "role": "user",
              "parts": [{
                "text": "SPEAK THIS MESSAGE OUT LOUD EXACTLY ONCE: " + ws.attendedTransferGreeting
              }]
            }],
            "turn_complete": true
          }
        }))
      }, 500)
    }
  }
  else if (msg.serverContent) {
    if (msg.serverContent.modelTurn) {
      var parts = msg.serverContent.modelTurn.parts

      for (var i = 0; i < parts.length; i++) {
        var part = parts[i]

        if (part.text) {
          console.log(">>> Gemini said: " + part.text)
        }

        if (part.inlineData && part.inlineData.mimeType.startsWith("audio/")) {
          if (!audioReceived) {
            console.log(">>> First audio received from Gemini")
            audioReceived = true
          }
          var audio = fromBase64String(part.inlineData.data)
          call.play({
            direction: "out",
            codec: codec,
            audio: audio
          })
        }

        if (part.functionCall) {
          handleFunctionCall(part.functionCall)
        }
      }
    }
  }
  else if (msg.toolCall) {
    console.log(">>> Tool call received: " + JSON.stringify(msg.toolCall))

    if (msg.toolCall.functionCalls && msg.toolCall.functionCalls.length > 0) {
      for (var i = 0; i < msg.toolCall.functionCalls.length; i++) {
        handleFunctionCall(msg.toolCall.functionCalls[i])
      }
    }
  }
})

function handleFunctionCall(functionCall) {
  if (transferInitiated) {
    console.log(">>> Transfer/action already initiated, ignoring duplicate")
    return
  }

  console.log(">>> handleFunctionCall - name: " + functionCall.name)

  if (functionCall.name === "transfer_call") {
    const args = JSON.parse(JSON.stringify(functionCall.args))
    // var (not const): destination is reassigned to the fallback below when empty
    var destination = args.destination
    const name = args.name

    console.log('>>> Transfer to destination:')
    if (typeof destination == 'string') {
      console.log(destination)
    }
    console.log('>>> Transfer to name:')
    if (typeof name == 'string') {
      console.log(name)
    }

    if (!destination || destination === "") {
      console.log("!!! Empty destination, using 700")
      destination = "700"
    }

    transferInitiated = true

    if (timer) {
      clearTimeout(timer)
      timer = null
      console.log(">>> Cleared timeout timer")
    }

    console.log(">>> Looking for active calls on extension: " + call.extension)
    const calls = call.find('calls', call.extension)
    console.log(">>> Found " + calls.length + " active calls")

    if (calls.length > 0) {
      const c = calls[0]
      console.log(">>> Initiating attended transfer")
      console.log(">>> Account: 902")
      console.log(">>> Destination: " + destination)
      console.log(">>> From: " + c.from)
      console.log(">>> COBJ: " + c.cobj)

      // Stop streaming audio on this call before initiating transfer
      console.log(">>> Stopping audio stream on original call")
      call.stream()

      // Close WebSocket on original call
      if (ws) {
        console.log(">>> Closing WebSocket on original call")
        ws.close()
      }

      call.dial({
        account: '902',
        dest: destination,
        from: c.from,
        cobj: c.cobj,
        ivraction: 'attendant'
      })
    }
    else {
      console.log(">>> No active calls found, performing blind transfer")
      call.transfer(destination)
    }

    // Send function response.
    // Note: in the attended-transfer path the WebSocket was closed above,
    // so this response may not reach Gemini.
    if (functionCall.id) {
      console.log(">>> Sending function response for transfer_call")
      ws.send(JSON.stringify({
        "toolResponse": {
          "functionResponses": [{
            "id": functionCall.id,
            "name": "transfer_call",
            "response": {
              "success": true
            }
          }]
        }
      }))
    }
  }
  else if (functionCall.name === "call_accept") {
    console.log(">>> Accept the attended transfer call")
    transferInitiated = true

    if (timer) {
      clearTimeout(timer)
      timer = null
      console.log(">>> Cleared timeout timer")
    }

    console.log(">>> Executing transfer accept")
    call.transfer({ action: 'accept' })

    if (functionCall.id) {
      console.log(">>> Sending function response for call_accept")
      ws.send(JSON.stringify({
        "toolResponse": {
          "functionResponses": [{
            "id": functionCall.id,
            "name": "call_accept",
            "response": {
              "success": true
            }
          }]
        }
      }))
    }
  }
  else if (functionCall.name === "call_reject") {
    console.log(">>> Reject the attended transfer call")
    transferInitiated = true

    if (timer) {
      clearTimeout(timer)
      timer = null
      console.log(">>> Cleared timeout timer")
    }

    console.log(">>> Executing transfer reject")
    call.transfer({ action: 'reject' })

    if (functionCall.id) {
      console.log(">>> Sending function response for call_reject")
      ws.send(JSON.stringify({
        "toolResponse": {
          "functionResponses": [{
            "id": functionCall.id,
            "name": "call_reject",
            "response": {
              "success": true
            }
          }]
        }
      }))
    }
  }
}

var streamCount = 0
function stream(audio) {
  streamCount++

  if (streamCount === 1) {
    console.log(">>> First audio chunk sent to Gemini, size: " + audio.length + " bytes")
  }
  if (streamCount % 100 === 0) {
    console.log(">>> Sent " + streamCount + " audio chunks to Gemini")
  }

  var frame = JSON.stringify({
    "realtimeInput": {
      "media_chunks": [{
        "mime_type": "audio/pcm;rate=16000",
        "data": toBase64String(audio)
      }]
    }
  })
  ws.send(frame)
}

ws.connect()

Configuration Steps

1. Set Up the Voice Agents Extension

  • Create an extension (e.g., 5000) with this JavaScript
  • Configure it as a "JavaScript Voice Agents" or "JavaScript Application"

2. Configure Gemini API Key

  • Update the apiKey variable in the script with your Gemini API key
  • Ensure your Google Cloud account has access to the gemini-2.0-flash-exp model
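
Because the model may need to be swapped, it can help to keep its name in one place. The sketch below is one possible arrangement, not the script's own structure: MODEL and buildSetup are hypothetical names, and the tools section is omitted for brevity, so the function declarations from the full script would still have to be added.

```javascript
// Assumption: the model name is kept in a single constant so it can be changed easily.
var MODEL = 'models/gemini-2.0-flash-exp';

// Hypothetical helper: build the setup message with a configurable model and voice.
// The "tools" section of the full script is left out of this sketch.
function buildSetup(model, systemInstructions, voiceName) {
  return {
    setup: {
      model: model,
      generation_config: {
        response_modalities: ['AUDIO'],
        speech_config: {
          voice_config: {
            prebuilt_voice_config: { voice_name: voiceName || 'Puck' }
          }
        }
      },
      system_instruction: { parts: [{ text: systemInstructions }] }
    }
  };
}
```

Switching to an alternative model then only requires changing the MODEL value.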

3. Update the Directory Map

Configure the name-to-extension mapping for your organization:

"Here is a map of words to numbers: { Sales: 502, Support: 501, Marketing: 504, Bob: 503, IT Department: 300 }"

4. Update the Account Parameter

In the call.dial() function, update the account parameter to match your SIP trunk account ID:

call.dial({
  account: '902', // Change to your account ID
  dest: destination,
  from: c.from,
  cobj: c.cobj,
  ivraction: 'attendant'
})
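
To avoid editing the account ID in more than one place (the script also logs it), the dial arguments could be assembled by a small helper. This is a sketch under assumptions: TRUNK_ACCOUNT and buildDialArgs are hypothetical names, and originalCall stands for one entry returned by call.find().

```javascript
// Assumption: the trunk account ID is kept in a single constant.
var TRUNK_ACCOUNT = '902'; // Change to your account ID

// Hypothetical helper: build the argument object passed to call.dial().
// originalCall is expected to carry the from and cobj fields of the held call.
function buildDialArgs(destination, originalCall) {
  return {
    account: TRUNK_ACCOUNT,
    dest: String(destination),
    from: originalCall.from,
    cobj: originalCall.cobj,
    ivraction: 'attendant'
  };
}
```

In the script this would be used as call.dial(buildDialArgs(destination, c)).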

5. Program Desk Phone Buttons

Configure a button on your desk phones to streamline the process:

Yealink Example:

  • Button Type: Speed Dial or Transfer
  • Label: Screen Call
  • Value: 5000 (your Voice Agents extension)

Polycom Example:

  • Line Key: Speed Dial
  • Label: Screen
  • Number: 5000

Snom Example:

  • Function Key: Transfer
  • Number: 5000
  • Label: AI Screen

When this button is pressed:

  1. Current call is automatically put on hold
  2. Voice Agents extension is dialed
  3. Employee can immediately give instructions

6. Optional: Direct Routing

Set up direct routing for callers to reach the AI Voice Agents automatically:

  • Configure as after-hours destination
  • Set as department directory option
  • Use as overflow when no one answers

Real-World Benefits & Use Cases

  • Executive Screening: "There's a call from ABC Corp. Would you like to take it?"
  • Busy Checking: Employees can decline calls when they are in meetings or doing focused work
  • Privacy Protection: Decide who to speak with before being connected
  • Context Provided: AI announces caller information to help with decision-making
  • Professional Experience: More sophisticated than blind transfers
  • Office Efficiency: Anyone can screen calls with one button press
  • Reception Support: Front desk can handle multiple callers simultaneously
  • Team Coordination: Department members can screen for each other
  • Remote Workers: Cloud-based desk phones get the same functionality
  • After Hours: AI can handle screening when office is closed

For more information on Vodia's JavaScript capabilities, refer to: Vodia Backend JavaScript Documentation