Gemini 2.0 - Multimodal live API: Tool use

This notebook provides examples of how to use tools with the Multimodal Live API and Gemini 2.0.

The API provides Google Search, Code Execution, and Function Calling tools. Earlier Gemini models supported versions of these tools; the biggest change with Gemini 2.0 (in the Live API) is that all of the tools are handled through Code Execution. With that change, you can enable multiple tools in a single API call, and the model can use several of them within a single code execution block, as sketched below.
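
As a preview, this is roughly what a multi-tool configuration looks like with the JS SDK (a sketch only; myFunctionDeclaration is a placeholder, and the full working setup appears in the Multiple tools section below):

const tools = [
  { googleSearch: {} }, // ground answers with Google Search
  { codeExecution: {} }, // let the model write and run Python
  { functionDeclarations: [myFunctionDeclaration] }, // your own functions
];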

This tutorial assumes you are familiar with the Live API, as described in the starter tutorial.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Setup your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env file is one directory up from the notebook, hence we need ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether (see the snippet after the directory layout below).

│
├── .env
└── quickstarts
    └── Get_started_LiveAPI_tools.ipynb
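
If the .env file does sit next to the notebook, dotenv's default behavior is enough and no path option is needed:

// Loads .env from the current working directory by default.
dotenv.config();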

Initialize SDK Client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.

const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });

Select a model

The Multimodal Live API is a new capability introduced with the Gemini 2.0 model. It won’t work with previous generation models.

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "gemini-2.0-flash-live-001";

Utilities

You’re going to use the Live API’s audio output; the easiest way to hear it in the notebook is to write the PCM data out as a WAV file:

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");
const wave = require("wavefile") as typeof import("wavefile");

function saveAudioToFile(audioData: Int16Array, filePath: string) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const wav = new wave.WaveFile();
  wav.fromScratch(1, 24000, "16", audioData);
  fs.writeFileSync(filePath, wav.toBuffer());
  console.debug(`Audio saved to ${filePath}`);
}

Get Started

Most of the Live API setup is similar to the starter tutorial. Since this tutorial doesn’t focus on the real-time interactivity of the API, the code has been simplified: it still uses the Live API, but it only sends a single text prompt and listens for a single turn of replies.

You can pass Modality.AUDIO to any of the examples to get the spoken version of the output, as shown after the first example below.

import { FunctionResponse, LiveServerContent, LiveServerToolCall, Modality, Session, Tool } from "@google/genai";

function handleServerContent(content: LiveServerContent) {
  if (content.modelTurn) {
    for (const turn of content.modelTurn.parts ?? []) {
      if (turn.executableCode) {
        tslab.display.markdown("-------------------------------");
        tslab.display.markdown(`\`\`\`python\n${turn.executableCode.code}\n\`\`\``);
        tslab.display.markdown("-------------------------------");
      }
      if (turn.codeExecutionResult) {
        tslab.display.markdown("-------------------------------");
        tslab.display.markdown(`\`\`\`\n${turn.codeExecutionResult.output}\n\`\`\``);
        tslab.display.markdown("-------------------------------");
      }
    }
  }
  if (content.groundingMetadata) {
    tslab.display.html(content.groundingMetadata.searchEntryPoint?.renderedContent ?? "");
  }
}

function handleToolCall(session: Session, toolCall: LiveServerToolCall) {
  const responses: FunctionResponse[] = [];
  for (const fc of toolCall.functionCalls ?? []) {
    responses.push({
      id: fc.id,
      name: fc.name,
      response: {
        result: "ok",
      },
    });
  }
  console.log("Tool call responses:", JSON.stringify(responses, null, 2));
  session.sendToolResponse({
    functionResponses: responses,
  });
}

async function run(prompt: string, modality: Modality = Modality.TEXT, tools: Tool[] = []) {
  const audioData: number[] = [];
  const audioFileName = `audio-${Date.now()}.wav`;
  let completed = false;
  const session = await ai.live.connect({
    model: MODEL_ID,
    callbacks: {
      onopen: () => {
        console.log("Connection opened");
      },
      onclose: () => {
        console.log("Connection closed");
      },
      onerror: (error) => {
        console.error("Error:", error.message);
      },
      onmessage: (message) => {
        if (message.text) {
          tslab.display.markdown(message.text);
          return;
        }
        if (message.data) {
          const audioBuffer = Buffer.from(message.data, "base64");
          const audio = new Int16Array(
            audioBuffer.buffer,
            audioBuffer.byteOffset,
            audioBuffer.length / Int16Array.BYTES_PER_ELEMENT
          );
          audioData.push(...audio);
          return;
        }
        if (message.serverContent) {
          handleServerContent(message.serverContent);
          if (message.serverContent.turnComplete) {
            completed = true;
          }
          return;
        }
        if (message.toolCall) {
          handleToolCall(session, message.toolCall);
          completed = true;
          return;
        }
      },
    },
    config: {
      tools: tools,
      responseModalities: [modality],
    },
  });
  console.log("Prompt: ", prompt);
  session.sendClientContent({
    turns: [prompt],
    turnComplete: true,
  });
  // eslint-disable-next-line @typescript-eslint/no-unnecessary-condition
  while (!completed) {
    await new Promise((resolve) => setTimeout(resolve, 100));
  }
  if (audioData.length > 0) {
    saveAudioToFile(new Int16Array(audioData), path.join("audio", audioFileName));
    console.log(`Audio saved to ${audioFileName}`);
    tslab.display.html(
      `<audio controls><source src="audio/${audioFileName}" type="audio/wav">Your browser does not support the audio element.</audio>`
    );
  }
  console.log("Session completed");
  session.close();
}

Since this tutorial demonstrates several tools, you’ll need more code to handle the different types of objects it returns.

  • The codeExecution tool can return executableCode and codeExecutionResult parts.
  • The googleSearch tool may attach a groundingMetadata object.
  • Finally, with the functionDeclarations tool, the API may return toolCall objects.

To keep this code minimal, the toolCall handler just replies to every function call with a response of "ok".

await run("Hello?");
Connection opened
Prompt:  Hello?

Hello! How can

I help you today?

Session completed
Connection closed
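
To hear a spoken reply instead, pass Modality.AUDIO; the run helper collects the PCM chunks and writes them out as a WAV file you can play back:

await run("Hello?", google.Modality.AUDIO);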

Simple Function Call

The function calling feature of the API can handle a wide variety of functions. Support in the SDK is still under construction, so keep this simple and just send a minimal function definition: just the function’s name and a short description.

Note that in the Live API, function calls are independent of the chat turns: the conversation can continue while a function call is being processed.

import { FunctionDeclaration, Tool } from "@google/genai";

const turn_on_the_lights = {
  name: "turn_on_the_lights",
  description: "Turn on the lights in the room",
} satisfies FunctionDeclaration;
const turn_off_the_lights = {
  name: "turn_off_the_lights",
  description: "Turn off the lights in the room",
} satisfies FunctionDeclaration;
const function_call_tools: Tool[] = [{ functionDeclarations: [turn_on_the_lights, turn_off_the_lights] }];

// temporarily make console.warn a no-op to avoid warnings in the output (non-text part in GenerateContentResponse caused by accessing .text)
// https://github.com/googleapis/js-genai/blob/d82aba244bdb804b063ef8a983b2916c00b901d2/src/types.ts#L2005
// copy the original console.warn function to restore it later
const warn_fn = console.warn;
// eslint-disable-next-line @typescript-eslint/no-empty-function, no-empty-function
console.warn = function () {};

await run("Turn on the lights", google.Modality.TEXT, function_call_tools);
// restore console.warn later
// console.warn = warn_fn;
Connection opened
Prompt:  Turn on the lights

print(default_api.turn_on_the_lights())

Tool call responses: [
  {
    "id": "function-call-16720258795371319743",
    "name": "turn_on_the_lights",
    "response": {
      "result": "ok"
    }
  }
]

{'result': 'ok'}

OK, I’ve turned on the lights.

Session completed
Connection closed

Code Execution

The codeExecution tool lets the model write and run Python code. Try it on a math problem the model can’t solve from memory:

await run("Can you compute the largest prime palindrome under 100000.", google.Modality.TEXT, [{ codeExecution: {} }]);
Connection opened
Prompt:  Can you compute the largest prime palindrome under 100000.

Okay, I can help you with that. Here’s my plan:

  1. Generate Palindromes: Create a list of all palindromes under 100000.
  2. Check for Primality: Iterate through the palindromes and check if each one is prime.
  3. Find the Largest: Keep track of the largest prime palindrome found so far.

Here’s the code to do that:


def is_palindrome(n):
  """Checks if a number is a palindrome."""
  return str(n) == str(n)[::-1]


def is_prime(n):
  """Checks if a number is prime."""
  if n < 2:
    return False
  for i in range(2, int(n**0.5) + 1):
    if n % i == 0:
      return False
  return True


largest_prime_palindrome = 0
for i in range(100000):
  if is_palindrome(i) and is_prime(i):
    largest_prime_palindrome = i

print(largest_prime_palindrome)


98689

The largest prime palindrome

under 100000 is 98689.

Session completed
Connection closed

Compositional Function Calling

Compositional function calling refers to the ability to combine user-defined functions with the codeExecution tool. The model writes calls to your functions into larger blocks of code, then pauses execution while it waits for you to send back a response for each call.

await run("Can you turn on the lights wait 10s and then turn them off?", google.Modality.TEXT, [
  ...function_call_tools,
  { codeExecution: {} },
]);
Connection opened
Prompt:  Can you turn on the lights wait 10s and then turn them off?

import time

default_api.turn_on_the_lights()
time.sleep(10)
default_api.turn_off_the_lights()

Tool call responses: [
  {
    "id": "function-call-448821244251533960",
    "name": "turn_on_the_lights",
    "response": {
      "result": "ok"
    }
  }
]
Session completed
Connection closed

Multiple tools

The biggest difference with the new API, however, is that you’re no longer limited to one tool per request. Try combining the tasks from the previous sections:

import { Tool } from "@google/genai";

const multi_tool_prompt = `
  Hey, I need you to do three things for me.

  1. Compute the largest prime palindrome under 100000.
  2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024.
  3. Turn on the lights

  Thanks!
`;
const multi_tool_tools: Tool[] = [
  { codeExecution: {} },
  { googleSearch: {} },
  { functionDeclarations: [turn_on_the_lights, turn_off_the_lights] },
];

await run(multi_tool_prompt, google.Modality.TEXT, multi_tool_tools);
Connection opened
Prompt:  
  Hey, I need you to do three things for me.

  1. Compute the largest prime palindrome under 100000.
  2. Then use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024.
  3. Turn on the lights

  Thanks!

Okay, I can do that. Here’s the plan:

  1. Compute the largest prime palindrome under 100000. I’ll use a Python script to achieve this.
  2. Use Google Search to look up information about the largest earthquake in California the week of Dec 5 2024.
  3. Turn on the lights using the provided API.

Here’s the first step, computing the largest prime palindrome under 100000:


def is_palindrome(n):
  return str(n) == str(n)[::-1]

def is_prime(n):
  if n < 2:
    return False
  for i in range(2, int(n**0.5) + 1):
    if n % i == 0:
      return False
  return True

largest_prime_palindrome = 0
for i in range(99999, 1, -1):
  if is_palindrome(i) and is_prime(i):
    largest_prime_palindrome = i
    break

print(largest_prime_palindrome)

98689

Okay, the largest prime palindrome under 100000 is 98689.

Now, let’s use Google Search to find the largest earthquake in California the week of Dec 5 2024.


concise_search("largest earthquake california week of December 5 2024", max_num_results=5)


Looking up information on Google Search.

Based on the search results, the largest earthquake in California during the week of December 5, 2024, was a magnitude 7.0 earthquake offshore of Cape Mendocino on December 5, 2024, at 10:44 a.m. PST.

Finally, I will turn on the lights.


default_api.turn_on_the_lights()

Tool call responses: [
  {
    "id": "function-call-10200942088489058256",
    "name": "turn_on_the_lights",
    "response": {
      "result": "ok"
    }
  }
]
Session completed
Connection closed

Next Steps

Check out the other Gemini 2.0 capabilities from the Cookbook, in particular the other multi-tool example and the one about Gemini’s spatial capabilities.