Gemini 2.X - Multi-tool with the Multimodal Live API
In this notebook you will learn how to use tools, including charting tools, Google Search, and code execution, with the Gemini 2 Multimodal Live API. For an overview of the new capabilities, refer to the Gemini 2 docs.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");

const GEOAPI_KEY = process.env.GEOAPI_KEY ?? "";
if (!GEOAPI_KEY) {
  throw new Error("GEOAPI_KEY is not set in the environment variables");
}
console.log("GEOAPI_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
GEOAPI_KEY is set in the environment variables
Note
In our particular case the .env file is one directory up from the notebook, hence the ../ in the path to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
With the new SDK, you only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.
const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });
Select a model
Now select the model you want to use in this guide, either by picking one from the list or by entering its name. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).
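For instance, you can keep the model ID in a constant and reuse it for every call. The ID below is only an illustrative default, not a recommendation from this notebook; use any model that supports the Multimodal Live API.

// The model ID below is an assumption for illustration; substitute any
// model that supports the Multimodal Live API.
const MODEL_ID = process.env.MODEL_ID ?? "gemini-2.0-flash-live-001";
console.log(`Using model: ${MODEL_ID}`);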
Now, let’s see how all the pieces you’ve defined fit together in a simple example. You’ll send a single prompt to the API and observe the response.
This example uses the following helpers to send a prompt to the API and print the response:
handleServerContent: This function handles responses from the server, printing the content with appropriate formatting.
handleToolCall: This function handles tool calls, printing the tool name and arguments and returning a response.
All of the above is brought together in the GenAISession class that manages the session, tools, and server communication.
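The GenAISession helper itself is defined in a hidden cell and is not reproduced here. As a rough sketch (assuming the @google/genai live session API; the actual helper may differ), a tool-call handler can dispatch incoming function calls to the registered callables and send the results back to the server:

// Illustrative sketch only; the notebook's GenAISession helper may be
// implemented differently.
async function handleToolCallSketch(
  session: import("@google/genai").Session,
  toolCall: import("@google/genai").LiveServerToolCall,
  callableMap: Record<string, (args: Record<string, unknown>) => Promise<unknown>>
) {
  const functionResponses: import("@google/genai").FunctionResponse[] = [];
  for (const call of toolCall.functionCalls ?? []) {
    console.log(`Tool call: ${call.name}`, call.args);
    // Look up the implementation registered for this tool name.
    const fn = call.name ? callableMap[call.name] : undefined;
    const result = fn ? await fn(call.args ?? {}) : { error: `Unknown tool: ${call.name}` };
    functionResponses.push({ id: call.id, name: call.name, response: { output: result } });
  }
  // Return the results so the model can finish its turn.
  session.sendToolResponse({ functionResponses });
}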
Note that you can change the modality from AUDIO to TEXT and adjust the prompt.
// temporarily make console.warn a no-op to avoid warnings in the output
// (non-text part in GenerateContentResponse caused by accessing .text)
// https://github.com/googleapis/js-genai/blob/d82aba244bdb804b063ef8a983b2916c00b901d2/src/types.ts#L2005
// copy the original console.warn function to restore it later
const warn_fn = console.warn;
// eslint-disable-next-line @typescript-eslint/no-empty-function, no-empty-function
console.warn = function () {};

async function oneTurnExample() {
  const session = new GenAISession(google.Modality.TEXT, [{ googleSearch: {} }, { codeExecution: {} }]);
  await session.open();
  await session.sendPrompt(
    "Please find the last 5 Denis Villeneuve movies and look up their runtimes and the year published."
  );
  session.close();
}

await oneTurnExample();

// restore console.warn later
// console.warn = warn_fn;
Now define additional tools. Add a tool for charting by defining a schema (altairFn), a function to execute (altairToPlotlyHTML), and connecting the two by registering the function under the tool name render_altair with setCallableMap.
The charting tool used here is Vega-Altair, a “declarative statistical visualization library for Python”. Altair supports chart persistence using JSON, which you will expose as a tool so that the Gemini model can produce a chart.
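The altairFn schema is defined in a helper cell that is not shown here. A minimal version could look like the sketch below; the parameter name json_graph is an assumption for illustration, not necessarily what the notebook’s schema uses.

// Illustrative sketch of a charting function declaration; the notebook's
// actual altairFn may use different parameter names and descriptions.
const altairFnSketch: import("@google/genai").FunctionDeclaration = {
  name: "render_altair",
  description: "Render a chart from a Vega-Altair (Vega-Lite) specification serialized as JSON.",
  parameters: {
    type: google.Type.OBJECT,
    properties: {
      json_graph: {
        type: google.Type.STRING,
        description: "A JSON string containing the Vega-Lite specification of the chart to render.",
      },
    },
    required: ["json_graph"],
  },
};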
The helper code defined earlier will run as soon as it can, but audio takes some time to play so you may see output from later turns displayed before the audio has played.
async function multiTool() {
  const session = new GenAISession(google.Modality.TEXT, [
    { googleSearch: {} },
    { codeExecution: {} },
    { functionDeclarations: [altairFn] },
  ]);
  session.setCallableMap({
    render_altair: altairToPlotlyHTML,
  });
  await session.open();
  await session.sendPrompt("Please find the last 5 Denis Villeneuve movies and find their runtimes.");
  await session.sendPrompt("Can you write some code to work out which has the longest and shortest runtimes?");
  await session.sendPrompt(
    "Now can you plot them in a line chart showing the year on the x-axis and runtime on the y-axis?"
  );
  session.close();
}

await multiTool();
Here are the runtimes for the last 5 Denis Villeneuve movies:
Dune: Part Two (2024) - 2 hours 46 minutes (166 minutes)
Dune (2021) - 2 hours 35 minutes (155 minutes)
Blade Runner 2049 (2017) - 2 hours 43 minutes (163 minutes is also seen)
Arrival (2016) - 1 hour 56 minutes (116 minutes is also seen, but more sources say 116 minutes)
Sicario (2015) - 2 hours 1 minute (121 minutes)
There’s a small discrepancy for Arrival and Blade Runner 2049. Arrival appears to be 116 minutes. Blade Runner 2049 appears to be 163 minutes, rather than 164. Sicario appears to have two runtimes. I’ll use the most common runtimes found.
runtimes = {"Dune: Part Two (2024)": 166,"Dune (2021)": 155,"Blade Runner 2049 (2017)": 163,"Arrival (2016)": 116,"Sicario (2015)": 121}longest_movie =max(runtimes, key=runtimes.get)shortest_movie =min(runtimes, key=runtimes.get)print(f"The longest movie is: {longest_movie} with a runtime of {runtimes[longest_movie]} minutes.")print(f"The shortest movie is: {shortest_movie} with a runtime of {runtimes[shortest_movie]} minutes.")
The longest movie is: Dune: Part Two (2024) with a runtime of 166 minutes.
The shortest movie is: Arrival (2016) with a runtime of 116 minutes.
The longest movie is Dune: Part Two (2024) with a runtime of 166 minutes. The shortest movie is Arrival (2016) with a runtime of 116 minutes.
For this example you will use the Geoapify Maps Static API to draw on a map during the conversation. You’ll need to make sure your API key is enabled for the Geoapify Maps Static API.
Add the key to your .env file (or secret manager), or set it in the code directly (GEOAPI_KEY = '...'; not recommended).
The following cell is hidden by default, but needs to be run. It contains the function schema for the draw_map function, including some documentation on how to draw markers with the Google Maps API.
Note that the model needs to produce a fairly complex set of parameters in order to call draw_map, including defining a center-point for the map, an integer zoom level and custom marker styles and locations.
import { FunctionDeclaration } from "@google/genai";

const mapFn: FunctionDeclaration = {
  name: "draw_map",
  description: "Render a Google Maps static map using the specified parameters. No information is returned.",
  parameters: {
    type: google.Type.OBJECT,
    properties: {
      center: {
        type: google.Type.STRING,
        description: "Location to center the map. It has to be a lat,lng pair (e.g. 40.714728,-73.998672).",
      },
      zoom: {
        type: google.Type.NUMBER,
        description:
          "Google Maps zoom level. 1 is the world, 20 is zoomed in to building level. Integer only. Level 11 shows about a 15km radius. Level 9 is about 30km radius.",
      },
      path: {
        type: google.Type.STRING,
        description: `The path parameter defines a set of one or more locations connected by a path to overlay on the map image. The path parameter takes a set of value assignments (path descriptors) of the following format:

path=pathStyles|pathLocation1|pathLocation2|... etc.

Note that both path points are separated from each other using the pipe character (|). Because both style information and point information is delimited via the pipe character, style information must appear first in any path descriptor. Once the Maps Static API server encounters a location in the path descriptor, all other path parameters are assumed to be locations as well.

Path styles
The set of path style descriptors is a series of value assignments separated by the pipe (|) character. This style descriptor defines the visual attributes to use when displaying the path. These style descriptors contain the following key/value assignments:

weight: (optional) specifies the thickness of the path in pixels. If no weight parameter is set, the path will appear in its default thickness (5 pixels).

color: (optional) specifies a color either as a 24-bit (example: color=0xFFFFCC) or 32-bit hexadecimal value (example: color=0xFFFFCCFF), or from the set {black, brown, green, purple, yellow, blue, gray, orange, red, white}; only use hex format of the colors even if they are in the set.

When a 32-bit hex value is specified, the last two characters specify the 8-bit alpha transparency value. This value varies between 00 (completely transparent) and FF (completely opaque). Note that transparencies are supported in paths, though they are not supported for markers.

fillcolor: (optional) indicates both that the path marks off a polygonal area and specifies the fill color to use as an overlay within that area. The set of locations following need not be a "closed" loop; the Maps Static API server will automatically join the first and last points. Note, however, that any stroke on the exterior of the filled area will not be closed unless you specifically provide the same beginning and end location.

geodesic: (optional) indicates that the requested path should be interpreted as a geodesic line that follows the curvature of the earth. When false, the path is rendered as a straight line in screen space. Defaults to false.

Some example path definitions:
Thin blue line, 50% opacity: path=color:0x0000ff80|weight:1
Solid red line: path=color:0xff0000ff|weight:5
Solid thick white line: path=color:0xffffffff|weight:10

These path styles are optional. If default attributes are desired, you may skip defining the path attributes; in that case, the path descriptor's first "argument" will consist instead of the first declared point (location).

Path points
In order to draw a path, the path parameter must also be passed two or more points. The Maps Static API will then connect the path along those points, in the specified order. Each pathPoint is denoted in the pathDescriptor separated by the | (pipe) character.`,
      },
      markers: {
        type: google.Type.ARRAY,
        items: {
          type: google.Type.STRING,
        },
        description: `The markers parameter defines a set of one or more markers (map pins) at a set of locations. Each marker defined within a single markers declaration must exhibit the same visual style; if you wish to display markers with different styles, you will need to supply multiple markers parameters with separate style information.

The markers parameter takes a set of value assignments (marker descriptors) of the following format:

markers=markerStyles|markerLocation1|markerLocation2|... etc.

The set of markerStyles is declared at the beginning of the markers declaration and consists of zero or more style descriptors separated by the pipe character (|), followed by a set of one or more locations also separated by the pipe character (|).

Because both style information and location information is delimited via the pipe character, style information must appear first in any marker descriptor. Once the Maps Static API server encounters a location in the marker descriptor, all other marker parameters are assumed to be locations as well.

Marker styles
The set of marker style descriptors is a series of value assignments separated by the pipe (|) character. This style descriptor defines the visual attributes to use when displaying the markers within this marker descriptor. These style descriptors contain the following key/value assignments:

size: (optional) specifies the size of marker from the set {tiny, mid, small}. If no size parameter is set, the marker will appear in its default (normal) size.

color: (optional) specifies a 24-bit color (example: color=0xFFFFCC) or a predefined color from the set {black, brown, green, purple, yellow, blue, gray, orange, red, white}; only use hex format of the colors even if they are in the set.

Note that transparencies (specified using 32-bit hex color values) are not supported in markers, though they are supported for paths.

label: (optional) specifies a single uppercase alphanumeric character from the set {A-Z, 0-9}. (The requirement for uppercase characters is new to this version of the API.) Note that default and mid sized markers are the only markers capable of displaying an alphanumeric-character parameter. tiny and small markers are not capable of displaying an alphanumeric-character.

Note: Location must be specified as a lat,lng pair (e.g. 40.714728,-73.998672). The Maps Static API server will not accept any other location format.`,
      },
    },
    required: ["center", "zoom"],
  },
};
Now define the draw_map function and add googleSearch as a tool to use for this conversation. This will allow the model to look up restaurants that might be popular.
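The drawMap implementation itself lives in a helper cell that is not reproduced here. As a rough sketch of what such a function might look like (the Geoapify Static Maps endpoint and query parameter names below are assumptions to verify against the Geoapify docs, and the real helper also handles marker styles, paths, and inline display):

// Illustrative sketch only; the notebook's real drawMap helper does more
// (marker colors, paths, inline image display). Endpoint and query
// parameter names are assumptions based on the public Geoapify docs.
async function drawMapSketch(args: { center: string; zoom: number; markers?: string[]; path?: string }) {
  // The model produces "lat,lng" pairs; Geoapify expects "lonlat:lon,lat".
  const toLonLat = (latLng: string) => {
    const [lat, lng] = latLng.split(",").map((s) => s.trim());
    return `lonlat:${lng},${lat}`;
  };

  const params = new URLSearchParams({
    style: "osm-bright",
    width: "800",
    height: "600",
    center: toLonLat(args.center),
    zoom: String(Math.round(args.zoom)),
    apiKey: GEOAPI_KEY,
  });

  // Keep only the location parts of each marker descriptor; style parts
  // (color:, size:, label:) and the path parameter are ignored in this sketch.
  const locations = (args.markers ?? [])
    .flatMap((marker) => marker.split("|"))
    .map((part) => part.trim())
    .filter((part) => /^-?\d/.test(part))
    .map(toLonLat);
  if (locations.length > 0) {
    params.set("marker", locations.join("|"));
  }

  const url = `https://maps.geoapify.com/v1/staticmap?${params.toString()}`;
  console.log("Static map URL:", url);
  return { url };
}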
async function mapExample() {
  const session = new GenAISession(google.Modality.TEXT, [{ googleSearch: {} }, { functionDeclarations: [mapFn] }]);
  session.setCallableMap({
    draw_map: drawMap,
  });
  await session.open();
  await session.sendPrompt("Please look up and mark 3 Sydney restaurants that are currently trending on a map.");
  await session.sendPrompt(
    "Now write some code to randomly pick one to eat at tonight and zoom in to that one on the map."
  );
  session.close();
}

await mapExample();
I have marked Nomad (red), Quay (blue), and Bennelong (green) on a map centered around Sydney.
Connection closed
Maps with Code execution
In this example, you will use the map tool defined before, and you’ll challenge the model to generate a color gradient and use it to visually represent data on a map. This task requires code execution, so that is also included as a tool.
Specifically, you will ask the model to plot the capital cities in Australia, and apply a gradient between two colors in a circular direction around the country using Google Maps markers.
async function mapWithCodeExecution() {
  const session = new GenAISession(google.Modality.TEXT, [
    { googleSearch: {} },
    { codeExecution: {} },
    { functionDeclarations: [mapFn] },
  ]);
  session.setCallableMap({
    draw_map: drawMap,
  });
  await session.open();
  await session.sendPrompt(
    "Plot markers on every capital city in Australia using a gradient between Orange and Green. Plan out your steps first, then follow the plan."
  );
  await session.sendPrompt(
    "Awesome! Can you ensure the gradient is applied smoothly in a circular direction around the country?"
  );
  session.close();
}

await mapWithCodeExecution();
Connection opened
Okay, I will plot markers on every capital city in Australia using a gradient between Orange and Green. Here’s the plan:
Identify the capital cities and their coordinates: I’ll use search queries to find the capital cities of each Australian state and territory, as well as their latitude and longitude.
Assign colors: I’ll assign colors to the markers, creating a gradient from orange to green. Since there are 8 capital cities, I will generate 8 hex color codes, starting with orange and transitioning to green.
Create marker strings: I’ll construct the marker strings for each capital city, including the color and coordinates.
Call the draw_map function: I’ll call the draw_map function with the appropriate center, zoom level, and marker strings. I’ll center the map on Australia and use a reasonable zoom level.
Now let’s execute the plan.
Step 1: Identify the capital cities and their coordinates
From the search results, I’ve gathered the following approximate coordinates:
Canberra: -35.28, 149.13
Sydney: -33.87, 151.21
Melbourne: -37.81, 144.96
Brisbane: -27.47, 153.02
Perth: -31.95, 115.86
Adelaide: -34.92, 138.60
Hobart: -42.88, 147.32
Darwin: -12.46, 130.84
Step 2: Assign Colors
I will use a simple linear interpolation between orange (#FFA500) and green (#00FF00) to generate 8 colors. I will use python to generate the color codes.
def hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i+2], 16) for i in (0, 2, 4))

def rgb_to_hex(rgb_color):
    return '#{:02x}{:02x}{:02x}'.format(rgb_color[0], rgb_color[1], rgb_color[2])

def color_gradient(start_color, end_color, n):
    start_rgb = hex_to_rgb(start_color)
    end_rgb = hex_to_rgb(end_color)
    colors = []
    for i in range(n):
        r = int(start_rgb[0] + (end_rgb[0] - start_rgb[0]) * i / (n - 1))
        g = int(start_rgb[1] + (end_rgb[1] - start_rgb[1]) * i / (n - 1))
        b = int(start_rgb[2] + (end_rgb[2] - start_rgb[2]) * i / (n - 1))
        colors.append(rgb_to_hex((r, g, b)))
    return colors

colors = color_gradient("#FFA500", "#00FF00", 8)
print(colors)
Finally, I’ll call the draw_map function with the center of Australia and a zoom level that shows the whole continent. I’ll use -25, 135 as the center and a zoom level of 4.
I have plotted the capital cities of Australia with a gradient from orange to green.
Connection closed
How well this example performs depends on your feedback to get the output exactly right. This example showed the first two steps of a hypothetical conversation, but you could keep iterating with the model until the results are what you need.
Next steps
This guide shows more intermediate use of the Multimodal Live API over WebSockets.