This notebook demonstrates simple usage of the Gemini Multimodal Live API. For an overview of new capabilities, refer to the Gemini Live API docs.
This notebook implements a simple turn-based chat where you send messages as text and the model replies with audio. The API is capable of much more than that; the goal here is to demonstrate it with simple code.
The Next steps section at the end of this tutorial provides links to additional resources.
Native audio output
Info: Gemini 2.5 introduces native audio generation, which directly generates audio output, providing more natural-sounding audio, more expressive voices, more awareness of additional context (e.g., tone), and more proactive responses.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook, the API key is stored in a .env file. You can also set it as an environment variable or use a secret manager.
To set the API key as an environment variable instead, run the following in your terminal:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");
dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
The Live API uses a streaming model over a WebSocket connection. When you interact with the API, a persistent connection is created. Your input (audio, video, or text) is streamed continuously to the model, and the model’s response (text or audio) is streamed back in real-time over the same connection. Here we use a responseQueue to handle the streaming responses and determine when the server has finished sending the response.
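Here is a minimal sketch of that turn-based exchange using the @google/genai SDK; the model name is an assumption, not something this notebook prescribes:

import { GoogleGenAI, Modality, type LiveServerMessage } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: GEMINI_API_KEY });

// Streamed server messages land here as they arrive; the main loop drains
// the queue until the model marks the turn complete.
const responseQueue: LiveServerMessage[] = [];

const session = await ai.live.connect({
  model: "gemini-2.0-flash-live-001", // assumed model name
  config: { responseModalities: [Modality.TEXT] },
  callbacks: {
    onopen: () => console.log("Opened"),
    onmessage: (message) => responseQueue.push(message),
    onerror: (e) => console.error("Error:", e.message),
    onclose: (e) => console.log("Close:", e?.reason ?? ""),
  },
});

const prompt = "Hello? Gemini are you there?";
console.log("Sent message:", prompt);
session.sendClientContent({ turns: prompt, turnComplete: true });

// Print text chunks as they stream in, until the turn is complete.
let done = false;
while (!done) {
  const message = responseQueue.shift();
  if (!message) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    continue;
  }
  const text = message.serverContent?.modelTurn?.parts?.[0]?.text;
  if (text) console.log("Received response:", text);
  done = message.serverContent?.turnComplete ?? false;
}

session.close();
console.log("Session closed");

Running a cell like this produces output along these lines: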
Opened
Sent message: Hello? Gemini are you there?
Received response: Yes, I am
Received response: here! How can I help you today?
Session closed
Close:
Text to audio
The simplest way to play back the audio in a notebook is to write it out to a .wav file. So here is a simple WAV file writer:
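A minimal sketch of such a writer, assuming the wavefile npm package and the Live API's raw 16-bit PCM, 24 kHz mono output format:

import * as fs from "node:fs";
import { WaveFile } from "wavefile";

// Wrap raw 16-bit PCM samples in a WAV container and write them to disk.
function saveAudioToWav(pcmChunks: Buffer[], filename: string) {
  const pcm = Buffer.concat(pcmChunks);
  const samples = new Int16Array(pcm.buffer, pcm.byteOffset, pcm.length / 2);
  const wav = new WaveFile();
  // fromScratch(numChannels, sampleRate, bitDepth, samples)
  wav.fromScratch(1, 24000, "16", samples);
  fs.writeFileSync(filename, wav.toBuffer());
}

The audio session itself mirrors the text chat above, except that the connection config requests audio output (responseModalities: [Modality.AUDIO]) and the receive loop collects part.inlineData.data chunks (base64-encoded PCM) instead of text, producing output like: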
Opened
Sent message: Hello? Gemini are you there?
Received complete response
Audio saved to ../assets/live/text_to_audio_response.wav
Session closed
Text to Audio Response
Close:
Towards Async Tasks
The real power of the Live API is that it’s real-time and interruptible. You can’t get that full power from a simple sequence of steps. To really use the functionality, you will move the send and receive operations (and others) into their own async tasks.
Because of the limitations of the notebook environment, this tutorial doesn’t fully implement the interactive async tasks, but it does implement the next step in that direction:
It separates the send and receive, but still runs them sequentially.
In the next tutorial you’ll run these in separate async tasks.
Opened
Sent message: Hello? Gemini are you there?
Received complete response
Audio saved to ../assets/live/audio_response_0.wav
Audio Response 1
Sent message: Can you tell me a joke?
Received complete response
Audio saved to ../assets/live/audio_response_1.wav
Audio Response 2
Sent message: What is the weather like today?
Received complete response
Audio saved to ../assets/live/audio_response_2.wav
Audio Response 3
Session closed
Close:
The above code is divided into several sections:
start: Initializes the client and sets up the WebSocket connection.
send: Sends a message to the model.
receive: Receives the model’s response, collecting the audio chunks in a loop and writing them to a WAV file. It breaks out of the loop when the model indicates it has finished sending the response.
asyncAudioLooper: This is the main driver function that brings everything together. It initializes the client, starts the WebSocket connection, and then enters a loop where it sends messages and receives responses.
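As a concrete sketch of the send and receive pieces described above (reusing the responseQueue and saveAudioToWav helper from the earlier cells; the file path is illustrative):

import type { Session } from "@google/genai";

// send: push one user turn into the open session.
function send(session: Session, message: string) {
  console.log("Sent message:", message);
  session.sendClientContent({ turns: message, turnComplete: true });
}

// receive: drain responseQueue, collecting base64 audio chunks until the
// model signals turnComplete, then write them out as a WAV file.
async function receive(index: number) {
  const chunks: Buffer[] = [];
  for (;;) {
    const message = responseQueue.shift();
    if (!message) {
      await new Promise((resolve) => setTimeout(resolve, 50));
      continue;
    }
    const data = message.serverContent?.modelTurn?.parts?.[0]?.inlineData?.data;
    if (data) chunks.push(Buffer.from(data, "base64"));
    if (message.serverContent?.turnComplete) break;
  }
  const filename = `../assets/live/audio_response_${index}.wav`;
  saveAudioToWav(chunks, filename);
  console.log("Audio saved to", filename);
}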
Working with resumable sessions
Session resumption allows you to return to a previous interaction with the Live API by sending the last session handle you received from that session.
When you set your session to be resumable, the session information is stored on the Live API servers for up to 24 hours. Within this time window, you can resume the conversation and refer to information you previously shared with the model.
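A minimal sketch of a resumable_session helper that could produce the logs below, reusing the ai client from earlier; the model name is an assumption, and the sessionResumption config field is what makes the session resumable (the server then streams sessionResumptionUpdate messages containing fresh handles):

import { Modality, type LiveServerMessage } from "@google/genai";

let HANDLE: string | undefined; // updated from sessionResumptionUpdate messages

async function resumable_session(handle: string | undefined, messages: string[]) {
  console.log(`Connecting to the service with handle ${handle}...`);
  const queue: LiveServerMessage[] = [];
  const session = await ai.live.connect({
    model: "gemini-2.0-flash-live-001", // assumed model name
    config: {
      responseModalities: [Modality.TEXT],
      sessionResumption: { handle }, // undefined starts a fresh resumable session
    },
    callbacks: {
      onopen: () => console.log("Opened"),
      onmessage: (m) => {
        console.log("Received message:", JSON.stringify(m));
        if (m.sessionResumptionUpdate?.resumable && m.sessionResumptionUpdate.newHandle) {
          HANDLE = m.sessionResumptionUpdate.newHandle; // keep the latest handle
        }
        queue.push(m);
      },
      onclose: (e) => console.log("Close:", e?.reason ?? ""),
    },
  });
  for (const message of messages) {
    console.log("Sending message:", message);
    session.sendClientContent({ turns: message, turnComplete: true });
    // Wait until the model signals the end of this turn.
    while (!(queue.shift()?.serverContent?.turnComplete)) {
      await new Promise((resolve) => setTimeout(resolve, 50));
    }
  }
  session.close();
}

await resumable_session(undefined, ["Hello", "What is the capital of Brazil?"]);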
Connecting to the service with handle undefined...
Opened
Sending message: Hello
Received message: {"setupComplete":{}}
Received message: {"sessionResumptionUpdate":{}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":"Hello there! How"}]}}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":" can I help you today?\n"}]}}}
Received message: {"serverContent":{"generationComplete":true}}
Received message: {"serverContent":{"turnComplete":true},"usageMetadata":{"promptTokenCount":9,"responseTokenCount":11,"totalTokenCount":20,"promptTokensDetails":[{"modality":"TEXT","tokenCount":9}],"responseTokensDetails":[{"modality":"TEXT","tokenCount":11}]}}
Sending message: What is the capital of Brazil?
Received message: {"sessionResumptionUpdate":{"newHandle":"CihqdTFxaG1ua2g2aTkweWtiNzB5Ymdzc3V0bW16eDE2ZGkxaXR2d2dt","resumable":true}}
Received message: {"sessionResumptionUpdate":{}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":"The capital of Brazil"}]}}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":" is **Brasília**.\n"}]}}}
Received message: {"serverContent":{"generationComplete":true}}
Received message: {"serverContent":{"turnComplete":true},"usageMetadata":{"promptTokenCount":37,"responseTokenCount":10,"totalTokenCount":47,"promptTokensDetails":[{"modality":"TEXT","tokenCount":37}],"responseTokensDetails":[{"modality":"TEXT","tokenCount":10}]}}
Received message: {"sessionResumptionUpdate":{"newHandle":"CihrNGZyMjh4dXY3cXFkYzVmMjR5cnlmZ2w5bnBvNTRhcmoxNW1lN2Fi","resumable":true}}
Close:
With session resumption, you have a session handle for referring back to your previous sessions. In this example, the handle is saved in the HANDLE variable, as shown below:
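In the sketch above, the onmessage callback keeps HANDLE pointing at the newest handle, so after the session closes you can inspect it:

// HANDLE now holds the newHandle from the last sessionResumptionUpdate
// message of the previous session.
console.log(HANDLE);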
Now you can start a new Live API session, this time pointing to a handle from a previous session. Also, to test that the model can recall information from the previous session, you will ask it what the last question you asked was (in this example, “What is the capital of Brazil?”). You can see the Live API recovering that information:
await resumable_session(HANDLE, ["what was the last question I asked?"]);
Connecting to the service with handle CihrNGZyMjh4dXY3cXFkYzVmMjR5cnlmZ2w5bnBvNTRhcmoxNW1lN2Fi...
Opened
Sending message: what was the last question I asked?
Received message: {"setupComplete":{}}
Received message: {"sessionResumptionUpdate":{}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":"The"}]}}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":" last question you asked was: \"What is the capital of Brazil?\"\n"}]}}}
Received message: {"serverContent":{"generationComplete":true}}
Received message: {"serverContent":{"turnComplete":true},"usageMetadata":{"promptTokenCount":65,"responseTokenCount":16,"totalTokenCount":81,"promptTokensDetails":[{"modality":"TEXT","tokenCount":65}],"responseTokensDetails":[{"modality":"TEXT","tokenCount":16}]}}
Received message: {"sessionResumptionUpdate":{"newHandle":"CihmcW04ZzVnZnZwczU2ZnkwN2h1NHpmajFxZmgwcmhieTZ3Zmo3OWt6","resumable":true}}
Close:
Next steps
This tutorial just shows basic usage of the Live API, using the JavaScript GenAI SDK.