This notebook demonstrates simple usage of the Gemini Multimodal Live API. For an overview of new capabilities, refer to the Gemini Live API docs.
This notebook implements a simple turn-based chat where you send messages as text and the model replies with audio. The API is capable of much more than that; the goal here is to demonstrate it with simple code.
The Next steps section at the end of this tutorial provides links to additional resources.
Native audio output
Info: Gemini 2.5 introduces native audio generation, which directly generates audio output, providing more natural-sounding audio, more expressive voices, more awareness of additional context (e.g., tone), and more proactive responses.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook, the API key is stored in a .env file. You can also set it as an environment variable or use a secret manager.
To set the API key as an environment variable instead, run the following in your terminal:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");
dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
The Live API uses a streaming model over a WebSocket connection. When you interact with the API, a persistent connection is created. Your input (audio, video, or text) is streamed continuously to the model, and the model’s response (text or audio) is streamed back in real-time over the same connection. Here we use a responseQueue to handle the streaming responses and determine when the server has finished sending the response.
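Here is a minimal sketch of that turn-based exchange using the @google/genai SDK; the model name is an assumption, not something this notebook prescribes:

import { GoogleGenAI, Modality, type LiveServerMessage } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: GEMINI_API_KEY });

// Streamed server messages land here as they arrive; the main loop drains
// the queue until the model marks the turn complete.
const responseQueue: LiveServerMessage[] = [];

const session = await ai.live.connect({
  model: "gemini-2.0-flash-live-001", // assumed model name
  config: { responseModalities: [Modality.TEXT] },
  callbacks: {
    onopen: () => console.log("Opened"),
    onmessage: (message) => responseQueue.push(message),
    onerror: (e) => console.error("Error:", e.message),
    onclose: (e) => console.log("Close:", e?.reason ?? ""),
  },
});

const prompt = "Hello? Gemini are you there?";
console.log("Sent message:", prompt);
session.sendClientContent({ turns: prompt, turnComplete: true });

// Print text chunks as they stream in, until the turn is complete.
let done = false;
while (!done) {
  const message = responseQueue.shift();
  if (!message) {
    await new Promise((resolve) => setTimeout(resolve, 50));
    continue;
  }
  const text = message.serverContent?.modelTurn?.parts?.[0]?.text;
  if (text) console.log("Received response:", text);
  done = message.serverContent?.turnComplete ?? false;
}

session.close();
console.log("Session closed");

Running a cell like this produces output along these lines: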
Opened
Sent message: Hello? Gemini are you there?
Received response: Yes, I am
Received response: here! How can I help you today?
Session closed
Close:
Text to audio
The simplest way to play back the audio in a notebook is to write it out to a .wav file. So here is a simple WAV file writer:
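A minimal sketch of such a writer, assuming the wavefile npm package and the Live API's raw 16-bit PCM, 24 kHz mono output format:

import * as fs from "node:fs";
import { WaveFile } from "wavefile";

// Wrap raw 16-bit PCM samples in a WAV container and write them to disk.
function saveAudioToWav(pcmChunks: Buffer[], filename: string) {
  const pcm = Buffer.concat(pcmChunks);
  const samples = new Int16Array(pcm.buffer, pcm.byteOffset, pcm.length / 2);
  const wav = new WaveFile();
  // fromScratch(numChannels, sampleRate, bitDepth, samples)
  wav.fromScratch(1, 24000, "16", samples);
  fs.writeFileSync(filename, wav.toBuffer());
}

The audio session itself mirrors the text chat above, except that the connection config requests audio output (responseModalities: [Modality.AUDIO]) and the receive loop collects part.inlineData.data chunks (base64-encoded PCM) instead of text, producing output like: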
Opened
Sent message: Hello? Gemini are you there?
Received complete response
Audio saved to ../assets/live/text_to_audio_response.wav
Session closed
Text to Audio Response
Close:
Towards Async Tasks
The real power of the Live API is that it’s real-time and interruptible. You can’t get that full power from a simple sequence of steps. To really use the functionality, you will move the send and receive operations (and others) into their own async tasks.
Because of the limitations of the notebook environment, this tutorial doesn’t fully implement the interactive async tasks, but it does implement the next step in that direction:
It separates the send and receive, but still runs them sequentially.
In the next tutorial you’ll run these in separate async tasks.
Opened
Sent message: Hello? Gemini are you there?
Received complete response
Audio saved to ../assets/live/audio_response_0.wav
Audio Response 1
Sent message: Can you tell me a joke?
Received complete response
Audio saved to ../assets/live/audio_response_1.wav
Audio Response 2
Sent message: What is the weather like today?
Received complete response
Audio saved to ../assets/live/audio_response_2.wav
Audio Response 3
Session closed
Close:
The above code is divided into several sections:
start: Initializes the client and sets up the WebSocket connection.
send: Sends a message to the model.
receive: Receives the model’s response, collecting the audio chunks in a loop and writing them to a WAV file. It breaks out of the loop when the model indicates it has finished sending the response.
asyncAudioLooper: This is the main driver function that brings everything together. It initializes the client, starts the WebSocket connection, and then enters a loop where it sends messages and receives responses.
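As a concrete sketch of the send and receive pieces described above (reusing the responseQueue and saveAudioToWav helper from the earlier cells; the file path is illustrative):

import type { Session } from "@google/genai";

// send: push one user turn into the open session.
function send(session: Session, message: string) {
  console.log("Sent message:", message);
  session.sendClientContent({ turns: message, turnComplete: true });
}

// receive: drain responseQueue, collecting base64 audio chunks until the
// model signals turnComplete, then write them out as a WAV file.
async function receive(index: number) {
  const chunks: Buffer[] = [];
  for (;;) {
    const message = responseQueue.shift();
    if (!message) {
      await new Promise((resolve) => setTimeout(resolve, 50));
      continue;
    }
    const data = message.serverContent?.modelTurn?.parts?.[0]?.inlineData?.data;
    if (data) chunks.push(Buffer.from(data, "base64"));
    if (message.serverContent?.turnComplete) break;
  }
  const filename = `../assets/live/audio_response_${index}.wav`;
  saveAudioToWav(chunks, filename);
  console.log("Audio saved to", filename);
}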
Working with resumable sessions
Session resumption allows you to return to a previous interaction with the Live API by sending the last session handle you received from that session.
When you set your session to be resumable, the session information is stored on the Live API servers for up to 24 hours. Within this time window, you can resume the conversation and refer to information you previously shared with the model.
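A minimal sketch of a resumable_session helper that could produce the logs below, reusing the ai client from earlier; the model name is an assumption, and the sessionResumption config field is what makes the session resumable (the server then streams sessionResumptionUpdate messages containing fresh handles):

import { Modality, type LiveServerMessage } from "@google/genai";

let HANDLE: string | undefined; // updated from sessionResumptionUpdate messages

async function resumable_session(handle: string | undefined, messages: string[]) {
  console.log(`Connecting to the service with handle ${handle}...`);
  const queue: LiveServerMessage[] = [];
  const session = await ai.live.connect({
    model: "gemini-2.0-flash-live-001", // assumed model name
    config: {
      responseModalities: [Modality.TEXT],
      sessionResumption: { handle }, // undefined starts a fresh resumable session
    },
    callbacks: {
      onopen: () => console.log("Opened"),
      onmessage: (m) => {
        console.log("Received message:", JSON.stringify(m));
        if (m.sessionResumptionUpdate?.resumable && m.sessionResumptionUpdate.newHandle) {
          HANDLE = m.sessionResumptionUpdate.newHandle; // keep the latest handle
        }
        queue.push(m);
      },
      onclose: (e) => console.log("Close:", e?.reason ?? ""),
    },
  });
  for (const message of messages) {
    console.log("Sending message:", message);
    session.sendClientContent({ turns: message, turnComplete: true });
    // Wait until the model signals the end of this turn.
    while (!(queue.shift()?.serverContent?.turnComplete)) {
      await new Promise((resolve) => setTimeout(resolve, 50));
    }
  }
  session.close();
}

await resumable_session(undefined, ["Hello", "What is the capital of Brazil?"]);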
Connecting to the service with handle undefined...
Opened
Sending message: Hello
Received message: {"setupComplete":{}}
Received message: {"sessionResumptionUpdate":{}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":"Hello there! How"}]}}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":" can I help you today?\n"}]}}}
Received message: {"serverContent":{"generationComplete":true}}
Received message: {"serverContent":{"turnComplete":true},"usageMetadata":{"promptTokenCount":9,"responseTokenCount":11,"totalTokenCount":20,"promptTokensDetails":[{"modality":"TEXT","tokenCount":9}],"responseTokensDetails":[{"modality":"TEXT","tokenCount":11}]}}
Sending message: What is the capital of Brazil?
Received message: {"sessionResumptionUpdate":{"newHandle":"CihqdTFxaG1ua2g2aTkweWtiNzB5Ymdzc3V0bW16eDE2ZGkxaXR2d2dt","resumable":true}}
Received message: {"sessionResumptionUpdate":{}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":"The capital of Brazil"}]}}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":" is **Brasília**.\n"}]}}}
Received message: {"serverContent":{"generationComplete":true}}
Received message: {"serverContent":{"turnComplete":true},"usageMetadata":{"promptTokenCount":37,"responseTokenCount":10,"totalTokenCount":47,"promptTokensDetails":[{"modality":"TEXT","tokenCount":37}],"responseTokensDetails":[{"modality":"TEXT","tokenCount":10}]}}
Received message: {"sessionResumptionUpdate":{"newHandle":"CihrNGZyMjh4dXY3cXFkYzVmMjR5cnlmZ2w5bnBvNTRhcmoxNW1lN2Fi","resumable":true}}
Close:
With session resumption, you have a session handle for referring back to your previous sessions. In this example, the handle is saved in the HANDLE variable, as shown below:
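In the sketch above, the onmessage callback keeps HANDLE pointing at the newest handle, so after the session closes you can inspect it:

// HANDLE now holds the newHandle from the last sessionResumptionUpdate
// message of the previous session.
console.log(HANDLE);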
Now you can start a new Live API session, this time pointing to a handle from a previous session. Also, to test that the model can recall information from the previous session, you will ask it what the last question you asked was (in this example, “What is the capital of Brazil?”). You can see the Live API recovering that information:
await resumable_session(HANDLE, ["what was the last question I asked?"]);
Connecting to the service with handle CihrNGZyMjh4dXY3cXFkYzVmMjR5cnlmZ2w5bnBvNTRhcmoxNW1lN2Fi...
Opened
Sending message: what was the last question I asked?
Received message: {"setupComplete":{}}
Received message: {"sessionResumptionUpdate":{}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":"The"}]}}}
Received message: {"serverContent":{"modelTurn":{"parts":[{"text":" last question you asked was: \"What is the capital of Brazil?\"\n"}]}}}
Received message: {"serverContent":{"generationComplete":true}}
Received message: {"serverContent":{"turnComplete":true},"usageMetadata":{"promptTokenCount":65,"responseTokenCount":16,"totalTokenCount":81,"promptTokensDetails":[{"modality":"TEXT","tokenCount":65}],"responseTokensDetails":[{"modality":"TEXT","tokenCount":16}]}}
Received message: {"sessionResumptionUpdate":{"newHandle":"CihmcW04ZzVnZnZwczU2ZnkwN2h1NHpmajFxZmgwcmhieTZ3Zmo3OWt6","resumable":true}}
Close:
Next steps
This tutorial just shows basic usage of the Live API, using the JavaScript GenAI SDK.