Get started with music generation using Lyria RealTime

Lyria RealTime provides access to a state-of-the-art, real-time, streaming music generation model. It allows developers to build applications where users can interactively create, continuously steer, and perform instrumental music using text prompts.

For Lyria RealTime's main characteristics and more details, check the Lyria RealTime documentation.

Important

Lyria RealTime is a preview feature. It is free to use for now with quota limitations, but is subject to change.

Also note that due to Colab limitations, you won’t be able to experience the real-time capabilities of Lyria RealTime, only limited audio output. Use the AI Studio apps Prompt DJ and MIDI DJ to fully experience Lyria RealTime.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Setup your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path altogether.

│
├── .env
└── quickstarts
    └── Get_started_LyriaRealTime.ipynb

Initialize SDK Client

The Lyria RealTime API is a new capability introduced with the Lyria RealTime model, so it only works with the lyria-realtime-exp model. As it’s an experimental feature, you also need to use the v1alpha API version.

const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY, httpOptions: { apiVersion: "v1alpha" } });

Select a model

Lyria RealTime music generation is only available through the lyria-realtime-exp model. It won’t work with other models.

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "models/lyria-realtime-exp";

Utilities

You’re going to use Lyria RealTime’s audio output. The easiest way to hear it in Colab is to write the PCM data out as a WAV file:

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");
const wave = require("wavefile") as typeof import("wavefile");

function saveAudioToFile(audioData: Int16Array, filePath: string) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const wav = new wave.WaveFile();
  // Lyria RealTime streams raw 16-bit PCM, stereo (interleaved), at 48 kHz.
  wav.fromScratch(2, 48000, "16", audioData);
  fs.writeFileSync(filePath, wav.toBuffer());
  console.debug(`Audio saved to ${filePath}`);
}
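
If you want to quickly sanity-check the helper before generating anything, you can (purely as an illustration, the file name is arbitrary) write one second of stereo silence through it:

const silence = new Int16Array(48000 * 2); // 1 second of interleaved stereo silence at 48 kHz
saveAudioToFile(silence, "../assets/live/silence_check.wav");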

Generate music

The Lyria RealTime model uses WebSockets to stream audio data in real time. The model can be prompted with text descriptions of the desired music, and it generates audio that matches the description and streams it in chunks. It takes two kinds of input:

  • WeightedPrompt: A list of text prompts that describe the desired music. Each prompt can have a weight that indicates its influence on the generated music. The prompts can be sent while the session is active, allowing for continuous steering of the music generation.
  • LiveMusicGenerationConfig: A configuration object that specifies the desired characteristics of the generated music, such as bpm, density, brightness, scale, and guidance. These parameters can be adjusted in real time to influence the music generation.
Important

You can’t just update a single parameter in the LiveMusicGenerationConfig object. You need to send the entire object with all the parameters each time you want to update it, otherwise the other parameters will be reset to their default values.

Any updates to bpm or scale need to be followed by a resetContext call to reset the context of the music generation. This is because these parameters affect the musical structure and need to be applied from the beginning of the generation.
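
As an illustration of that constraint, a mid-session tempo change might look like the following sketch (it assumes an active session such as the one created in the next code cell; changeTempo is a hypothetical helper, not part of the SDK):

import { LiveMusicSession } from "@google/genai";

// Sketch only: steer an already-active session to a new tempo.
// The full config is resent (not just bpm), otherwise the other parameters
// would fall back to their defaults; the context is then reset so the new
// tempo applies from the start of the generation.
async function changeTempo(session: LiveMusicSession, newBpm: number) {
  await session.setMusicGenerationConfig({
    musicGenerationConfig: { bpm: newBpm, density: 1.0, brightness: 0.5, guidance: 4.0 },
  });
  session.resetContext();
}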

import { LiveMusicGenerationConfig, LiveMusicSession, LiveMusicServerMessage, WeightedPrompt } from "@google/genai";

let n_index = 0;
const MAX_CHUNKS = 10; // Maximum number of audio chunks to process
const responseQueue: LiveMusicServerMessage[] = [];

async function receive() {
  console.debug("Receiving audio chunks...");
  let done = false;
  let chunk_count = 0;
  const audioChunks: number[][] = [];
  while (!done) {
    if (responseQueue.length > 0) {
      const response = responseQueue.shift();
      if (response?.audioChunk?.data) {
        const audioBuffer = Buffer.from(response.audioChunk.data, "base64");
        const intArray = new Int16Array(
          audioBuffer.buffer,
          audioBuffer.byteOffset,
          audioBuffer.length / Int16Array.BYTES_PER_ELEMENT
        );
        audioChunks.push(Array.from(intArray));
        chunk_count++;
      }
      if (chunk_count >= MAX_CHUNKS) {
        done = true;
        console.debug("Received complete response");
      }
    } else {
      await new Promise((resolve) => setTimeout(resolve, 100));
    }
  }
  const audioFilePath = path.join("../assets/live", `lyria_realtime_${n_index}.wav`);
  saveAudioToFile(new Int16Array(audioChunks.flat()), audioFilePath);
  tslab.display.html(`
    <h3>Audio Response Lyria</h3>
    <audio controls>
        <source src="../assets/live/lyria_realtime_${n_index}.wav" type="audio/wav">
        Your browser does not support the audio element.
    </audio>
  `);
  n_index++;
}

async function generateMusic(prompts: WeightedPrompt[], config: LiveMusicGenerationConfig) {
  const session: LiveMusicSession = await ai.live.music.connect({
    model: MODEL_ID,
    callbacks: {
      onmessage: (message) => {
        responseQueue.push(message);
      },
      onerror: (error) => {
        console.error("music session error:", error);
      },
      onclose: () => {
        console.log("Lyria RealTime stream closed.");
      },
    },
  });

  await session.setWeightedPrompts({
    weightedPrompts: prompts,
  });
  await session.setMusicGenerationConfig({
    musicGenerationConfig: config,
  });

  console.debug("Lyria Realtime session started");
  session.play();
  await receive();
  session.close();
  console.debug("Lyria Realtime session closed");
}

Audio Generation Function

The above code sample shows how to generate music using the Lyria Realtime model. There are two methods worth noting:

generateMusic - Driver method

This method is used to start the music generation process. It takes an array of WeightedPrompt objects and a LiveMusicGenerationConfig object as input and opens a LiveMusicSession that is used to interact with the music generation.

This method:

  • Opens a WebSocket connection to the Lyria RealTime model.
  • Sends the initial prompts to the model using setWeightedPrompts, which sets the initial musical influences.
  • Sends the initial configuration using setMusicGenerationConfig, which sets the desired characteristics of the generated music.
  • Sets up callbacks to handle incoming audio data and errors, and starts the audio playback.

receive - Audio data handler

This method is used to handle incoming audio data from the Lyria RealTime model. It monitors the responseQueue for incoming audio chunks and collects them in a buffer. Once the maximum number of chunks has been received, it writes the audio data to a WAV file using the saveAudioToFile utility function and displays an audio player for playback.

Note

Currently, once the receive method is called, it blocks further execution until the required number of chunks has been received. This means you won’t be able to send new prompts or configuration updates while the receive method is running. Ideally, in a real-time application, you would run the receive method concurrently while also having a send method to send new prompts and configuration updates.
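
One rough way to approximate that pattern in this notebook (purely a sketch; steerSession is a hypothetical helper you would call with the session created in generateMusic) is to start receive without awaiting it, send updates on the session, and only await the collected audio at the end:

// Sketch only: collect audio in the background while steering the session.
async function steerSession(session: LiveMusicSession) {
  const receiver = receive(); // starts collecting chunks without blocking
  await new Promise((resolve) => setTimeout(resolve, 5000)); // let some audio arrive
  await session.setWeightedPrompts({
    weightedPrompts: [{ text: "strings", weight: 1.0 }],
  });
  await receiver; // resolves once MAX_CHUNKS chunks have been collected
}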

Try Lyria Realtime

Because of Colab limitations you won’t be able to experience the “real time” part of Lyria RealTime, so all of these examples are one-off prompts that produce an audio file.

One thing to note is that the audio will only play at the end of the session, once everything has been written to the WAV file. When using the API for real, you’ll be able to start playing as soon as the first chunk arrives. So the longer the duration you set (here, the MAX_CHUNKS limit), the longer you’ll have to wait until you hear something.

Simple Lyria RealTime example

Here’s first a simple example:

await generateMusic(
  [
    {
      text: "piano",
      weight: 1.0,
    },
  ],
  { bpm: 120, density: 1.0 }
);
Live music generation is experimental and may change in future versions.
Lyria Realtime session started
Receiving audio chunks...
Received complete response
Audio saved to ../assets/live/lyria_realtime_0.wav

Audio Response Lyria

Lyria Realtime session closed

Try Lyria RealTime by yourself

Now you can try mixing multiple prompts, and tinkering with the music configuration.

The prompts need to follow a specific format: a list of prompts with weights (which can be any value, including negative ones, except 0), like this:

{
    "text": "Text of the prompt",
    "weight": 1.0
}

You should keep your prompts simple (unlike when you’re using image-out), as the model will better understand things like “meditation”, “eerie”, or “harp” than “An eerie and relaxing music illustrating the verdant forests of Scotland using string instruments”.

The music configuration options available to you are:

  • bpm: beats per minute
  • guidance: how strictly the model follows the prompts
  • density: density of musical notes/sounds
  • brightness: tonal quality
  • scale: musical scale (key and mode)

Other options are available (muteBass, for example). Check the documentation for the full list.

Select one of the sample prompts (genres, instruments, and mood), or write your own. Check the documentation for more details and prompt examples.

await generateMusic(
  [
    {
      text: "Indie Pop",
      weight: 0.6,
    },
    {
      text: "Sitar",
      weight: 2,
    },
    {
      text: "Danceable",
      weight: 1.4,
    },
  ],
  {
    bpm: 140,
    scale: google.Scale.F_MAJOR_D_MINOR,
    density: 0.2,
    brightness: 0.7,
    guidance: 4.0,
  }
);
Live music generation is experimental and may change in future versions.
Lyria RealTime stream closed.
Lyria Realtime session started
Receiving audio chunks...
Received complete response
Audio saved to ../assets/live/lyria_realtime_1.wav

Audio Response Lyria

Lyria Realtime session closed
Lyria RealTime stream closed.

What’s next?

Now that you know how to generate music, here are other cool things to try:

  • Instead of music, learn how to generate multi-speaker conversations using the TTS models.
  • Discover how to generate images or videos.
  • Instead of generating music or audio, find out how Gemini can understand audio files.
  • Have a real-time conversation with Gemini using the Live API.