Get started with Music generation using Lyria RealTime
Lyria RealTime provides access to a state-of-the-art, real-time, streaming music generation model. It allows developers to build applications where users can interactively create, continuously steer, and perform instrumental music using text prompts.
Lyria RealTime's main characteristics are:

- Highest-quality text-to-audio model: Lyria RealTime generates high-quality instrumental music (no vocals) using the latest models produced by DeepMind.
- Non-stop music: using websockets, Lyria RealTime continuously generates music in real time.
- Mix and match influences: prompt the model with descriptions of musical ideas, genres, instruments, moods, or characteristics. Prompts can be mixed to blend influences and create unique compositions.
- Creative control: set the guidance, the bpm, the density of musical notes/sounds, the brightness, and the scale in real time. The model will smoothly transition based on the new input.
Check Lyria RealTime’s documentation for more details.
Important
Lyria RealTime is a preview feature. It is free to use for now with quota limitations, but is subject to change.
Also note that due to Colab limitations, you won't be able to experience the real-time capabilities of Lyria RealTime, only limited audio output. Use the AI Studio apps Prompt DJ and MIDI DJ to fully experience Lyria RealTime.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
```typescript
const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
```
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path altogether.
The Lyria RealTime API is a new capability introduced with the Lyria RealTime model, so it only works with the lyria-realtime-exp model. As it's an experimental feature, you also need to use the v1alpha client version.
```typescript
const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({
  apiKey: GEMINI_API_KEY,
  httpOptions: { apiVersion: "v1alpha" },
});
```
Select a model
The Multimodal Live API is a new capability introduced with the Gemini 2.0 generation of models. It won't work with previous-generation models.
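The model-selection cell isn't shown here; given the model name mentioned above, it is presumably a one-liner along these lines:

```typescript
// Lyria RealTime only works with this experimental model (constant name
// assumed; the identifier comes from the note above).
const MODEL = "models/lyria-realtime-exp";
```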
The Lyria RealTime model utilizes websockets to stream audio data in real time. The model can be prompted with text descriptions of the desired music, and it will generate audio that matches the description and stream it in chunks. It takes two different kinds of input:
- WeightedPrompt: A list of text prompts that describe the desired music. Each prompt can have a weight that indicates its influence on the generated music. Prompts can be sent while the session is active, allowing for continuous steering of the music generation.
- LiveMusicGenerationConfig: A configuration object that specifies the desired characteristics of the generated music, such as bpm, density, brightness, scale, and guidance. These parameters can be adjusted in real time to influence the music generation. An illustrative sketch of both inputs follows this list.
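For illustration, here is what those two inputs could look like (a sketch with made-up values; the parameter ranges follow the Lyria RealTime documentation):

```typescript
// Illustrative values, not from the original notebook:
const weightedPrompts = [
  { text: "minimal techno", weight: 1.0 },   // primary influence
  { text: "warm analog pads", weight: 0.5 }, // secondary influence
];

const musicGenerationConfig = {
  bpm: 120,        // beats per minute
  density: 0.7,    // density of notes/sounds (0 to 1)
  brightness: 0.6, // tonal quality (0 to 1)
  guidance: 4.0,   // how strictly the prompts are followed
};
```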
Important
You can't update just a single parameter in the LiveMusicGenerationConfig object. You need to send the entire object with all the parameters each time you want to update it; otherwise, the omitted parameters will be reset to their default values.

Any update to bpm or scale needs to be followed by a resetContext call to reset the context of the music generation. This is because these parameters affect the musical structure and need to be applied from the beginning of the generation.
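The notebook's code cell did not survive extraction here, so the sketch below reconstructs it from the description that follows. The live.music calls follow the @google/genai v1alpha API; saveAudioToFile is the notebook's own helper (its exact signature is assumed), and CHUNKS_TO_COLLECT is an illustrative constant.

```typescript
// Reconstruction sketch of the missing cell, based on the description below.
declare function saveAudioToFile(data: Buffer, path: string): Promise<void>; // notebook helper (assumed)

const responseQueue: any[] = [];
const CHUNKS_TO_COLLECT = 20; // illustrative: controls the audio duration

async function receive() {
  // Poll the responseQueue, collecting raw PCM chunks until we have enough,
  // then write them to a WAV file for playback.
  const buffers: Buffer[] = [];
  while (buffers.length < CHUNKS_TO_COLLECT) {
    const message = responseQueue.shift();
    if (!message) {
      // Queue is empty: wait a bit for the next websocket message.
      await new Promise((resolve) => setTimeout(resolve, 100));
      continue;
    }
    const data = message.serverContent?.audioChunks?.[0]?.data;
    if (data) buffers.push(Buffer.from(data, "base64")); // base64-encoded PCM
  }
  console.log("Received complete response");
  // Assumed helper: wraps the PCM data in a WAV header, saves and plays it.
  await saveAudioToFile(Buffer.concat(buffers), "../assets/live/lyria_realtime_0.wav");
}

async function generateMusic(prompts: any[], config: any) {
  // Open the websocket connection; incoming messages land in responseQueue.
  const session = await ai.live.music.connect({
    model: MODEL,
    callbacks: {
      onmessage: (message: any) => responseQueue.push(message),
      onerror: (error: any) => console.error("Session error:", error),
      onclose: () => console.log("Lyria Realtime session closed"),
    },
  });
  console.log("Lyria Realtime session started");

  // Initial musical influences and generation settings.
  await session.setWeightedPrompts({ weightedPrompts: prompts });
  await session.setMusicGenerationConfig({ musicGenerationConfig: config });

  await session.play(); // start streaming audio chunks
  console.log("Receiving audio chunks...");
  await receive(); // blocks until enough chunks have been collected

  session.stop();
  session.close(); // triggers the onclose callback above
  return session;
}
```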
The above code sample shows how to generate music using the Lyria RealTime model. There are two methods worth noting:
generateMusic - Driver method
This method is used to start the music generation process. It takes an array of WeightedPrompt objects and a LiveMusicGenerationConfig object as input, and returns a LiveMusicSession object that can be used to interact with the music generation session.
This method:

- Opens a websocket connection to the Lyria RealTime model.
- Sends the initial prompts to the model using setWeightedPrompts, which sets the initial musical influences.
- Sends the initial configuration using setMusicGenerationConfig, which sets the desired characteristics of the generated music.
- Sets up event listeners to handle incoming audio data and errors, and starts the audio playback.
receive - Audio data handler
This method is used to handle incoming audio data from the Lyria RealTime model. It monitors the responseQueue for incoming audio data and collects it in a buffer. When the buffer reaches a certain size, it writes the audio data to a WAV file and plays it back using the saveAudioToFile utility function.
Note
Currently, once the receive method is called, it blocks further execution until the required number of chunks has been received. This means that you won't be able to send new prompts or configuration updates while the receive method is running. Ideally, in a real-time application, you would run the receive method concurrently while also having a send path for new prompts and configuration updates, as sketched below.
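As a rough sketch of that idea (not part of the original notebook, and assuming a session opened as in the driver above but without the internal await on receive):

```typescript
// Hypothetical pattern: collect audio in the background while steering.
const receiving = receive(); // don't await yet; chunks keep accumulating
await session.setWeightedPrompts({
  weightedPrompts: [{ text: "ambient drone", weight: 1.0 }],
}); // update the prompts while audio is still arriving
await receiving; // finally, wait for the collection to complete
```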
Try Lyria RealTime
Because of Colab limitations you won't be able to experience the "real time" part of Lyria RealTime, so the following examples are one-off prompts that produce an audio file.

One thing to note is that the audio will only be played at the end of the session, once everything has been written to the WAV file. When using the API for real, you'll be able to start playing as soon as the first chunk arrives. So the longer the duration you set (using the dedicated parameter), the longer you'll have to wait before hearing anything.
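The cell that produced the output below isn't reproduced here; assuming the generateMusic driver sketched earlier, it would look something like this:

```typescript
// One-off generation: a single prompt and a simple config (illustrative values).
await generateMusic(
  [{ text: "Smooth piano jazz", weight: 1.0 }],
  { bpm: 100, density: 0.6, brightness: 0.5, guidance: 4.0 },
);
```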
Live music generation is experimental and may change in future versions.
Lyria Realtime session started
Receiving audio chunks...
Received complete response
Audio saved to ../assets/live/lyria_realtime_0.wav
Lyria Realtime session closed
Try Lyria RealTime by yourself
Now you can try mixing multiple prompts and tinkering with the music configuration.

The prompts need to follow a specific format: a list of prompts with weights (which can be any value, including negative ones, but not 0), like this:

```json
{ "text": "Text of the prompt", "weight": 1.0 }
```
Try to keep your prompts simple (unlike when you're using image-out models), as the model will better understand things like "meditation", "eerie", or "harp" than "An eerie and relaxing music illustrating the verdant forests of Scotland using string instruments".
The music configuration options available to you are:

- bpm: beats per minute
- guidance: how strictly the model follows the prompts
- density: density of musical notes/sounds
- brightness: tonal quality
- scale: musical scale (key and mode)
Other options are available (mute_bass, for example); check the documentation for the full list.

Select one of the sample prompts (genres, instruments, and moods), or write your own. Check the documentation for more details and prompt examples, and see the sketch below for a starting point.
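For example, reusing the hypothetical generateMusic driver sketched earlier (values are illustrative):

```typescript
// Blend several influences; the negative weight steers away from that idea.
await generateMusic(
  [
    { text: "Funk", weight: 1.0 },
    { text: "Electric guitar", weight: 0.6 },
    { text: "Lo-fi", weight: -0.3 },
  ],
  { bpm: 110, density: 0.8, brightness: 0.7, guidance: 5.0 },
);
```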