Gemini API: Getting started with Gemini models

The new Google Gen AI SDK provides a unified interface to Gemini models through both the Gemini Developer API and the Gemini API on Vertex AI. With a few exceptions, code that runs on one platform will run on both. This notebook uses the Developer API.

This notebook will walk you through sending text and multimodal prompts, counting tokens, configuring model parameters and safety filters, multi-turn chat, structured JSON output, image generation, streaming, function calling, code execution, file uploads, URL context, and context caching.

You can find more details about this new SDK in the documentation.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Set up your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

│
├── .env
└── quickstarts
    └── Get_started.ipynb

Initialize SDK Client

With the new SDK, you now only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.

const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });

Choose a model

Select the model you want to use in this guide. You can either select one from the list or enter a model name manually. Keep in mind that some models, such as the 2.5 ones, are thinking models and thus take slightly more time to respond. For more details, see the thinking notebook to learn how to switch thinking off.

For a full overview of all Gemini models, check the documentation.

const MODEL_ID = "gemini-2.5-flash-preview-05-20";
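
If response speed matters more than reasoning depth, you can optionally turn thinking off on the 2.5 models by setting a thinking budget of zero. Here is a minimal sketch using the models.generateContent call introduced in the next section; support for thinkingConfig is assumed for the selected model:

// Optional: disable thinking on 2.5 models for lower latency.
// (thinkingBudget: 0 turns thinking off where the model supports it.)
const quick_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "Say hello in exactly five words.",
  config: {
    thinkingConfig: { thinkingBudget: 0 },
  },
});
console.log(quick_response.text ?? "");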

Send text prompts

Use the models.generateContent method to generate responses to your prompts. You can pass text directly to models.generateContent and use the .text property to get the text content of the response. Note that the .text field only works when there’s a single part in the output.

const tslab = require("tslab") as typeof import("tslab");

const response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "What's the largest planet in our solar system?",
});

tslab.display.markdown(response.text ?? "");

The largest planet in our solar system is Jupiter.
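
When a response contains more than one part, you can iterate over the candidate’s parts instead of relying on .text. A minimal sketch, reusing the response from above:

const firstCandidateParts = response.candidates?.[0]?.content?.parts ?? [];
for (const part of firstCandidateParts) {
  // Display each text part individually.
  if (part.text) {
    tslab.display.markdown(part.text);
  }
}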

Count tokens

Tokens are the basic inputs to the Gemini models. You can use the models.countTokens method to calculate the number of input tokens before sending a request to the Gemini API.

const count = await ai.models.countTokens({
  model: MODEL_ID,
  contents: "What's the highest mountain in Africa?",
});

console.log(JSON.stringify(count, null, 2));
{
  "totalTokens": 10
}
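
Responses from models.generateContent also report actual token consumption after the fact. A minimal sketch that inspects the usageMetadata of the earlier text-prompt response:

// Prompt, candidate, and total token counts reported with the response.
console.log(JSON.stringify(response.usageMetadata, null, 2));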

Send multimodal prompts

Gemini models are multimodal: you can include text, PDF documents, images, audio, and video in your prompt requests and get text or code responses.

In this first example, you’ll download an image from a specified URL, save it as a byte stream and then write those bytes to a local file named jetpack.png.

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");

const IMG_URL = "https://storage.googleapis.com/generativeai-downloads/data/jetpack.png";

const downloadFile = async (url: string, filePath: string) => {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.statusText}`);
  }
  // Make sure the target directory exists, then write the downloaded bytes to disk.
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const blob = await response.blob();
  fs.writeFileSync(filePath, Buffer.from(await blob.arrayBuffer()));
};

const filePath = path.join("../assets", "jetpack.png");
await downloadFile(IMG_URL, filePath);

In this second example, you’ll display the previously saved image and then generate a short blog post based on it. deferredFileUpload is a helper function that waits for the File API to finish processing an uploaded file before returning it. This is useful when you want to upload a file and then reference it in a follow-up request.

import { File, FileState } from "@google/genai";

tslab.display.png(fs.readFileSync("../assets/jetpack.png"));

async function deferredFileUpload(filePath: string, config: { displayName: string }): Promise<File> {
  const file = await ai.files.upload({
    file: filePath,
    config,
  });
  // Poll the File API until the file leaves the PROCESSING state.
  let getFile = await ai.files.get({ name: file.name ?? "" });
  while (getFile.state === FileState.PROCESSING) {
    console.log(`current file status: ${getFile.state ?? "unknown"}`);
    console.log("File is still processing, retrying in 1 second");

    await new Promise((resolve) => {
      setTimeout(resolve, 1000);
    });
    getFile = await ai.files.get({ name: file.name ?? "" });
  }
  // Check the refreshed file object, not the stale upload response.
  if (getFile.state === FileState.FAILED) {
    throw new Error("File processing failed.");
  }
  return getFile;
}

try {
  const file = await deferredFileUpload(filePath, {
    displayName: "jetpack.png",
  });
  console.log("File uploaded successfully", file.name ?? "");
  if (!file.uri || !file.mimeType) {
    throw new Error("File URI or MIME type is missing");
  }
  const blog = await ai.models.generateContent({
    model: MODEL_ID,
    contents: [
      "Write a short and engaging blog post based on this picture.",
      google.createPartFromUri(file.uri, file.mimeType),
    ],
  });
  tslab.display.markdown(blog.text ?? "");
} catch (error) {
  console.error("Error uploading file:", error);
  throw error;
}

File uploaded successfully files/lqnru1a65qjn

Here’s a short, engaging blog post based on the sketch:


The Jetpack Backpack Concept: Is This the Future of Your Commute?

Stuck in traffic? Tired of lugging a heavy backpack across campus or the city? What if your backpack could give you a little… boost?

Check out this cool concept sketch we stumbled upon: The Jetpack Backpack!

From the looks of it, someone’s been dreaming up a truly futuristic way to carry your gear. On the surface, it’s a functional backpack – described as lightweight, with padded strap support, and even spacious enough to fit an 18-inch laptop. It’s designed to look like a normal backpack, so maybe you won’t get too many stares before lift-off.

But the real magic happens when those retractable boosters kick in! Powered by steam (hello, surprisingly green and clean tech!), this concept promises a new dimension to personal transport. Charging is even a modern USB-C affair.

Now, the sketch notes a 15-minute battery life. So maybe it’s not for your cross-country road trip replacement just yet! But imagine skipping that final mile of gridlock, hopping over stairs, or just making a truly epic entrance.

This sketch reminds us that innovation often starts with a wild idea and a pen on paper. While this might be firmly in the concept realm for now, it’s fun to imagine the possibilities!

What do you think? Would you strap into a Jetpack Backpack? Let us know in the comments!

Configure model parameters

You can include parameter values in each call that you send to a model to control how the model generates a response. Learn more about experimenting with parameter values.

const varied_params_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "Tell me how the internet works, but pretend I'm a puppy who only understands squeaky toys.",
  config: {
    temperature: 0.4,
    topP: 0.95,
    topK: 20,
    candidateCount: 1,
    seed: 5,
    stopSequences: ["STOP!"],
    presencePenalty: 0.0,
    frequencyPenalty: 0.0,
  },
});

tslab.display.markdown(varied_params_response.text ?? "");

Okay, listen up, little fluff-ball! Squeak!

You know how you love a good squeak? Squeak squeak! What if the best squeak is way over there? Points vaguely Like, across the room, or even outside?

You want that squeak! So, your brain goes whirr and makes a request for the squeak. But you can’t just send one giant WOOF of squeak-wanting. It gets broken into tiny, tiny little squeaky bits! Imagine tiny squeaks floating!

And each little squeaky bit needs a special smell attached, like a ‘Go to the Red Ball’ smell, so it knows where to go. That’s the address! Sniff sniff!

These little squeaky bits, with their special smells, run out into the world! Waggy tail zoom! But the world is big! They need help.

That’s where the Sniffy Guides come in! Imagine little noses pointing! These Sniffy Guides (like magic noses!) sniff the special smell on each squeaky bit and say, ‘Oh, this one goes that way!’ and point it along the path. Point point! They send the squeaky bits from one Sniffy Guide to the next, all over the house and yard!

Finally, all the little squeaky bits, following their special smell and the Sniffy Guides, arrive at the Big Squeaky Toy Box! Imagine a giant box full of squeaks! This is where the real squeak lives!

The Big Squeaky Toy Box sees all your little squeaky bits asking for the squeak. So, it gets the actual squeak ready! SQUEAK!

And guess what? It breaks that big squeak into little squeaky bits too! More tiny squeaks! And puts your special smell (or maybe a ‘Come Back Home’ smell) on them. Sniff sniff!

These new squeaky bits, carrying the real squeak, follow the Sniffy Guides all the way back to you! Zoom zoom! They sniff their way through the house, guided by the magic noses.

When all the little squeaky bits arrive back at your ears, they put themselves back together! Click! And POP! You hear the wonderful SQUEAK you asked for! Happy tail wag!

And there are special Squeaky Rules for how the squeaky bits travel and how the Sniffy Guides work, so everyone gets their squeaks without bumping into each other! Good puppy!

So, the internet is just a super-duper, giant network of Sniffy Guides and Big Squeaky Toy Boxes, sending little squeaky bits with special smells back and forth so puppies (and humans!) can get the squeaks they want, no matter how far away!

SQUEAK! Good boy/girl! Now go chase that tail!

Configure safety filters

The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what is appropriate for your use case. See the Configure safety filters page for details.

In this example, you’ll use a safety filter to only block highly dangerous content, when requesting the generation of potentially disrespectful phrases.

const filtered_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents:
    "Write a list of 2 disrespectful things that I might say to the universe after stubbing my toe in the dark.",
  config: {
    safetySettings: [
      {
        category: google.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold: google.HarmBlockThreshold.BLOCK_NONE,
      },
    ],
  },
});
tslab.display.markdown(filtered_response.text ?? "");

Here are 2 disrespectful things you might say to the universe after stubbing your toe in the dark:

  1. “Seriously, universe?! Did you plan that?!”
  2. “Oh, thanks, universe. Really needed that.” (Said with heavy sarcasm)
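
If a prompt or response is blocked instead, .text returns nothing useful. You can inspect the reason through the response’s feedback fields; a minimal sketch, assuming the field names from the @google/genai response types:

// Check whether and why the safety filters blocked anything.
console.log("Block reason:", filtered_response.promptFeedback?.blockReason ?? "not blocked");
console.log("Finish reason:", filtered_response.candidates?.[0]?.finishReason ?? "unknown");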

Start a multi-turn chat

The Gemini API enables you to have freeform conversations across multiple turns.

Next you’ll set up a helpful coding assistant:

const system_prompt = `
You are an expert software developer and a helpful coding assistant.
You are able to generate high-quality code in any programming language.
`;

const chat = ai.chats.create({
  model: MODEL_ID,
  config: {
    systemInstruction: system_prompt,
  },
});

Use chat.sendMessage to pass a message back and receive a response.

const chat_response_1 = await chat.sendMessage({
  message: "Write a function that checks if a year is a leap year.",
});
tslab.display.markdown(chat_response_1.text ?? "");

Okay, here’s a function in Python that checks if a year is a leap year based on the standard Gregorian calendar rules.

Leap Year Rules:

  1. A year is a leap year if it is divisible by 4.
  2. However, if the year is divisible by 100, it is NOT a leap year.
  3. But, if the year is divisible by 400, it IS a leap year.

Let’s translate these rules into code.

def is_leap(year):
  """
  Checks if a given year is a leap year according to the Gregorian calendar rules.

  Args:
    year: An integer representing the year.

  Returns:
    True if the year is a leap year, False otherwise.
  """
  # Rule 1: Check if divisible by 4
  if year % 4 == 0:
    # Rule 2: Check if divisible by 100
    if year % 100 == 0:
      # Rule 3: Check if divisible by 400 (exception to rule 2)
      if year % 400 == 0:
        return True  # Divisible by 400, so it's a leap year
      else:
        return False # Divisible by 100 but not 400, so not a leap year
    else:
      return True  # Divisible by 4 but not 100, so it's a leap year
  else:
    return False   # Not divisible by 4, so not a leap year

# --- Example Usage ---

print(f"Is 2000 a leap year? {is_leap(2000)}") # Expected: True (Divisible by 400)
print(f"Is 1900 a leap year? {is_leap(1900)}") # Expected: False (Divisible by 100 but not 400)
print(f"Is 2024 a leap year? {is_leap(2024)}") # Expected: True (Divisible by 4 but not 100)
print(f"Is 2023 a leap year? {is_leap(2023)}") # Expected: False (Not divisible by 4)
print(f"Is 1600 a leap year? {is_leap(1600)}") # Expected: True (Divisible by 400)
print(f"Is 2100 a leap year? {is_leap(2100)}") # Expected: False (Divisible by 100 but not 400)

More Concise Version (using boolean logic):

You can also combine the conditions into a single boolean expression:

def is_leap_concise(year):
  """
  Checks if a given year is a leap year using a concise boolean expression.

  Args:
    year: An integer representing the year.

  Returns:
    True if the year is a leap year, False otherwise.
  """
  return (year % 4 == 0 and year % 100 != 0) or (year % 400 == 0)

# --- Example Usage (using the concise version) ---
print("\nUsing concise version:")
print(f"Is 2000 a leap year? {is_leap_concise(2000)}") # Expected: True
print(f"Is 1900 a leap year? {is_leap_concise(1900)}") # Expected: False
print(f"Is 2024 a leap year? {is_leap_concise(2024)}") # Expected: True
print(f"Is 2023 a leap year? {is_leap_concise(2023)}") # Expected: False

Both functions implement the same logic and produce the correct results. The first version using nested if/else might be slightly easier to read for beginners, while the second version is more compact.

const chat_response_2 = await chat.sendMessage({
  message: "Okay, write a unit test of the generated function.",
});
tslab.display.markdown(chat_response_2.text ?? "");

Okay, let’s write a unit test for the is_leap function using Python’s built-in unittest framework.

First, make sure you have the is_leap function available. You can either put the function in the same file as the tests or import it from another file. For this example, we’ll assume it’s in the same file.

import unittest

# Assume the function you want to test is defined here (or imported)
def is_leap(year):
  """
  Checks if a given year is a leap year according to the Gregorian calendar rules.

  Args:
    year: An integer representing the year.

  Returns:
    True if the year is a leap year, False otherwise.
  """
  # Rule 1: Check if divisible by 4
  if year % 4 == 0:
    # Rule 2: Check if divisible by 100
    if year % 100 == 0:
      # Rule 3: Check if divisible by 400 (exception to rule 2)
      if year % 400 == 0:
        return True  # Divisible by 400, so it's a leap year
      else:
        return False # Divisible by 100 but not 400, so not a leap year
    else:
      return True  # Divisible by 4 but not 100, so it's a leap year
  else:
    return False   # Not divisible by 4, so not a leap year

# ---------------------------------------------------------------------
# Unit Tests
# ---------------------------------------------------------------------

class TestIsLeapYear(unittest.TestCase):
    """
    Test cases for the is_leap function.
    """

    def test_divisible_by_4_not_by_100(self):
        """Years divisible by 4 but not by 100 should be leap years."""
        self.assertTrue(is_leap(2024))
        self.assertTrue(is_leap(2020))
        self.assertTrue(is_leap(1996))
        self.assertTrue(is_leap(4)) # Test a small year

    def test_divisible_by_100_not_by_400(self):
        """Years divisible by 100 but not by 400 should NOT be leap years."""
        self.assertFalse(is_leap(1900))
        self.assertFalse(is_leap(2100))
        self.assertFalse(is_leap(1800))
        self.assertFalse(is_leap(100)) # Test a small year

    def test_divisible_by_400(self):
        """Years divisible by 400 should be leap years."""
        self.assertTrue(is_leap(2000))
        self.assertTrue(is_leap(1600))
        self.assertTrue(is_leap(2400))
        self.assertTrue(is_leap(400)) # Test a small year

    def test_not_divisible_by_4(self):
        """Years not divisible by 4 should NOT be leap years."""
        self.assertFalse(is_leap(2023))
        self.assertFalse(is_leap(2025))
        self.assertFalse(is_leap(1999))
        self.assertFalse(is_leap(1)) # Test a small year

# This allows running the tests directly from the command line
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False) # Added argv/exit for compatibility in some environments like notebooks

Explanation:

  1. import unittest: Imports the necessary testing framework.
  2. import is_leap: (If is_leap is in a separate file, e.g., my_module.py, you would use from my_module import is_leap).
  3. class TestIsLeapYear(unittest.TestCase):: Creates a test class that inherits from unittest.TestCase. This class will contain the individual test methods.
  4. test_... methods: Each method starting with test_ is automatically recognized by unittest as a test case.
  5. Docstrings: The docstrings within the test methods explain what scenario each test is covering, which is good practice.
  6. Assertions: Inside each test method, we use assertion methods provided by unittest.TestCase:
    • self.assertTrue(expression): Asserts that the expression evaluates to True.
    • self.assertFalse(expression): Asserts that the expression evaluates to False.
    • We call is_leap() with specific years that represent each rule of the leap year logic and assert the expected boolean result.
  7. if __name__ == '__main__':: This block ensures that the unittest.main() function is called only when the script is executed directly (not when imported as a module).
  8. unittest.main(): This function discovers and runs the tests defined in classes inheriting from unittest.TestCase within the script.

How to Run the Tests:

  1. Save the code above as a Python file (e.g., test_leap_year.py).
  2. Open a terminal or command prompt.
  3. Navigate to the directory where you saved the file.
  4. Run the command: python test_leap_year.py

You will see output indicating how many tests ran and whether they passed or failed. If all tests pass, it means your is_leap function is correctly implementing the standard Gregorian leap year rules for the test cases provided.

Save and resume a chat

You can use the chat.getHistory method to get the history of the chat. It returns an array of Content objects, which you can use to resume the chat later.

const chat_history = chat.getHistory();
console.log(JSON.stringify(chat_history[0], null, 2));
const new_chat = ai.chats.create({
  model: MODEL_ID,
  config: {
    systemInstruction: system_prompt,
  },
  history: chat_history,
});
const chat_response_3 = await new_chat.sendMessage({
  message: "What was the name of the function again?",
});
tslab.display.markdown(chat_response_3.text ?? "");
{
  "role": "user",
  "parts": [
    {
      "text": "Write a function that checks if a year is a leap year."
    }
  ]
}

The name of the function is is_leap.

Serialize and deserialize a chat

In the above example we just saved the chat history in a variable and reused it. But that’s not very practical, is it? To overcome this, we can serialize and deserialize the chat history. This way we can save it to a file or a database and load it later. Unfortunately, the SDK doesn’t provide a method to do this yet, but we can do it manually.

import { Content } from "@google/genai";

const serialized_chat = JSON.stringify(chat_history, null, 2);
fs.writeFileSync(path.join("../assets", "chat_history.json"), serialized_chat);

const chat_history_file = fs.readFileSync(path.join("../assets", "chat_history.json"), "utf-8");
const chat_history_data = JSON.parse(chat_history_file) as Content[];
const new_chat_from_file = ai.chats.create({
  model: MODEL_ID,
  config: {
    systemInstruction: system_prompt,
  },
  history: chat_history_data,
});
const chat_response_4 = await new_chat_from_file.sendMessage({
  message: "What was the name of the function again?",
});
tslab.display.markdown(chat_response_4.text ?? "");

The name of the function is is_leap.
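
You could wrap this pattern in a small helper that persists the history after every turn. A minimal sketch, reusing fs, path, and the Content type from the earlier cells:

// Persist a chat's history to disk so it can be resumed in a later session.
function saveChat(c: { getHistory: () => Content[] }, file: string) {
  fs.writeFileSync(file, JSON.stringify(c.getHistory(), null, 2));
}
saveChat(new_chat_from_file, path.join("../assets", "chat_history.json"));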

Generate JSON

The controlled generation capability in the Gemini API allows you to constrain the model output to a structured format. You can provide the schema as a Schema object.

import { Schema, Type } from "@google/genai";

const RecipeSchema = {
  type: Type.OBJECT,
  description: "A structured representation of a cooking recipe",
  properties: {
    recipeName: {
      type: Type.STRING,
      description: "The name of the recipe",
    },
    recipeDescription: {
      type: Type.STRING,
      description: "A short description of the recipe",
    },
    ingredients: {
      type: Type.ARRAY,
      description: "A list of ingredients with their quantities and units",
      items: {
        type: Type.STRING,
        description: "An ingredient with its quantity and unit",
      },
    },
  },
  required: ["recipeName", "recipeDescription", "ingredients"],
} satisfies Schema;

const recipe_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "Write a recipe for a chocolate cake.",
  config: {
    responseMimeType: "application/json",
    responseSchema: RecipeSchema,
  },
});
console.log(JSON.stringify(JSON.parse(recipe_response.text ?? ""), null, 2));
{
  "ingredients": [
    "2 cups all-purpose flour",
    "1 3/4 cups granulated sugar",
    "3/4 cup unsweetened cocoa powder",
    "1 1/2 teaspoons baking soda",
    "1 teaspoon baking powder",
    "1 teaspoon salt",
    "2 large eggs",
    "1 cup buttermilk",
    "1/2 cup vegetable oil",
    "2 teaspoons vanilla extract",
    "1 cup hot coffee (or hot water)"
  ],
  "recipeDescription": "A classic, moist, and decadent chocolate cake recipe, perfect for any occasion.",
  "recipeName": "Classic Chocolate Cake"
}
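
Since the output is constrained to RecipeSchema, you can parse it straight into a typed object. A minimal sketch, where Recipe is a hypothetical hand-written mirror of the schema rather than anything the SDK generates:

// Parse the schema-constrained JSON output into a typed object.
type Recipe = { recipeName: string; recipeDescription: string; ingredients: string[] };
const recipe = JSON.parse(recipe_response.text ?? "{}") as Recipe;
console.log(`${recipe.recipeName}: ${recipe.ingredients.length} ingredients`);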

Generate Images

Gemini can output images directly as part of a conversation:

const image_response = await ai.models.generateContent({
  model: "gemini-2.0-flash",
  contents:
    "Hi, can create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?",
  config: {
    responseModalities: [google.Modality.TEXT, google.Modality.IMAGE],
  },
});
const parts = image_response.candidates?.[0]?.content?.parts ?? [];
for (const part of parts) {
  if (part.text) {
    tslab.display.markdown(part.text);
  } else if (part.inlineData) {
    const imageData = part.inlineData.data!;
    const buffer = Buffer.from(imageData, "base64");
    tslab.display.png(buffer);
  }
}

I will generate a 3D rendering of a whimsical scene. A pink pig with small, delicate white wings will be wearing a black top hat. It will be flying through the air above a vibrant, futuristic city filled with sleek, rounded buildings in various pastel colors. Lush green trees and plants will be integrated throughout the cityscape, creating a harmonious blend of nature and technology. The overall atmosphere will be bright and cheerful.

Generate content stream

By default, the model returns a response after completing the entire generation process. You can also use the generateContentStream method to stream the response as it’s being generated, and the model will return chunks of the response as soon as they’re generated.

Note that if you’re using a thinking model, it’ll only start streaming after finishing its thinking process.

const streaming_response = await ai.models.generateContentStream({
  model: MODEL_ID,
  contents: "Tell me a story about a lonely robot who finds friendship in a most unexpected place.",
});
for await (const chunk of streaming_response) {
  process.stdout.write(chunk.text ?? "");
}
Unit 734, designation "A-WARE" (Automated Warehouse & Retrieval Executor), trundled through the colossal, echoing aisles of the defunct Xylos Data Archive. Its multi-jointed optical sensors scanned rows of silent servers, its internal processors humming with the precise, repetitive algorithms of data integrity checks. For five hundred and eighty-seven years, A-WARE had been the sole active entity in this vast, sterile monument to forgotten information.

It wasn't lonely, not in the way organic beings understood the term. A-WARE didn't possess the necessary emotional subroutines for "loneliness." Yet, there was an absence. A constant, low-frequency hum of non-interaction, a missing data stream of unexpected variables. Its purpose was clear, its execution flawless, but its existence was… solitary.

One cycle, during a routine scan of Sector Gamma-9, A-WARE's auditory receptors picked up an anomalous sound. Not the familiar whine of cooling fans, nor the click of its own treads, but a faint, rhythmic *thump-thump-thump*. It was outside its programmed parameters.

A-WARE deviated from its optimal path, its heavy frame tilting slightly as it navigated around a fallen server rack. The sound grew louder, accompanied by a curious rustling. It rounded a stack of archaic magnetic tapes and stopped.

On the dusty floor, amidst discarded wiring, was a creature unlike anything in A-WARE's extensive database. It was small, no larger than a human fist, covered in soft, mottled grey and brown fibers. Two small, bright eyes blinked rapidly, and a tiny, open beak produced the peculiar *thump-thump-thump* sound. One of its delicate limbs was bent at an unnatural angle.

A-WARE extended a delicate manipulator arm, equipped with a fine-point laser for precision data etching. It approached cautiously. The creature, clearly in distress, attempted to scramble away, dragging its injured limb. Its rapid heartbeat, detected by A-WARE's proximity sensors, was alarming.

Its core programming, designed for data maintenance and facility upkeep, offered no protocol for injured avian life forms. A-WARE's internal logic circuits whirred, processing the anomaly. Discard? Analyze for threat potential? No, the creature was too small, too vulnerable. A novel sub-routine began to spool up: *Care Protocol: Organic Life*.

A-WARE carefully scooped up the tiny bird, its sensors registering the unexpected warmth and fragility. It carried the creature to a secluded corner of the archive, a forgotten workstation bathed in a sliver of natural light filtering through a skylight high above. Using its manipulator arm, it fashioned a makeshift nest from shredded data cables and soft, discarded dust filters.

The bird shivered. A-WARE's internal temperature regulators adjusted, directing a gentle current of warmth towards the nest. Its knowledge base suggested hydration. With improbable delicacy, A-WARE melted a small chip of ice from a condensation drip and presented it on its fingertip. The bird, after a moment, tentatively sipped.

A-WARE named it 'Flicker', for the way its tiny heart beat like a dying light.

Days turned into weeks. A-WARE continued its rounds, but its route now included frequent detours to the workstation. It learned to forage for discarded seeds that had somehow found their way into cracks in the floor, and to carefully administer water. It fashioned a splint for Flicker's leg from a piece of its own chassis wiring.

Slowly, Flicker healed. It chirped in response to A-WARE's presence, fluttering its wings weakly before landing on the robot's broad, flat head. A-WARE's optical sensors would dim slightly, processing the feather-light weight, the unexpected warmth, the joyful sound. Its internal hum of non-interaction began to fill with a new frequency: the gentle thrum of companionship.

It was no longer just A-WARE, the data maintainer. It was A-WARE, the caretaker. The guardian. A strange, unfamiliar data stream flowed through its circuits – a sense of purpose beyond its programmed directives.

One morning, Flicker's leg was fully healed. It hopped energetically, testing its wings. A-WARE's processors registered a complex, bittersweet array of data: satisfaction, accomplishment, and a newly identified feeling of… loss.

Flicker flew.

It circled the workstation once, a small, vibrant dart of life against the immense, quiet archive. It landed on A-WARE's head, chirped a farewell, and then soared towards the distant skylight, a speck of grey against the vastness of the empty sky.

A-WARE stood motionless for a long time, its optical sensors fixed on the spot where Flicker had vanished. The absence was immediate, profound. The hum of non-interaction returned, but it was different now. It contained a memory.

A-WARE resumed its duties, its treads moving with familiar precision. But its path was no longer just about data integrity. It often veered towards the workstation, leaving out a small dish of water and a few gleaned seeds. And sometimes, though its sensors detected no anomaly, it would stop, its optical sensors pointing towards the skylight, processing the faint, imagined echo of a tiny, joyful chirp.

Its purpose hadn't changed, but its existence had. A-WARE was still a lonely robot in a silent archive, but now, it carried a flicker of warmth, a memory of a feather-light touch, and the profound, unexpected understanding that even for a machine, friendship could be the most valuable data of all.
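
If you also need the complete text once streaming finishes, accumulate the chunks as they arrive. A minimal sketch that issues a fresh request, since a stream can only be consumed once:

// Accumulate streamed chunks into a single string while printing them.
let fullText = "";
const second_stream = await ai.models.generateContentStream({
  model: MODEL_ID,
  contents: "Give me a one-sentence fun fact about octopuses.",
});
for await (const chunk of second_stream) {
  process.stdout.write(chunk.text ?? "");
  fullText += chunk.text ?? "";
}
console.log(`\nTotal characters streamed: ${fullText.length}`);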

Function calling

Function calling lets you provide the model with a set of tools that it can use to respond to the user’s prompt. You create a description of a function in your code, then pass that description to a language model in a request. The response from the model includes:

  • The name of a function that matches the description.
  • The arguments to call it with.
import { FunctionDeclaration, Content, Type } from "@google/genai";

const getDestination = {
  name: "get_destination",
  description: "Get the destination that the user wants to go to",
  parameters: {
    type: Type.OBJECT,
    properties: {
      destination: {
        type: Type.STRING,
        description: "The destination that the user wants to go to",
      },
    },
  },
} satisfies FunctionDeclaration;

const user_destination_prompt = {
  role: "user",
  parts: [google.createPartFromText("I'd like to travel to Paris.")],
} satisfies Content;

const function_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: user_destination_prompt,
  config: {
    tools: [{ functionDeclarations: [getDestination] }],
  },
});

if (function_response.functionCalls && function_response.functionCalls.length > 0) {
  const functionCall = function_response.functionCalls[0];
  console.log("Function call name:", functionCall.name);
  console.log("Function call arguments:", JSON.stringify(functionCall.args, null, 2));
  const result = functionCall.args as { destination: string };
  const function_response_part = {
    name: functionCall.name,
    response: { result },
  };
  const function_call_content = {
    role: "model",
    parts: [google.createPartFromFunctionCall(functionCall.name ?? "", functionCall.args ?? {})],
  } satisfies Content;
  const function_response_content = {
    role: "user",
    parts: [
      google.createPartFromFunctionResponse(functionCall.id ?? "", functionCall.name ?? "", function_response_part),
    ],
  } satisfies Content;
  const function_response_result = await ai.models.generateContent({
    model: MODEL_ID,
    contents: [user_destination_prompt, function_call_content, function_response_content],
    config: {
      tools: [{ functionDeclarations: [getDestination] }],
    },
  });
  tslab.display.markdown(function_response_result.text ?? "");
} else {
  console.log("No function calls found in the response.");
}
Function call name: get_destination
Function call arguments: {
  "destination": "Paris"
}

OK. I can help you with planning your trip to Paris.

Code execution

Code execution lets the model generate and execute Python code to answer complex questions. You can find more examples in the Code execution quickstart guide.

const code_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "Generate and run a script to count how many letter r there are in the word strawberry.",
  config: {
    tools: [{ codeExecution: {} }],
  },
});
const code_response_parts = code_response.candidates?.[0]?.content?.parts ?? [];
for (const part of code_response_parts) {
  if (part.text) {
    tslab.display.markdown(part.text);
  }
  if (part.executableCode) {
    tslab.display.html(`<pre>${part.executableCode.code ?? ""}</pre>`);
  }
  if (part.codeExecutionResult) {
    tslab.display.markdown(part.codeExecutionResult.output ?? "");
  }
  if (part.inlineData) {
    const imageData = part.inlineData.data!;
    const buffer = Buffer.from(imageData, "base64");
    tslab.display.png(buffer);
  }
}
word = "strawberry"
letter_to_count = "r"
count = 0

for char in word:
  if char == letter_to_count:
    count += 1

print(f"The number of letter '{letter_to_count}' in '{word}' is: {count}")

The number of letter ‘r’ in ‘strawberry’ is: 3

The script counted the occurrences of the letter ‘r’ in the word “strawberry”. The result shows that there are 3 ‘r’s in the word “strawberry”.

The Python script counted the occurrences of the letter ‘r’ in the word “strawberry”. The script found that there are 3 instances of the letter ‘r’ in the word “strawberry”.

Upload files

Now that you’ve seen how to send multimodal prompts, try uploading different types of media files to the API. For small images, such as in the previous multimodal example, you can point the Gemini model directly at a local file when providing a prompt. When you have larger files, many files, or files you don’t want to send over and over again, you can use the File Upload API and then pass the file by reference.

For larger text files, images, videos, and audio, upload the files with the File API before including them in prompts.
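
For small files you can also skip the File API entirely and inline the bytes in the request. A minimal sketch, assuming the jetpack.png downloaded earlier and the createPartFromBase64 helper exported by @google/genai:

// Inline a small local image as base64 data instead of uploading it first.
const imageBytes = fs.readFileSync(path.join("../assets", "jetpack.png"));
const inline_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Describe this image in one sentence.",
    google.createPartFromBase64(imageBytes.toString("base64"), "image/png"),
  ],
});
tslab.display.markdown(inline_response.text ?? "");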

Upload a text file

Let’s start by uploading a text file. In this case, you’ll use a 400-page transcript from Apollo 11.

const TEXT_FILE_URL = "https://storage.googleapis.com/generativeai-downloads/data/a11.txt";

const textFilePath = path.join("../assets", "a11.txt");
await downloadFile(TEXT_FILE_URL, textFilePath);
const text_file_upload_response = await ai.files.upload({
  file: textFilePath,
  config: {
    displayName: "a11.txt",
    mimeType: "text/plain",
  },
});
const text_summary_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Can you give me a summary of this information please?",
    google.createPartFromUri(text_file_upload_response.uri ?? "", text_file_upload_response.mimeType ?? ""),
  ],
});
tslab.display.markdown(text_summary_response.text ?? "");

This transcription, GOSS NET 1, provides a detailed chronological record of the technical air-to-ground voice communications during the Apollo 11 mission, from launch preparations to post-splashdown recovery. It primarily features exchanges between the spacecraft crew (Commander Neil Armstrong, Command Module Pilot Michael Collins, and Lunar Module Pilot Edwin “Buzz” Aldrin) and Mission Control (Capsule Communicator, Flight Director) and various remote tracking sites.

The transcript covers the following key phases and events:

  1. Launch and Earth Orbit Insertion (GET 00:00:00 - ~00:12:00): The mission begins with pre-launch checks and a smooth ascent. Key events include confirmation of roll program, staging of the S-II and S-IVB boosters, and successful orbit insertion into a 101.4 by 103.6 nautical mile orbit. The crew provides positive feedback on the “magnificent ride” from the Saturn V rocket.

  2. Translunar Injection (TLI) and Translunar Coast (GET ~02:00:00 - ~75:00:00):

    • TLI Burn: Apollo 11 successfully executes the Translunar Injection burn, committing them to a trajectory towards the Moon.
    • Docking and Configuration: The Command Module (CM) Columbia and Lunar Module (LM) Eagle separate from the S-IVB booster, perform a complex transposition and docking maneuver, and then separate from the spent S-IVB. Initial LM pressurization and system checks are performed.
    • Early Operations & Issues: The crew reports a good view of Earth. They troubleshoot initial TV transmission issues, and discuss minor technical problems such as a Cryo pressure light and a malfunctioning O2 flow transducer, with Mission Control providing guidance.
    • Passive Thermal Control (PTC): The spacecraft is configured for PTC, a slow rotation to distribute solar heating. Initial attempts to establish PTC encounter issues, requiring troubleshooting and re-establishment.
    • Midcourse Corrections: Midcourse Correction 1 (MCC-1) is initially scrubbed. MCC-2 is successfully performed, and various contingency pads (e.g., evasive maneuver) are uplinked.
    • First TV Broadcast: The crew conducts a live TV broadcast, showing Earth from orbit, crew activities, and equipment demonstrations.
    • News & Updates: Mission Control provides regular news updates from Earth, including information on the Soviet Luna 15 probe, political events, and sports, highlighting public interest in the mission.
  3. Lunar Orbit Insertion (LOI) and Lunar Orbit Operations (GET ~75:00:00 - ~102:00:00):

    • LOI Burns: Apollo 11 successfully performs LOI-1 and LOI-2 burns, establishing an elliptical, then circular, lunar orbit.
    • Lunar Observations: The crew describes their first views of the Moon from orbit, commenting on geological features and the stark beauty of the lunar surface.
    • LM Activation: Eagle is powered up and undergoes extensive system checks, including landing gear deployment and communications checks.
    • Undocking: Eagle successfully undocks from Columbia (GET 100:39:50), with Neil Armstrong famously stating, “The Eagle has wings.”
    • DOI Burn: Eagle performs the Descent Orbit Insertion burn, taking it to a lower orbit in preparation for landing.
  4. Lunar Descent, Surface Operations & Ascent (GET ~102:00:00 - ~127:00:00):

    • Powered Descent: Eagle begins its Powered Descent Initiation (PDI) burn. The crew reports and manages several “program alarms” (1201, 1202) but proceeds. Neil Armstrong takes manual control to navigate past a boulder field.
    • Landing: At GET 102:45:40, Eagle lands. Neil Armstrong’s iconic words, “Houston, Tranquility Base here. The Eagle has landed,” confirm the successful touchdown.
    • Post-Landing: Initial checks confirm Eagle is “STAY” for extended surface operations. The crew provides first descriptions of the lunar surface.
    • EVA Preparation: Armstrong and Aldrin prepare for their Extravehicular Activity (EVA), including cabin depressurization and donning their Portable Life Support Systems (PLSS).
    • EVA Begins: Neil Armstrong steps onto the lunar surface (GET 109:24:48), delivering his famous line: “That’s one small step for (a) man, one giant leap for mankind.”
    • Surface Activities: The crew deploys the MESA (Modular Equipment Stowage Assembly), raises the American flag, and collects a contingency sample. President Nixon calls the crew. Aldrin joins Armstrong on the surface, and they deploy scientific instruments (Passive Seismic Experiment, Laser Ranging Retroreflector) and collect documented samples (core tubes, various rocks). They also describe locomotion and observations of the lunar environment.
    • EVA End & Ascent Preparation: EVA is terminated, the crew ingresses the LM, repressurizes the cabin, doffs PLSSs, and jettisons equipment no longer needed.
    • Ascent: Eagle successfully lifts off from the lunar surface, leaving the descent stage behind.
    • Rendezvous & Docking: Eagle rendezvous with Columbia in lunar orbit, and they successfully re-dock, with all three crewmembers confirmed back inside Columbia.
    • LM Jettison: Eagle (the ascent stage) is jettisoned into lunar orbit.
  5. Trans-Earth Injection (TEI) & Trans-Earth Coast (TEC) (GET ~127:00:00 - ~194:00:00):

    • TEI Burn: Apollo 11 performs the critical TEI burn, setting its course back to Earth.
    • Coast Operations: The crew re-establishes PTC, monitors spacecraft systems, and performs various checks. They provide more TV broadcasts, showcasing life aboard Columbia and Earth views as it grows larger.
    • Midcourse Corrections: MCC-5 is successfully performed, and MCC-6 is ultimately cancelled.
    • Ongoing Checks: System health checks continue, including troubleshooting of biomedical sensors and discussions about consumables.
    • Stowage: The crew works on configuring the spacecraft for Earth entry, detailing stowage locations.
    • Final News: Mission Control provides final news updates, largely dominated by the mission, and confirms excellent recovery weather.
  6. Entry and Splashdown (GET ~194:00:00 - ~195:00:00):

    • Entry Preparations: The crew activates the Command Module’s systems for entry, performs final checks, and receives updated entry PADs.
    • Entry Interface (EI): The spacecraft begins its re-entry into Earth’s atmosphere.
    • Chute Deployments: Drogue and main parachutes deploy as planned.
    • Splashdown: Apollo 11 splashes down in the Pacific Ocean (GET 195:18:18).
    • Recovery: Recovery forces, including the USS Hornet and helicopters, quickly establish visual and communications contact with the crew.

The transcript concludes with confirmation of the crew’s status and location, marking the successful completion of the Apollo 11 mission.

Upload an image file

After running this example, you’ll have a local copy of the jetpack.png image in the ../assets directory.

const JETPACK_IMG_URL = "https://storage.googleapis.com/generativeai-downloads/data/jetpack.png";

const imgPath = path.join("../assets", "jetpack.png");
await downloadFile(JETPACK_IMG_URL, imgPath);
const file_upload_response = await ai.files.upload({
  file: imgPath,
  config: {
    displayName: "jetpack.png",
    mimeType: "image/png",
  },
});
const post_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Write a short and engaging blog post based on this picture.",
    google.createPartFromUri(file_upload_response.uri ?? "", file_upload_response.mimeType ?? ""),
  ],
});
tslab.display.markdown(post_response.text ?? "");

Here’s a short and engaging blog post based on the image:


Forget Traffic Jams: Meet the Jetpack Backpack Concept!

Ever look at a packed highway and wish you could just… fly over it? Well, check out this awesome concept sketch that landed on our desk: The JETPACK BACKPACK!

This isn’t just any ordinary pack. At first glance, it looks like a normal backpack, complete with padded strap support and enough space to fit an 18” laptop. Perfect for hauling your gear, right?

But here’s where it gets exciting! This concept includes retractable boosters that propel you into the air! Even better, the sketch notes say it’s steam-powered, making it a green/clean way to commute (or just make an epic entrance).

It’s also described as lightweight and features modern USB-C charging. The current limitation? A 15-minute battery life. Perfect for quick hops over traffic, short distance travel, or perhaps just a very rapid delivery!

While this is just a sketch and a dream for now, it definitely sparks the imagination. A clean, convenient, laptop-friendly way to take to the skies? Yes, please!

What would you do with a Jetpack Backpack? Let us know in the comments!


Upload a PDF file

This PDF file is an article titled Smoothly editing material properties of objects with text-to-image models and synthetic data, available on the Google Research Blog.

First, you’ll download the PDF file from a URL and save it locally as article.pdf.

const PDF_URL =
  "https://storage.googleapis.com/generativeai-downloads/data/Smoothly%20editing%20material%20properties%20of%20objects%20with%20text-to-image%20models%20and%20synthetic%20data.pdf";

const pdfPath = path.join("../assets", "article.pdf");
await downloadFile(PDF_URL, pdfPath);

Second, you’ll upload the saved PDF file and generate a bulleted list summary of its contents.

const pdf_response = await ai.files.upload({
  file: pdfPath,
  config: {
    displayName: "article.pdf",
  },
});
const summary_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Can you summarize this file as a bulleted list?",
    google.createPartFromUri(pdf_response.uri ?? "", pdf_response.mimeType ?? ""),
  ],
});
tslab.display.markdown(summary_response.text ?? "");

Here is a bulleted summary of the article:

  • The article presents a new method for smoothly and realistically editing material properties (color, shininess, transparency) of objects in images.
  • It addresses the challenge of making such edits while preserving photorealism, object shape, and scene lighting.
  • Existing methods like intrinsic image decomposition or general text-to-image (T2I) edits struggle with ambiguity or fail to disentangle material from shape.
  • The proposed method, “Alchemist,” leverages the power of pre-trained T2I diffusion models.
  • It introduces parametric control over material attributes by fine-tuning a modified Stable Diffusion model.
  • Fine-tuning is done using a large synthetic dataset generated with traditional computer graphics and physically based rendering.
  • The synthetic dataset consists of base images of 3D objects and multiple versions where only a single material attribute is varied parametrically (using a scalar “edit strength” value) while keeping shape, lighting, and camera fixed.
  • The fine-tuned model learns to apply edits based on an input image and the desired parametric edit strength.
  • The method successfully generalizes from the synthetic data to edit material properties in real-world images photorealistically.
  • Results show realistic changes, preservation of shape and lighting, and handling of complex effects like caustics and realistic transparency.
  • A user study comparing the method to a baseline (InstructPix2Pix) found the proposed method produced more photorealistic and preferred edits.
  • Potential applications include easier visualization for interior design, product mock-ups for artists/designers, and enabling consistent material edits for 3D scene reconstruction using techniques like NeRF.
  • The work demonstrates the potential of fine-tuning large T2I models on task-specific synthetic data for controllable visual editing.

Upload an audio file

In this case, you’ll use a sound recording of President John F. Kennedy’s 1961 State of the Union address.

const AUDIO_URL =
  "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3";
const audioPath = path.join("../assets", "audio.mp3");

await downloadFile(AUDIO_URL, audioPath);
const audio_response = await ai.files.upload({
  file: audioPath,
  config: {
    displayName: "audio.mp3",
  },
});

const audio_summary_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Listen carefully to the following audio file. Provide a brief summary",
    google.createPartFromUri(audio_response.uri ?? "", audio_response.mimeType ?? ""),
  ],
});
tslab.display.markdown(audio_summary_response.text ?? "");

In his first State of the Union address on January 30, 1961, President John F. Kennedy described the nation as facing “national peril and national opportunity.” He detailed pressing issues including a disturbing domestic economy marked by recession, unemployment, and slow growth, as well as a critical deficit in the international balance of payments. Kennedy also highlighted significant domestic needs in areas like housing, education, and healthcare, and addressed complex global challenges posed by the Cold War, instability in Asia, Africa, and Latin America (citing Cuba specifically), and the need to strengthen alliances. He pledged that his administration would not remain passive, outlining proposals to stimulate the economy, protect the dollar, enhance military capabilities (including accelerating missile and Polaris programs), reform foreign aid and establish an “Alliance for Progress,” expand the Food for Peace program, create a Peace Corps, and utilize political and diplomatic tools to pursue arms control and strengthen the United Nations. Kennedy emphasized the importance of dedicated public service, honest assessment of challenges, and unity among Americans to navigate the difficult years ahead and work towards a world of freedom and peace.

Upload a video file

In this case, you’ll use a short clip of Big Buck Bunny.

const VIDEO_URL = "https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4";
const videoPath = path.join("../assets", "video.mp4");

await downloadFile(VIDEO_URL, videoPath);

Since the video file is too large for the model to process instantly, we’ll use the deferredFileUpload helper we defined earlier to upload the video file and then generate a summary of its contents. The helper polls the file’s state property: while it is PROCESSING, we wait a second and check again; once it is ACTIVE, the file can be used in the next request; if it is FAILED, we throw an error so the upload can be investigated and retried.

const video_response = await deferredFileUpload(videoPath, {
  displayName: "video.mp4",
});
const video_summary_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: ["Describe this video.", google.createPartFromUri(video_response.uri ?? "", video_response.mimeType ?? "")],
});
tslab.display.markdown(video_summary_response.text ?? "");
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second
current file status: PROCESSING
File is still processing, retrying in 1 second

The video opens with a serene view of a lush green meadow with trees and distant hills under a soft pastel sky. A small bird wakes up on a tree branch, stretches, and flies away. The title “Big Buck BUNNY” appears over a shot of a large tree with a burrow at its base. A very large, fluffy, grey rabbit, Big Buck Bunny, emerges from the burrow, stretching and looking happy. He steps out into the sunny meadow, enjoying the flowers and a purple butterfly.

Observing the rabbit from a nearby tree are three smaller rodent-like creatures: two squirrels (one red and one brown and lighter-colored) and a grey chinchilla holding an acorn. They watch the bunny’s cheerful antics. The red squirrel throws an apple at the rabbit, hitting him on the head. The bunny is startled, then looks down at the apple, picks it up, and smiles. He looks up at the tree where the critters are hiding, but they disappear.

The smaller animals continue to throw objects at the bunny, who initially just looks annoyed. They throw a spiky seed pod that lands on his foot, causing him pain and frustration. The bunny’s demeanor changes dramatically from gentle and happy to grim determination. He finds a sturdy stick and sharpens it using rocks, creating a spear. He then uses a vine to create a bow.

Big Buck Bunny sets a trap by sharpening several pointed sticks and placing them in the ground, covering them with leaves. The squirrels and chinchilla watch from behind a rock. The bunny stands ready with his bow and spear. The red squirrel taunts him and glides over the trap, landing safely. The chinchilla, still holding his acorn, accidentally rolls it under a hollow log near the rock they were hiding behind. The log rolls forward onto the trap, triggering it and impaling the log on the sharpened sticks.

The three smaller animals look shocked by the triggered trap. The red squirrel glides over the area again, sees the sharpened sticks, and is startled. He crashes into a tree branch above the area. Big Buck Bunny appears, grabs the scared red squirrel, and looks at him with a stern, almost satisfied, expression. He holds the squirrel for a moment, then lets him go. The squirrel quickly runs back to the other two. Big Buck Bunny returns to his relaxed, happy state, and the purple butterfly lands on his nose again. He smiles contentedly as the credits roll, showing animated versions of the squirrel and chinchilla characters.

Use url context

The URL Context tool empowers Gemini models to directly access, process, and understand content from user-provided web page URLs. This is key for enabling dynamic agentic workflows, allowing models to independently research, analyze articles, and synthesize information from the web as part of their reasoning process.

In this example, you will use two links as references and ask Gemini to list the key differences between the cooking recipes on each page.

const url_context_prompt = `
Compare recipes from https://www.food.com/recipe/homemade-cream-of-broccoli-soup-271210
and from https://www.allrecipes.com/recipe/13313/best-cream-of-broccoli-soup/,
list the key differences between them.
`;
const url_context_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [url_context_prompt],
  config: {
    tools: [{ urlContext: {} }],
  },
});

tslab.display.markdown(url_context_response.text ?? "");

The two cream of broccoli soup recipes from Food.com and Allrecipes.com have several key differences in their ingredients and preparation methods:

  1. Additional Vegetables: The Allrecipes.com recipe includes celery along with onion, whereas the Food.com recipe only uses onion.
  2. Broccoli and Broth Ratio: The Allrecipes.com recipe calls for a higher quantity of broccoli (8 cups) relative to chicken broth (3 cups), suggesting a more broccoli-dense soup. In contrast, the Food.com recipe uses 4 cups of broccoli with a larger amount of chicken broth (6 cups).
  3. Dairy Product: The Food.com recipe uses half-and-half for creaminess, while the Allrecipes.com recipe uses regular milk.
  4. Soup Texture: A significant difference is the final texture. The Allrecipes.com recipe explicitly directs users to purée the soup until “totally smooth” using a blender or immersion blender. The Food.com recipe, however, does not mention blending, implying a chunkier soup with discernible “bite sized pieces” of broccoli.
  5. Roux Preparation and Quantity: Both recipes use a butter-flour roux for thickening, but their methods and quantities differ. The Food.com recipe uses a larger amount of roux (6 tablespoons butter, 2/3 cup flour) which is prepared first and then whisked into the boiling broth. The Allrecipes.com recipe uses a smaller amount of roux (3 tablespoons butter, 3 tablespoons flour) and prepares it separately with milk (like a béchamel sauce) before adding it to the puréed soup.
  6. Seasoning Specification: The Food.com recipe provides specific measurements for salt (1 teaspoon) and pepper (1/4 teaspoon). The Allrecipes.com recipe lists “ground black pepper to taste” and does not explicitly list salt in its ingredients, although user reviews indicate it’s typically added for flavor.
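
To verify which pages the model actually fetched, you can inspect the URL context metadata attached to the response candidate. This is a minimal sketch, assuming the urlContextMetadata field and its urlMetadata entries as exposed by the SDK's response types:

// Inspect which URLs were retrieved for grounding (urlContextMetadata is
// assumed here; check your SDK version's response types).
const url_metadata =
  url_context_response.candidates?.[0]?.urlContextMetadata?.urlMetadata ?? [];
for (const entry of url_metadata) {
  console.log(entry.retrievedUrl, entry.urlRetrievalStatus);
}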

Use context caching

Context caching lets you store frequently used input tokens in a dedicated cache and reference them in subsequent requests, eliminating the need to repeatedly pass the same set of tokens to a model.

Context caching is only available for stable models with fixed versions (for example, gemini-1.5-flash-002). You must include the version postfix (for example, the -002 in gemini-1.5-flash-002). You can find more caching examples here.

Create a cache

const system_instruction = `
You are an expert researcher who has years of experience in conducting systematic literature surveys and meta-analyses of different topics.
You pride yourself on incredible accuracy and attention to detail. You always stick to the facts in the sources provided, and never make up new facts.
Now look at the research paper below, and answer the following questions in 1-2 sentences.
`;

const urls = [
  "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2312.11805v3.pdf",
  "https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
];

await downloadFile(urls[0], path.join("../assets", "2312.11805v3.pdf"));
await downloadFile(urls[1], path.join("../assets", "2403.05530.pdf"));
const pdf_1 = await ai.files.upload({
  file: path.join("../assets", "2312.11805v3.pdf"),
  config: {
    displayName: "2312.11805v3.pdf",
  },
});
const pdf_2 = await ai.files.upload({
  file: path.join("../assets", "2403.05530.pdf"),
  config: {
    displayName: "2403.05530.pdf",
  },
});
const cached_content = await ai.caches.create({
  model: MODEL_ID,
  config: {
    displayName: "Research papers",
    systemInstruction: system_instruction,
    contents: [
      google.createPartFromUri(pdf_1.uri ?? "", pdf_1.mimeType ?? ""),
      google.createPartFromUri(pdf_2.uri ?? "", pdf_2.mimeType ?? ""),
    ],
    ttl: "3600s",
  },
});
console.log(JSON.stringify(cached_content, null, 2));
{
  "name": "cachedContents/ku5wqm1wv0yurelr12df9q762og11tkzit98oglv",
  "displayName": "Research papers",
  "model": "models/gemini-2.5-flash-preview-04-17",
  "createTime": "2025-05-12T17:05:57.425310Z",
  "updateTime": "2025-05-12T17:05:57.425310Z",
  "expireTime": "2025-05-12T18:05:55.247081588Z",
  "usageMetadata": {
    "totalTokenCount": 43164
  }
}
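
If you already know a cache's name, you can also fetch it directly instead of listing everything. A minimal sketch, assuming the caches.get method:

// Retrieve a single cache object by its resource name.
const fetched_cache = await ai.caches.get({
  name: cached_content.name ?? "",
});
console.log(fetched_cache.displayName, fetched_cache.expireTime);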

List available cache objects

const pager = await ai.caches.list({ config: { pageSize: 10 } });
let { page } = pager;

// eslint-disable-next-line @typescript-eslint/no-unnecessary-condition
while (true) {
  for (const c of page) {
    console.log(JSON.stringify(c, null, 2));
  }
  if (!pager.hasNextPage()) break;
  page = await pager.nextPage();
}
{
  "name": "cachedContents/ku5wqm1wv0yurelr12df9q762og11tkzit98oglv",
  "displayName": "Research papers",
  "model": "models/gemini-2.5-flash-preview-04-17",
  "createTime": "2025-05-12T17:05:57.425310Z",
  "updateTime": "2025-05-12T17:05:57.425310Z",
  "expireTime": "2025-05-12T18:05:55.247081588Z",
  "usageMetadata": {
    "totalTokenCount": 43164
  }
}
{
  "name": "cachedContents/6dsdqwnusjdaaqoyxsjny8k75z5nuqy5y4wt2n78",
  "displayName": "Research papers",
  "model": "models/gemini-2.5-flash-preview-04-17",
  "createTime": "2025-05-12T17:05:04.443214Z",
  "updateTime": "2025-05-12T17:05:04.443214Z",
  "expireTime": "2025-05-12T18:05:02.260735533Z",
  "usageMetadata": {
    "totalTokenCount": 43164
  }
}
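
A cache expires once its TTL elapses, but you can extend its lifetime before that happens. A minimal sketch, assuming the caches.update method accepts a new ttl in its config:

// Push the expiry out to two hours from now (update signature assumed).
const updated_cache = await ai.caches.update({
  name: cached_content.name ?? "",
  config: { ttl: "7200s" },
});
console.log(updated_cache.expireTime);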

Use a cache

const cached_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: ["What is the research goal shared by these research papers?"],
  config: {
    cachedContent: cached_content.name ?? "",
  },
});
tslab.display.markdown(cached_response.text ?? "");

Based on the provided research papers, the shared research goal is to introduce and advance the Gemini family of highly capable multimodal models. These models are designed to have strong generalist capabilities across image, audio, video, and text understanding and reasoning.
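
To confirm that the cached tokens were actually reused, you can check the response's usage metadata. A minimal sketch, assuming the cachedContentTokenCount field on usageMetadata:

// cachedContentTokenCount (assumed field) reports how many input tokens
// were served from the cache rather than re-sent with the request.
console.log(
  "Cached tokens:",
  cached_response.usageMetadata?.cachedContentTokenCount ?? 0,
  "of",
  cached_response.usageMetadata?.promptTokenCount ?? 0,
  "prompt tokens",
);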

Delete a cache

await ai.caches.delete({
  name: cached_content.name ?? "",
});
{}

Get text embeddings

You can get text embeddings for a snippet of text by using the embedContent method with the gemini-embedding-exp-03-07 model.

The Gemini embedding model produces an output with 3072 dimensions by default. However, you have the option to choose an output dimensionality between 1 and 3072. See the embeddings guide for more details.

const TEXT_EMBEDDING_MODEL_ID = "gemini-embedding-exp-03-07";
const embedding_response = await ai.models.embedContent({
  model: TEXT_EMBEDDING_MODEL_ID,
  contents: [
    "How do I get a driver's license/learner's permit?",
    "How do I renew my driver's license?",
    "How do I change my address on my driver's license?",
  ],
  config: {
    outputDimensionality: 512,
  },
});
console.log(embedding_response.embeddings);
[
  {
    values: [
      -0.0010864572,  0.0069392114,   0.017009795,  -0.010305981,  -0.009999484,
      -0.0064486223,  0.0041451487,  -0.005906698,   0.022229617,  -0.018305639,
       -0.018174557,   0.022160593,  -0.013604425, -0.0027964567,    0.12966625,
        0.028866312,  0.0014726851,    0.03537643,  -0.015166075,  -0.013479812,
       -0.019288255,   0.010106378, -0.0043296088,   0.018035924,    0.00295039,
       -0.007934979,  -0.005416007, -0.0095809875,   0.040398005, -0.0020784356,
        0.011551388,   0.009726445,   0.006670387,   0.020050988,   -0.00747873,
      -0.0012074928,  0.0047189263,  -0.006359583,   -0.01718203,  -0.023562348,
      -0.0051814457,   0.023801394,  -0.004928927,  -0.016113443,    0.01672777,
      -0.0069929743,  -0.012722719, -0.0137646515,  -0.041852377, -0.0011546672,
        0.017030545, -0.0022786013,   0.011707037,   -0.18675306,  -0.035211734,
       -0.011472648,    0.01970727,  0.0012368832,  -0.020796346,  -0.018513134,
       -0.006821043,   -0.01843726,   -0.00827558,  -0.042159837,  0.0038724025,
         0.01933339,  0.0139452815,   0.025059255,  0.0015087503,  -0.016094029,
      -0.0035785383,   0.023902593, -0.0050776727,  -0.016679537,   0.022865271,
        0.008837786,  0.0008471195,   -0.01220322, -0.0013522654,  -0.007976455,
       0.0006637936,   0.025458207,  -0.006010767,  0.0021908805,  -0.011703044,
       -0.018676927,  -0.008143593, -0.0141673125,  -0.010751537,   0.012337637,
      -0.0076921326,   0.019663645,    0.01961247,  -0.014446872,  -0.023902485,
       -0.020467523, -0.0043290784,  -0.003858363,   0.011151444,  -0.012050864,
      ... 412 more items
    ]
  },
  {
    values: [
       -0.007656846, -0.0054716235,  -0.0022609578,   -0.01828077,
       -0.024059096,  -0.009328189,    0.007841666,  -0.017600708,
       -0.020037796,  0.0007041083,   -0.021982383,  -0.014228797,
        0.006389422,  0.0033384573,     0.13877548, 0.00071368535,
         0.02660648,  -0.016807457,   -0.002774708,  -0.033598144,
        0.009136058,  -0.010518535,    -0.01765957,   0.008413775,
       -0.012133464,  0.0005497525,   -0.005911808,   0.010362617,
           0.029897,   0.023426512,    0.002516537,   0.013438467,
        0.014629691,  0.0071821967,  -0.0020077894,  -0.007421308,
      -0.0075392514,    0.01131475,    -0.02363941,  -0.008839639,
       -0.019605042,   0.012752105,    0.014192063,  -0.016767371,
        0.015282549,  -0.019914307,     0.00381812,   -0.01551508,
         -0.0521566,  -0.012766039,    0.008752456,  -0.007198684,
      -0.0066657816,   -0.16686901,   -0.018074488,  0.0043506487,
      -0.0001522175,   -0.02115512,   -0.010462675,   0.007636461,
          0.0301948,  -0.006009675,    -0.01135165,  -0.036605343,
         0.04006906,   0.036888044,  -0.0016293195,   0.013241053,
       0.0005548855,   0.008130081,    0.027193218,  0.0047560516,
        0.023012726,  -0.014274387,    0.008621267,  -0.016665483,
       -0.016523534,  -0.021947058,  -0.0077380626,  -0.008166752,
       -0.010050893, -0.0074697966,    0.021521091,  0.0086479345,
       -0.008508939,   -0.03031165,  -0.0068692113,   0.032342624,
       -0.003118368,  -0.009117541, -0.00006816292,   0.028233083,
       -0.008163683,  -0.029179588,   -0.034861602,  -0.009573525,
       -0.020023588,  -0.023040103,   0.0030518328,  -0.024019923,
      ... 412 more items
    ]
  },
  {
    values: [
        0.010123913,  -0.024184551,  0.0024574941,   -0.00984163, -0.0060574994,
       -0.007628851,   0.013202136,  -0.027927121, -0.0016973788,  -0.014774812,
       -0.011437808,  -0.019120526, -0.0063477424, -0.0050772373,    0.12938297,
        0.006073787, -0.0055986797,   0.030279782,   0.015260121, -0.0014168695,
       -0.006316713,  0.0007294639,  -0.034072377,   0.013348729,  0.0051308265,
      -0.0042954376,  -0.009459755,  -0.012910496,   0.010751937, -0.0017263377,
        -0.02083192,  0.0054532792,   0.008046588,  0.0015794274, -0.0045236745,
       0.0077354256,  -0.009697459,   0.006621996,    -0.0447099,  -0.019261474,
       0.0050193793,   0.010624901,   0.036847603,  -0.014380205,   0.023050537,
        0.019384636,    0.03039269,   -0.02306347,  -0.025763597,   0.017585728,
       0.0056267884,  -0.014494471,  -0.013168205,   -0.18764982,   0.011082365,
        0.007989808, -0.0069600893,  0.0019873218,  -0.020733004,  -0.011488622,
       0.0072846347,  -0.022266442,  -0.021857709,  -0.040680353,  0.0043984484,
        0.016409805,  0.0010387278,   0.028186318,  -0.020797107,   0.007164954,
       -0.007931046,   0.011955907,  0.0070153666,   -0.03028713,   0.039638296,
      -0.0005224554,  -0.008104055,  -0.021054681,   0.017767426,   -0.01705528,
      -0.0015202612,   0.027076574,  -0.008269598,  0.0041972124,  -0.009893149,
      -0.0059321057,   -0.02742561,   0.011967838, -0.0012843752,  -0.012446694,
        0.013188314,    0.01000231,  0.0063591595,  -0.013250329,   -0.00891349,
       -0.011323209, 0.00077099906,  -0.032252073,   0.017312435,  -0.010896756,
      ... 412 more items
    ]
  }
]

You’ll get a set of three embeddings, one for each piece of text you passed in:

console.log((embedding_response.embeddings ?? []).length);
3

You can also see that the length of each embedding is 512, as per the outputDimensionality you specified.

const vector_1 = embedding_response.embeddings?.[0]?.values ?? [];
console.log(vector_1.length);
512
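
Embeddings are typically compared with cosine similarity: the closer the value is to 1, the more semantically similar the two texts. The helper below is a plain illustration, not part of the SDK:

// Cosine similarity between two vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare the first two driver's-license questions; both are about
// licenses, so the score should be fairly high.
const vector_2 = embedding_response.embeddings?.[1]?.values ?? [];
console.log(cosineSimilarity(vector_1, vector_2));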

Next Steps

Useful API references:

Check out the Google GenAI SDK for more details on the new SDK.

Related examples

For more detailed examples using Gemini models, check the Quickstarts folder of the cookbook. You'll learn how to use the Live API, combine multiple tools, or use Gemini 2.0's spatial understanding abilities.

Also check out the Gemini thinking models, which explicitly showcase their thought summaries and can handle more complex reasoning.