Gemini API: File API Quickstart

The Gemini API supports prompting with text, image, and audio data, also known as multimodal prompting. For small images, you can point the Gemini model directly at a local file when providing a prompt. For larger text files, images, videos, and audio, upload the files with the File API before including them in prompts.

The File API lets you store up to 20GB of files per project, with each file not exceeding 2GB in size. Files are stored for 48 hours and can be accessed with your API key for generation within that period. The File API is available at no cost in all regions where the Gemini API is available.

For information on valid file formats (MIME types) and supported models, see the documentation on supported file formats and view the text examples at the end of this guide.

This guide shows how to use the File API to upload a media file and include it in a generateContent call to the Gemini API. For more information, see the code samples.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Set up your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Important

The File API uses API keys for authentication and access. Uploaded files are associated with the API key’s cloud project. Unlike other Gemini API calls, your API key also grants access to data you’ve uploaded to the File API, so take extra care to keep it secure. For best practices on securing API keys, refer to Google’s documentation.

Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

│
├── .env
└── quickstarts
    └── File_API.ipynb

Initialize SDK Client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.

const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });

Upload a file

The File API lets you upload a variety of multimodal MIME types, including image and audio formats. The File API handles inputs that can be used to generate content with ai.models.generateContent or ai.models.generateContentStream.

The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API.
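If you want to guard against the per-file limit before uploading, a simple size check with Node’s fs module is enough. This is a minimal sketch; the limit constant and helper name are just illustrations of the 2GB figure above, not part of the SDK.

const fs = require("fs") as typeof import("fs");

// Illustrative guard: the File API rejects files over 2GB.
const MAX_FILE_BYTES = 2 * 1024 ** 3;

const assertUploadable = (filePath: string) => {
  const { size } = fs.statSync(filePath);
  if (size > MAX_FILE_BYTES) {
    throw new Error(`${filePath} is ${size} bytes, over the 2GB File API limit`);
  }
};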

First, you will prepare a sample image to upload to the API.

Note: You can also upload your own files to use.

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");

const IMG_URL = "https://storage.googleapis.com/generativeai-downloads/data/jetpack.png";

const downloadFile = async (url: string, filePath: string) => {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.statusText}`);
  }
  // Write the downloaded bytes straight to disk.
  const data = Buffer.from(await response.arrayBuffer());
  fs.writeFileSync(filePath, data);
};

const filePath = path.join("../assets", "jetpack.png");
fs.mkdirSync(path.dirname(filePath), { recursive: true }); // ensure ../assets exists
await downloadFile(IMG_URL, filePath);
const tslab = require("tslab") as typeof import("tslab");

tslab.display.png(fs.readFileSync("../assets/jetpack.png"));

Next, you will upload that file to the File API.

const file = await ai.files.upload({
  file: filePath,
  config: {
    displayName: "jetpack.png",
    mimeType: "image/png",
  },
});
console.log("File uploaded:", JSON.stringify(file, null, 2));
File uploaded: {
  "name": "files/h10fapw8h9ra",
  "displayName": "jetpack.png",
  "mimeType": "image/png",
  "sizeBytes": "1567837",
  "createTime": "2025-06-15T12:38:56.523729Z",
  "expirationTime": "2025-06-17T12:38:56.339515281Z",
  "updateTime": "2025-06-15T12:38:56.523729Z",
  "sha256Hash": "YWFjOTU0ZGRjYzg4ZTc0YmIxMGYwZDZjZmVjMWFhZGJjZDU0OTJmMzI5NTZmM2E0OGM3NDc0NzI1YTNkZDIzOQ==",
  "uri": "https://generativelanguage.googleapis.com/v1beta/files/h10fapw8h9ra",
  "state": "ACTIVE",
  "source": "UPLOADED"
}

The response shows that the File API stored the specified displayName for the uploaded file, returned a uri you can use to reference the file in Gemini API calls, and reports the file’s current state. Use the response to track how uploaded files map to URIs.
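Images are usually ACTIVE as soon as the upload returns, but larger media such as video can sit in a PROCESSING state for a while. Here is a minimal polling sketch; it assumes the SDK exports a FileState enum matching the state strings shown in the response above.

// Poll until the uploaded file leaves the PROCESSING state.
let polled = await ai.files.get({ name: file.name! });
while (polled.state === google.FileState.PROCESSING) {
  await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2s between checks
  polled = await ai.files.get({ name: file.name! });
}
if (polled.state !== google.FileState.ACTIVE) {
  throw new Error(`File ${polled.name} ended up in state ${polled.state}`);
}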

Depending on your use case, you could store the URIs in a structure such as a plain object, a Map, or a database.
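For example, a minimal in-memory mapping from display names to URIs could look like this (a sketch; it assumes you keep display names unique, and any persistent store would work just as well):

// Track uploaded files by display name for later reuse in prompts.
const uploadedFiles = new Map<string, { name: string; uri: string }>();
uploadedFiles.set(file.displayName!, { name: file.name!, uri: file.uri! });

console.log("Stored URI:", uploadedFiles.get("jetpack.png")?.uri);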

Get a file

After uploading the file, you can verify that the API has successfully received it by calling files.get.

It lets you get the metadata of files uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the name (and by extension, the uri) is unique. Only use the displayName to identify files if you manage uniqueness yourself.

const retrieved_file = await ai.files.get({
  name: file.name!,
});
console.log("File retrieved:", JSON.stringify(retrieved_file, null, 2));
File retrieved: {
  "name": "files/h10fapw8h9ra",
  "displayName": "jetpack.png",
  "mimeType": "image/png",
  "sizeBytes": "1567837",
  "createTime": "2025-06-15T12:38:56.523729Z",
  "expirationTime": "2025-06-17T12:38:56.339515281Z",
  "updateTime": "2025-06-15T12:38:56.523729Z",
  "sha256Hash": "YWFjOTU0ZGRjYzg4ZTc0YmIxMGYwZDZjZmVjMWFhZGJjZDU0OTJmMzI5NTZmM2E0OGM3NDc0NzI1YTNkZDIzOQ==",
  "uri": "https://generativelanguage.googleapis.com/v1beta/files/h10fapw8h9ra",
  "state": "ACTIVE",
  "source": "UPLOADED"
}
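You can also enumerate every file stored for your project with files.list. A short sketch, assuming the paginated async iterator the current SDK exposes (the pageSize here is arbitrary):

// List all files currently associated with this API key's project.
const pager = await ai.files.list({ config: { pageSize: 10 } });
for await (const f of pager) {
  console.log(f.name, f.displayName, f.sizeBytes);
}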

Generate content

After uploading the file, you can make generateContent requests that reference it by its URI. In the JS SDK, construct a Part object containing the file URI and MIME type, then pass it to the generateContent method.

Here you create a prompt that starts with text and includes the uploaded image.

const content_response = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-05-20",
  contents: [
    "Describe the image with a creative description.",
    google.createPartFromUri(retrieved_file.uri!, retrieved_file.mimeType!),
  ],
});
tslab.display.markdown(content_response.text ?? "");

On a classic lined notebook page, a visionary (or perhaps a very enthusiastic student) has penned the blueprint for the ultimate urban escape vehicle: the “JETPACK BACKPACK.” Rendered in crisp blue ink, the drawing features a sleek, almost minimalist backpack design, looking unassuming enough to carry textbooks or gym clothes.

But beneath its innocent exterior, magic awaits. From its base, two stubby, retractable boosters are poised to unleash their power, depicted by playful, swirling plumes of steam that hint at a “green/clean”, eco-friendly propulsion system. The designer promises it’s lightweight, fits an 18-inch laptop, and boasts padded strap support for comfortable ascent.

Yet, a mischievous detail catches the eye: while it proudly features “USB-C CHARGING” for modern convenience, the “15-MIN BATTERY LIFE” suggests that adventures in the sky will be thrillingly brief dashes, perhaps just enough to leapfrog over a traffic jam or make a truly grand entrance (or exit) before gravity reclaims its due. It’s a delightful doodle of daring, a dream of everyday flight captured on paper, proving that sometimes, the most imaginative journeys begin with a pen and a blank page.
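If you would rather stream the answer as it is generated, the same file part works with generateContentStream. A sketch of the streaming variant:

// Stream the response chunk by chunk instead of waiting for the full text.
const stream = await ai.models.generateContentStream({
  model: "gemini-2.5-flash-preview-05-20",
  contents: [
    "Describe the image with a creative description.",
    google.createPartFromUri(retrieved_file.uri!, retrieved_file.mimeType!),
  ],
});
for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}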

Delete files

Files are automatically deleted after 2 days, or you can manually delete them using files.delete().

await ai.files.delete({
  name: retrieved_file.name!,
});
console.log("File deleted:", retrieved_file.name);
File deleted: files/h10fapw8h9ra
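Once deleted, fetching the file again fails, which makes for a quick sanity check. A sketch (the exact error shape depends on the SDK version):

// files.get now rejects because the file no longer exists.
try {
  await ai.files.get({ name: retrieved_file.name! });
  console.log("File still exists");
} catch (error) {
  console.log("File is gone, as expected:", (error as Error).message);
}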

Supported text types

As well as supporting media uploads, the File API can be used to include text files, such as Python code or Markdown files, in your prompts.

This example shows you how to load a markdown file into a prompt using the File API.

const MD_URL = "https://raw.githubusercontent.com/google-gemini/cookbook/main/CONTRIBUTING.md";

const mdFilePath = path.join("../assets", "contrib.md");
await downloadFile(MD_URL, mdFilePath);

const mdFile = await ai.files.upload({
  file: mdFilePath,
  config: {
    displayName: "contrib.md",
    mimeType: "text/markdown",
  },
});

const md_content_response = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-05-20",
  contents: [
    "What should I do before I start writing, when following these guidelines?",
    google.createPartFromUri(mdFile.uri!, mdFile.mimeType!),
  ],
});
tslab.display.markdown(md_content_response.text ?? "");

Before you start writing a contribution to the Gemini API Cookbook, you should do the following:

  1. Sign the Contributor License Agreement (CLA):
    • Visit https://cla.developers.google.com/ to see your current agreements or to sign a new one.
    • This gives Google permission to use and redistribute your contributions. (Check if you or your employer have already signed a Google CLA for another project, as you likely won’t need to do it again.)
  2. Review Style Guides:
    • Read the highlights of the technical writing style guide to anticipate common feedback.
    • Check the relevant style guide for the programming language you will be using (e.g., Python).
    • (For Python notebooks specifically): Consider running pyink over your notebook, as it helps with style consistency (though not strictly required before starting, it’s good to keep in mind).
  3. File an Issue:
    • Before writing a single line or sending a pull request, file an issue on the GitHub repository.
    • This step is crucial for discussing your idea, getting guidance, and ensuring your content has full support before you invest time in writing it.

Some common text formats are automatically detected, such as text/x-python, text/html, and text/markdown. If you are using a file that you know is text but is not automatically detected by the API as such, you can specify the MIME type as text/plain explicitly.

const CODE_URL = "https://raw.githubusercontent.com/google/gemma.cpp/main/examples/hello_world/run.cc";

const codeFilePath = path.join("../assets", "gemma.cpp");
await downloadFile(CODE_URL, codeFilePath);

const codeFile = await ai.files.upload({
  file: codeFilePath,
  config: {
    displayName: "gemma.cpp",
    mimeType: "text/plain",
  },
});

const code_content_response = await ai.models.generateContent({
  model: "gemini-2.5-flash-preview-05-20",
  contents: ["What does this program do?", google.createPartFromUri(codeFile.uri!, codeFile.mimeType!)],
});
tslab.display.markdown(code_content_response.text ?? "");

This C++ program is a demonstration of how to run the Gemma large language model for text generation, specifically highlighting a “constrained decoding” feature.

Here’s a breakdown of what it does:

  1. Initialization and Argument Parsing:
    • It starts by parsing command-line arguments using gcpp::LoaderArgs, gcpp::InferenceArgs, and gcpp::AppArgs. These arguments likely configure how the Gemma model is loaded (e.g., model path, precision) and how inference is performed (e.g., batch size).
    • It supports a --help flag for displaying usage information.
  2. Constrained Decoding (Token Rejection):
    • This is a key feature demonstrated in this specific example. It iterates through command-line arguments to find --reject. Any subsequent numeric arguments after --reject are treated as token IDs to be excluded from generation. These “rejected” tokens are stored in a std::set<int>.
    • This allows the user to specify tokens that the model must not output during the generation process.
  3. Gemma Model Setup:
    • It initializes the gcpp::Gemma model, which is the core large language model. This involves setting up the model’s topology, pools (likely for thread management and parallel computation, given the hwy and thread_pool includes), and MatMulEnv (matrix multiplication environment).
    • A gcpp::KVCache is also created, which is essential for efficient sequence generation in transformer models.
  4. Prompt Definition and Tokenization:
    • The program defines a hardcoded prompt: "Write a greeting to the world."
    • This prompt is then converted into a sequence of integer tokens using the model’s Tokenizer().
  5. Text Generation and Output:
    • It sets up a std::mt19937 random number generator, seeded by std::random_device, which is used for sampling tokens during generation (e.g., when temperature is applied).
    • stream_token Callback: A lambda function stream_token is defined. This function is called every time the model generates a new token. It decodes the token back into readable text and prints it to std::cout, giving a real-time streaming output experience.
    • accept_token Callback (Constrained Decoding in action): A crucial part of runtime_config is the accept_token lambda. This function takes a candidate token and returns true if it’s allowed, false if it should be rejected. In this program, it checks if the candidate token is present in the reject_tokens set. If it is, false is returned, preventing the model from outputting that specific token.
    • Finally, it calls model.Generate() with the prepared runtime_config, the tokenized prompt, the KV cache, and other parameters. The model then generates text based on the prompt, adhering to the specified constraints (like max_generated_tokens, temperature, and the accept_token filter).

In essence, this program demonstrates a basic text generation pipeline with the Gemma model, with a specific focus on showing how to implement constrained decoding by preventing the model from generating certain specified tokens. It’s a command-line application that will print the generated text to standard output.

Example Usage (Hypothetical):

# Basic run, model will generate freely
./gemma_example --model=/path/to/gemma.ckpt

# Run and prevent the model from generating tokens 123, 456, and 789
./gemma_example --model=/path/to/gemma.ckpt --reject 123 456 789

Next Steps

Useful API references:

For more information about the File API, check its API reference.

Continue your discovery of the Gemini API

If you’re not already familiar with it, learn how tokens are counted. Then check how to use the File API with audio or video files in the Gemini API.
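As a starting point, countTokens accepts the same file parts as generateContent, so you can measure a multimodal prompt before sending it. A minimal sketch, reusing the still-active code file from above:

// Count tokens for a prompt that includes an uploaded file.
const tokenInfo = await ai.models.countTokens({
  model: "gemini-2.5-flash-preview-05-20",
  contents: [
    "What does this program do?",
    google.createPartFromUri(codeFile.uri!, codeFile.mimeType!),
  ],
});
console.log("Total tokens:", tokenInfo.totalTokens);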