Gemini API: All about tokens

An understanding of tokens is central to using the Gemini API. This guide provides an interactive introduction to what tokens are and how they are used in the Gemini API.

About tokens

LLMs break up their input and produce their output at the granularity of tokens, units that usually fall somewhere between a single character (or code point) and a whole word.

These tokens can be single characters, like z, or whole words, like the. Long words may be broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of breaking down text into tokens is called tokenization.

For Gemini models, a token is equivalent to about 4 characters, and 100 tokens correspond to roughly 60-80 English words.

When billing is enabled, the price of a paid request is determined by the number of input and output tokens, so knowing how to count your tokens is important.
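
Since a token averages about 4 characters of English text, you can make a rough estimate before ever calling the API. Below is a minimal sketch of such a back-of-the-envelope check; the per-token price used here is a hypothetical placeholder, not a real rate (see the official pricing page for actual prices):

// Rough heuristic: ~4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical price, purely for illustration -- real per-token rates
// differ per model and are listed on the official pricing page.
const HYPOTHETICAL_PRICE_PER_INPUT_TOKEN = 0.15 / 1_000_000;

const prompt = "What's the highest mountain in Africa?";
const estimated = estimateTokens(prompt);
console.log(`~${estimated} tokens, ~$${(estimated * HYPOTHETICAL_PRICE_PER_INPUT_TOKEN).toFixed(8)} of input`);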

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Setup your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

│
├── .env
└── quickstarts
    └── Counting_Tokens.ipynb

Initialize SDK Client

With the new SDK, you now only need to initialize a client with your API key (or with OAuth if you are using Vertex AI). The model is set in each call.

const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });

Tokens in the Gemini API

Context windows

The models available through the Gemini API have context windows measured in tokens. These limits define how much input you can provide and how much output the model can generate; combined, they are referred to as the “context window”. This information is available directly through the API and in the models documentation.

In this example you can see the gemini-2.5-flash-preview-05-20 model has a 1M-token context window. If you need more, Pro models have an even bigger 2M-token context window.

const tslab = require("tslab") as typeof import("tslab");

const model_info_2_5_flash_preview_05_20 = await ai.models.get({
  model: "gemini-2.5-flash-preview-05-20",
});
console.log(JSON.stringify(model_info_2_5_flash_preview_05_20, null, 2));
{
  "name": "models/gemini-2.5-flash-preview-05-20",
  "displayName": "Gemini 2.5 Flash Preview 05-20",
  "description": "Preview release (April 17th, 2025) of Gemini 2.5 Flash",
  "version": "2.5-preview-05-20",
  "tunedModelInfo": {},
  "inputTokenLimit": 1048576,
  "outputTokenLimit": 65536,
  "supportedActions": [
    "generateContent",
    "countTokens",
    "createCachedContent",
    "batchGenerateContent"
  ]
}
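
Since this information is exposed through the API, you can also enumerate the available models and their limits programmatically. A minimal sketch, assuming ai.models.list() returns an async-iterable pager as in recent versions of the SDK:

let shown = 0;
for await (const m of await ai.models.list()) {
  console.log(`${m.name}: input ${m.inputTokenLimit} / output ${m.outputTokenLimit} tokens`);
  if (++shown >= 5) break; // only print the first few models
}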

Counting tokens

The API provides an endpoint for counting the number of tokens in a request: ai.models.countTokens. You pass the same arguments as you would to ai.models.generateContent and the service returns the number of tokens in that request.

Choose a model

Now select the model you want to use in this guide, either by picking one from the list or writing its name down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).

Tokenization should be more or less the same for each of the Gemini models, but you can still switch between them to double-check.

For more information about all Gemini models, check the documentation for extended information on each of them.

const MODEL_ID = "gemini-2.5-flash-preview-05-20";

Text tokens

const text_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: "What's the highest mountain in Africa?",
});
console.log(`Prompt Tokens: ${text_response.totalTokens}`);
Prompt Tokens: 10
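
One practical use of countTokens is to verify that a prompt fits before sending it. A small sketch reusing the model info and token count from the cells above:

const model_input_limit = model_info_2_5_flash_preview_05_20.inputTokenLimit ?? 0;
const prompt_tokens = text_response.totalTokens ?? 0;
if (prompt_tokens > model_input_limit) {
  throw new Error(`Prompt (${prompt_tokens} tokens) exceeds the ${model_input_limit}-token input limit`);
}
console.log(`Prompt fits: ${prompt_tokens}/${model_input_limit} tokens`);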

When you call ai.models.generateContent (or chat.sendMessage) the response object has a usageMetadata attribute containing both the input and output token counts (promptTokenCount and candidatesTokenCount):

const text_response_2 = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "The quick brown fox jumps over the lazy dog.",
});
tslab.display.markdown(text_response_2.text ?? "");

Yes, that’s a classic and very well-known pangram!

It’s famous because it contains every letter of the English alphabet, which makes it useful for testing typewriters, keyboards, and font displays.

console.log(`Prompt Tokens: ${text_response_2.usageMetadata?.promptTokenCount}`);
console.log(`Output Tokens: ${text_response_2.usageMetadata?.candidatesTokenCount}`);
console.log(`Total Tokens: ${text_response_2.usageMetadata?.totalTokenCount}`);
Prompt Tokens: 11
Output Tokens: 47
Total Tokens: 881
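
Note that the total is larger than the sum of the prompt and output tokens. Gemini 2.5 Flash is a thinking model, and the tokens it spends on thinking are included in totalTokenCount; in current SDK versions they are reported separately as thoughtsTokenCount:

console.log(`Thinking Tokens: ${text_response_2.usageMetadata?.thoughtsTokenCount}`);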

Multi-modal tokens

All input to the API is tokenized, including images or other non-text modalities.

Images are considered to be a fixed size, so they consume a fixed number of tokens, regardless of their display or file size.

Video and audio files are converted to tokens at a fixed per-second rate.

The current rates and token sizes can be found in the documentation.

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");

const IMG_URL = "https://goo.gle/instrument-img";

const downloadFile = async (url: string, filePath: string) => {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.statusText}`);
  }
  const buffer = await response.blob();
  const bufferData = Buffer.from(await buffer.arrayBuffer());
  fs.writeFileSync(filePath, bufferData);
};

const filePath = path.join("../assets", "organ.jpg");
await downloadFile(IMG_URL, filePath);
tslab.display.jpeg(fs.readFileSync(filePath));

Inline content

Media objects can be sent to the API inline with the request:

const inline_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: [google.createPartFromBase64(fs.readFileSync(filePath).toString("base64"), "image/jpeg")],
});
console.log(`Image Tokens: ${inline_response.totalTokens}`);
Image Tokens: 259

You can try this with different images; you should always get the same number of tokens, independent of their display or file size. Note that an extra token seems to be added, representing the empty prompt.
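
Token counts for mixed content are additive. Reusing the client and image from the cells above, here is a quick sketch that counts an image together with a text question; you would expect roughly the image's tokens plus the text's tokens:

const combined_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: [
    google.createPartFromBase64(fs.readFileSync(filePath).toString("base64"), "image/jpeg"),
    google.createPartFromText("What instrument is shown in this image?"),
  ],
});
console.log(`Combined Tokens: ${combined_response.totalTokens}`);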

Files API

The model sees identical tokens if you upload parts of the prompt through the Files API instead:

const file_upload_response = await ai.files.upload({
  file: filePath,
  config: {
    displayName: "organ.jpg",
    mimeType: "image/jpeg",
  },
});
const file_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: [google.createPartFromUri(file_upload_response.uri ?? "", file_upload_response.mimeType ?? "")],
});
console.log(`File Tokens: ${file_response.totalTokens}`);
File Tokens: 259

Audio and video are each converted to tokens at a fixed per-second rate.

const AUDIO_URL =
  "https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3";
const audioPath = path.join("../assets", "audio.mp3");

await downloadFile(AUDIO_URL, audioPath);
import { ffprobe, FfprobeData } from "fluent-ffmpeg";

// eslint-disable-next-line @typescript-eslint/no-explicit-any
ffprobe(audioPath, (err: any, metadata: FfprobeData) => {
  if (err) {
    console.error("Error getting audio metadata:", err);
    return;
  }
  const { duration } = metadata.format;
  console.log(`Audio Duration: ${duration} seconds`);
});
Audio Duration: 2610.128938 seconds

As you can see, this audio file is about 2610 seconds (roughly 43.5 minutes) long.

const audio_file_response = await ai.files.upload({
  file: audioPath,
  config: {
    displayName: "audio.mp3",
    mimeType: "audio/mpeg",
  },
});

const audio_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: [google.createPartFromUri(audio_file_response.uri ?? "", audio_file_response.mimeType ?? "")],
});

console.log(`Audio Tokens: ${audio_response.totalTokens}`);
console.log(`Tokens per second: ${(audio_response.totalTokens ?? 0) / 2610}`);
Audio Tokens: 83528
Tokens per second: 32.003065134099614
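
Because the per-second rate is fixed, you can estimate an audio file's token cost from its duration alone, before uploading anything. A quick sanity check against the measured count, using the 32 tokens-per-second audio rate documented at the time of writing:

// Documented audio rate at the time of writing; check the docs for current values.
const AUDIO_TOKENS_PER_SECOND = 32;
const audio_duration_seconds = 2610; // from the ffprobe output above

console.log(`Expected ~${AUDIO_TOKENS_PER_SECOND * audio_duration_seconds} tokens`);
console.log(`Actual: ${audio_response.totalTokens} tokens`);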

Chat, tools and cache

Chat, tools and cache are currently not supported by the unified SDK's countTokens method. This notebook will be updated once they are.

In the meantime you can still check the tokens used after a call via the usageMetadata of the response. Check the caching notebook for more details.
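
For example, after a chat turn you can read the counts from the response's usageMetadata. A minimal sketch using the SDK's chat helper (ai.chats.create and chat.sendMessage):

const chat = ai.chats.create({ model: MODEL_ID });
const chat_response = await chat.sendMessage({ message: "In one sentence, what is a token?" });
console.log(`Prompt Tokens: ${chat_response.usageMetadata?.promptTokenCount}`);
console.log(`Output Tokens: ${chat_response.usageMetadata?.candidatesTokenCount}`);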

Further reading

For more on token counting, check out the documentation or the API reference.