An understanding of tokens is central to using the Gemini API. This guide provides an interactive introduction to what tokens are and how they are used in the Gemini API.
About tokens
LLMs break up their input and produce their output at a granularity that is smaller than a word, but larger than a single character or code-point.
These tokens can be single characters, like z, or whole words, like the. Long words may be broken up into several tokens. The set of all tokens used by the model is called the vocabulary, and the process of breaking down text into tokens is called tokenization.
For Gemini models, a token is equivalent to about 4 characters. 100 tokens are about 60-80 English words.
When billing is enabled, the price of a paid request is controlled by the number of input and output tokens, so knowing how to count your tokens is important.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
With the new SDK, you now only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.
const google = require("@google/genai") as typeof import("@google/genai");
const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });
Tokens in the Gemini API
Context windows
The models available through the Gemini API have limits that are measured in tokens. These define how much input you can provide and how much output the model can generate; combined, they are referred to as the “context window”. This information is available directly through the API and in the models documentation.
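As a quick check, here is a minimal sketch that fetches a model’s limits through the SDK’s models.get method (the inputTokenLimit and outputTokenLimit field names mirror the REST API’s; treat them as assumptions if your SDK version differs):

const model_info = await ai.models.get({
  model: "gemini-2.5-flash-preview-05-20",
});
// Input and output limits are reported separately, in tokens.
console.log(`Input token limit: ${model_info.inputTokenLimit}`);
console.log(`Output token limit: ${model_info.outputTokenLimit}`);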
In this example you can see the gemini-2.5-flash-preview-05-20 model has a 1M-token context window. If you need more, Pro models have an even bigger 2M-token context window.
The API provides an endpoint for counting the number of tokens in a request: client.models.countTokens. You pass the same arguments as you would to client.models.generateContent and the service will return the number of tokens in that request.
Choose a model
Now select the model you want to use in this guide, either by selecting one in the list or writing its name down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).
The tokenization should be more or less the same for each of the Gemini models, but you can still switch between them to double-check.
For extended information on each of the Gemini models, check the documentation.
const MODEL_ID = "gemini-2.5-flash-preview-05-20";
Text tokens
const text_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: "What's the highest mountain in Africa?",
});
console.log(`Prompt Tokens: ${text_response.totalTokens}`);
Prompt Tokens: 10
When you call client.models.generateContent (or chat.sendMessage) the response object has a usageMetadata attribute containing both the input and output token counts (promptTokenCount and candidatesTokenCount):
const tslab = require("tslab") as typeof import("tslab");

const text_response_2 = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "The quick brown fox jumps over the lazy dog.",
});
tslab.display.markdown(text_response_2.text ?? "");
Yes, that’s a classic and very well-known pangram!
It’s famous because it contains every letter of the English alphabet, which makes it useful for testing typewriters, keyboards, and font displays.
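For instance, you can read those counts off the response we just generated (totalTokenCount is the sum of both; the other two field names are the ones described above):

console.log(`Prompt tokens: ${text_response_2.usageMetadata?.promptTokenCount}`);
console.log(`Output tokens: ${text_response_2.usageMetadata?.candidatesTokenCount}`);
console.log(`Total tokens: ${text_response_2.usageMetadata?.totalTokenCount}`);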
You can try with different images and should always get the same number of tokens, which is independent of the image’s display or file size. Note that an extra token seems to be added, representing the empty prompt.
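As a minimal sketch (assuming a local file named image.jpg; any image works), you can count image tokens by passing the image inline:

const fs = require("fs") as typeof import("fs");

// Pass the image inline as base64-encoded data.
const image_base64 = fs.readFileSync("image.jpg").toString("base64");
const image_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: [{ inlineData: { data: image_base64, mimeType: "image/jpeg" } }],
});
console.log(`Image Tokens: ${image_response.totalTokens}`);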
Files API
The model sees identical tokens if you upload parts of the prompt through the files API instead:
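For example, here is a sketch that uploads an audio file and counts its tokens (assuming a local file named sample.mp3; createUserContent and createPartFromUri are helpers exported by @google/genai):

// Upload the file once, then reference it by URI in the request
// (set config.mimeType if it can't be inferred from the extension).
const audio_file = await ai.files.upload({ file: "sample.mp3" });
const audio_response = await ai.models.countTokens({
  model: MODEL_ID,
  contents: google.createUserContent([
    google.createPartFromUri(audio_file.uri ?? "", audio_file.mimeType ?? ""),
  ]),
});
console.log(`Audio Tokens: ${audio_response.totalTokens}`);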
Audio Tokens: 83528
Tokens per second: 32.003065134099614
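The tokens-per-second figure comes from dividing the total count by the clip’s duration: audio is tokenized at a fixed rate of about 32 tokens per second, regardless of file size or bitrate.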
Chat, tools and cache
Chat, tools and cache are currently not supported by the unified SDK’s countTokens method. This notebook will be updated when that is the case.
In the meantime you can still check the tokens used after the call using the usageMetadata from the response. Check the caching notebook for more details.
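For instance, here is a minimal sketch reading the counts from a chat turn (using the SDK’s chats.create and sendMessage methods):

const chat = ai.chats.create({ model: MODEL_ID });
const chat_response = await chat.sendMessage({
  message: "Hi, my name is Fred.",
});
// usageMetadata is attached to every generated response, chat included.
console.log(`Prompt tokens: ${chat_response.usageMetadata?.promptTokenCount}`);
console.log(`Output tokens: ${chat_response.usageMetadata?.candidatesTokenCount}`);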
Further reading
For more on token counting, check out the documentation or the API reference.