Gemini API: Getting started with information grounding for Gemini models
In this notebook you will learn how to use information grounding with Gemini models.
Information grounding is the process of connecting these models to specific, verifiable information sources to enhance the accuracy, relevance, and factual correctness of their responses. While LLMs are trained on vast amounts of data, this knowledge can be general, outdated, or lack specific context for particular tasks or domains. Grounding helps to bridge this gap by providing the LLM with access to curated, up-to-date information.
Here you will experiment with:
Grounding information using Google Search grounding
Adding YouTube links to provide context information for your prompt
Using URL context to include website URLs as context in your prompt
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
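The .env file itself is just a plain-text file with one KEY=value pair per line; for this notebook it needs a single entry (placeholder shown):

GEMINI_API_KEY="<YOUR_API_KEY>"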
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case, the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
│
├── .env
└── quickstarts
└── Grounding.ipynb
Initialize SDK Client
With the new SDK, you now only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.
const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });
Select a model
Now select the model you want to use in this guide, either by selecting one in the list or writing it down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).
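The cells below assume a MODEL_ID constant and the tslab display helper. If you are following along outside the original notebook, a minimal setup sketch could look like this (the gemini-2.5-flash default is just one possible choice, not the only valid one):

const tslab = require("tslab") as typeof import("tslab");

// Model used by every call below; swap in any available Gemini model ID.
const MODEL_ID = "gemini-2.5-flash";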
Google Search grounding is particularly useful for queries that require current information or external knowledge. Using Google Search, Gemini can access near real-time information and provide better-grounded responses.
const search_grounding = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "What was the latest Indian Premier League match and who won?",
  config: {
    tools: [{ googleSearch: {} }],
  },
});

tslab.display.markdown(search_grounding.text ?? "");
console.log(
  "Search Query:",
  JSON.stringify(search_grounding.candidates?.[0]?.groundingMetadata?.webSearchQueries, null, 2)
);
console.log(
  "Search Pages:",
  JSON.stringify(search_grounding.candidates?.[0]?.groundingMetadata?.groundingChunks, null, 2)
);
tslab.display.html(
  search_grounding.candidates?.[0]?.groundingMetadata?.searchEntryPoint?.renderedContent ?? ""
);
The latest Indian Premier League (IPL) match was the IPL 2025 Final, played on June 3, 2025, in Ahmedabad.
Royal Challengers Bengaluru (RCB) won the match against Punjab Kings (PBKS) by 6 runs, securing their maiden IPL title. Virat Kohli was emotional after RCB’s historic win. Krunal Pandya was named the Man of the Match for his economical bowling performance.
You can see that running the same prompt without search grounding gives you outdated information:
const without_search_grounding = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "What was the latest Indian Premier League match and who won?",
});

tslab.display.markdown(without_search_grounding.text ?? "");
The latest Indian Premier League (IPL) match was the Final of the 2024 season, played on May 26, 2024.
It was between:
Kolkata Knight Riders (KKR)
Sunrisers Hyderabad (SRH)
Kolkata Knight Riders (KKR) won the match by 8 wickets, securing their third IPL title.
Grounding with YouTube links
You can directly include a public YouTube URL in your prompt. The Gemini models will then process the video content to perform tasks like summarization and answering questions about the content.
This capability leverages Gemini’s multimodal understanding, allowing it to analyze and interpret video data alongside any text prompts provided.
You do need to explicitly declare the video URL you want the model to process as part of the contents of the request. Here is a simple interaction where you ask the model to summarize a YouTube video:
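The call follows the same pattern used later in this section with createPartFromUri. A minimal sketch, where YOUTUBE_URL is a placeholder you would replace with the public video you want summarized:

// Placeholder: replace with the public YouTube video you want to process.
const YOUTUBE_URL = "https://www.youtube.com/watch?v=<VIDEO_ID>";

const video_summary = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Summarize this video.",
    // Declare the video URL explicitly as part of the request contents.
    google.createPartFromUri(YOUTUBE_URL, "video/x-youtube"),
  ],
});

tslab.display.markdown(video_summary.text ?? "");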
This video introduces “Gemma Chess,” showcasing how Google’s Gemma AI model can bring a “new dimension” to the game of chess. Ju-yeong Ji from Google DeepMind explains that Gemma isn’t designed to replace powerful, calculative chess engines like Stockfish (which excel at finding optimal moves) but rather to enhance the chess experience through its language understanding and generation capabilities.
Gemma offers several key applications:
Game Analysis & Explanations: It can analyze chess games (using PGN data) and explain the strategic and tactical significance of moves in natural language, providing insights into why certain moves are interesting or impactful, even considering psychological aspects for human players.
Storytelling: Gemma can transform game data into engaging narratives, bringing historical matches or personal games to life with descriptive language and emotional context.
Chess Learning Support: It acts as a “smart study buddy,” capable of explaining complex chess concepts (like openings such as the “Sicilian Defense” or specific tactical ideas like a “passed pawn”) in detail, adapting the explanation to the user’s skill level, and supporting multiple languages. It can also offer feedback on a player’s understanding.
Essentially, Gemma combines the precise computational strength of traditional chess AI with its own ability to interpret and communicate complex information in a human-like way, making chess learning and analysis more intuitive and accessible for everyone.
But you can also use the link as the source of truth for your request. In this example, you will first ask how Gemma models can help with chess games:
const gemma_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: "How Gemma models can help on chess games?",
});

tslab.display.markdown(gemma_response.text ?? "");
Gemma models, as Large Language Models (LLMs) developed by Google, are primarily designed for natural language understanding and generation. This means they operate on text, not directly on the board state, moves, or strategic calculations like a traditional chess engine (e.g., Stockfish, AlphaZero).
Therefore, Gemma models cannot play chess, calculate moves, or evaluate positions with the accuracy and depth of dedicated chess engines.
However, they can be incredibly helpful in chess games and study in indirect, linguistic, and informational ways:
Learning and Education:
Explaining Rules and Concepts: Ask Gemma to explain what a “fork,” “pin,” “discovered attack,” or “zugzwang” is. It can provide clear, concise definitions and examples.
Teaching Openings: It can describe common opening principles, explain the ideas behind specific openings (e.g., “What are the main ideas in the Ruy Lopez?”), and list common variations.
Analyzing Puzzles and Positions (Textual): You can describe a position (e.g., “White to move, King on g1, Queen on d1, Rook on a1… can White win?”) and ask for a general strategic idea or what a good move might be, based on common chess principles it learned from its training data. It won’t calculate precisely but can offer high-level advice.
Creating Study Plans: You could ask for a beginner’s study plan, topics to focus on, or recommendations for improving specific aspects of your game.
Game Analysis and Commentary (Prose):
Explaining Game Phases: Ask it to describe what typically happens in the opening, middlegame, and endgame.
Generating Commentary: Provide a sequence of moves (in algebraic notation) and ask Gemma to generate natural language commentary, explaining what’s happening or the likely intent behind moves.
Summarizing Games: Give it a PGN (Portable Game Notation) or a list of moves, and it can try to summarize the key moments, strategic themes, or turning points.
Translating Chess Notation: Convert algebraic notation into natural language descriptions, e.g., “e4 e5 Nf3 Nc6” -> “White moves their king’s pawn two squares, Black responds similarly, then White develops their knight to f3, and Black develops their knight to c6.”
Content Creation:
Writing Articles and Blogs: Generate outlines or draft content for articles about chess history, famous players, specific openings, or strategic concepts.
Creating Quizzes: Ask it to generate multiple-choice questions about chess rules, history, or basic tactics.
Scripting Videos: Help draft scripts for chess lessons or game analysis videos.
Historical and Conceptual Knowledge:
Recalling Famous Games/Players: Ask about legendary matches, famous blunders, or the achievements of grandmasters.
Understanding Chess Terminology: Clarify the meaning of obscure or advanced chess terms.
Key Limitations to Remember:
No Positional “Understanding”: Gemma models don’t “see” the board or calculate moves like a chess engine. Their understanding is based on patterns and relationships in the text they were trained on.
No Tactical Depth: They cannot calculate deep tactical lines, predict opponent responses accurately, or find the “best” move in a complex position.
Potential for Hallucination: Like any LLM, Gemma can sometimes generate plausible-sounding but incorrect information, especially when asked for precise strategic or tactical advice that requires deep calculation.
Relies on Training Data: Its knowledge is limited to what it learned from its vast text dataset. If a niche chess concept or a very recent game isn’t in its training data, it won’t know about it.
In summary, Gemma models are fantastic linguistic assistants for chess. They can help you learn, explain, and create content about chess, but they are not a substitute for a dedicated chess engine when it comes to playing, calculating, or deep positional analysis.
And then you can ask the same question, now having the YouTube video as context to be used by the model:
const gemma_grounding = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "How Gemma models can help on chess games?",
    google.createPartFromUri(YOUTUBE_URL, "video/x-youtube"),
  ],
});

tslab.display.markdown(gemma_grounding.text ?? "");
Gemma models can help on chess games by bringing a “new dimension” to the experience, focusing on human understanding and interaction rather than solely on raw computational power. Here’s how:
Easier Chess Analysis and Explanations:
Demystifying Engine Output: Traditional chess engines often provide technical numbers and complex move sequences that are hard for humans to understand. Gemma can take this technical data and translate it into plain, understandable text.
Explaining Moves: It can explain why a particular move is good, outlining the strategic ideas, tactical advantages, and potential dangers associated with it.
Summarizing Complexities: For intricate parts of a game, Gemma can summarize key tactical and strategic moments, helping players quickly grasp important takeaways.
Storytelling and Narrative Generation:
Bringing Games to Life: Gemma can analyze a chess game (including context like players and tournaments) and generate a compelling narrative about how the game unfolded. This makes reviewing past games a more engaging and immersive experience than just looking at move notation.
Adding Context and Emotion: It can imbue the game analysis with a “backstory” or emotional context, making the “aha!” moments of a game more impactful.
Personalized Chess Learning and Coaching:
Intelligent Study Buddy: Gemma can act as a personalized chess coach, explaining concepts like openings (e.g., the Sicilian Defense) in an easy-to-understand manner.
Tailored Explanations: It can adapt its explanations to the user’s skill level (beginner, intermediate, advanced) and even provide them in different languages (e.g., Korean).
Targeted Feedback: Gemma can provide feedback on a player’s understanding of chess ideas and point out areas where they might need to improve, making the learning process more efficient and personalized.
Enhanced Interaction with Chess Engines:
Bridging AI and Human Understanding: By combining the brute-force calculation strength of traditional chess AI (like AlphaZero) with Gemma’s linguistic capabilities, it offers a more intuitive approach to learning and analyzing chess. It can interpret the engine’s optimal moves and explain the underlying logic in a human-friendly way.
Now your answer is more insightful for the topic you care about, using the knowledge shared in the video that is not necessarily available in the model's own knowledge.
Grounding information using URL context
The URL Context tool empowers Gemini models to directly access and process content from specific web page URLs you provide within your API requests. This is incredibly interesting because it allows your applications to dynamically interact with live web information without needing you to manually pre-process and feed that content to the model.
URL Context is effective because it allows the model to base its responses and analysis directly on the content of the designated web pages. Instead of relying solely on its general training data or broad web searches (which are also valuable grounding tools), URL Context anchors the model's understanding to the specific information present at those URLs.
const url_context_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    `
    based on https://ai.google.dev/gemini-api/docs/models,
    what are the key differences between Gemini 1.5, Gemini 2.0 and Gemini 2.5 models?
    Create a markdown table comparing the differences.
    `,
  ],
  config: {
    tools: [{ urlContext: {} }],
  },
});

tslab.display.markdown(url_context_response.text ?? "");
The Gemini API offers various models optimized for different use cases, with Gemini 1.5, Gemini 2.0, and Gemini 2.5 representing different generations and capabilities. The key differences between the main variants are summarized in the table below.
| Feature | Gemini 1.5 Pro | Gemini 1.5 Flash | Gemini 2.0 Flash | Gemini 2.5 Pro (Preview) | Gemini 2.5 Flash (Preview) |
| --- | --- | --- | --- | --- | --- |
| Primary Use Case / Optimization | Mid-size multimodal model optimized for a wide range of reasoning tasks; excels at processing large amounts of data. | Fast and versatile multimodal model for scaling across diverse tasks. | Next-generation features and improved capabilities, superior speed, native tool use, built for agentic experiences. | Most powerful thinking model with maximum response accuracy and state-of-the-art performance; best for complex coding, reasoning, and multimodal understanding. | Best model in terms of price-performance, offering well-rounded capabilities; ideal for low-latency, high-volume tasks requiring thinking. |
| Input Modalities | Audio, images, video, text | Audio, images, video, text | Audio, images, video, text | Audio, images, video, text | Audio, images, video, text |
| Output Modalities | Text | Text | Text | Text | Text |
| Input Token Limit | 2,097,152 (2M) | 1,048,576 (1M) | 1,048,576 (1M) | 1,048,576 (1M) | 1,048,576 (1M) |
| Key Capabilities | System instructions, JSON mode, JSON schema, adjustable safety settings, caching, function calling, code execution. | System instructions, JSON mode, JSON schema, adjustable safety settings, caching, tuning, function calling, code execution. | Structured outputs, caching, function calling, code execution, search grounding, Live API. Thinking is experimental. | Structured outputs, caching, function calling, code execution, search grounding, thinking. | Structured outputs, caching, function calling, code execution, search grounding, thinking. |
| Availability | Stable | Stable | Latest Stable (also Experimental and Stable versions) | Preview | Preview |
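You can also check which URLs the model actually retrieved to ground its answer. A minimal sketch, assuming the response exposes this under the candidate's urlContextMetadata field (field name taken from the URL context documentation; verify it against your SDK version):

// Inspect which URLs were retrieved to ground the answer.
console.log(
  "Retrieved URLs:",
  JSON.stringify(url_context_response.candidates?.[0]?.urlContextMetadata, null, 2)
);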
As a comparison, you can see how the answer would look without using the official models documentation as URL context:
const without_url_context_response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    `
    what are the key differences between Gemini 1.5, Gemini 2.0 and Gemini 2.5 models?
    Create a markdown table comparing the differences.
    `,
  ],
});

tslab.display.markdown(without_url_context_response.text ?? "");
It seems there might be a slight misunderstanding regarding the versioning of Gemini models. As of my last update, Google has not publicly released models named “Gemini 2.0” or “Gemini 2.5.”
The publicly announced and available Gemini models follow this progression:
Gemini 1.0 (Pro, Ultra, Nano): The initial general release of the Gemini family.
Gemini 1.5 Pro: A significant upgrade focusing on a massive context window and Mixture-of-Experts (MoE) architecture.
Gemini 1.5 Flash: A faster, more cost-effective version of 1.5 Pro, optimized for high-volume, lower-latency tasks.
Therefore, I will provide a comparison between Gemini 1.0 (representing the initial family), Gemini 1.5 Pro, and Gemini 1.5 Flash, as these are the relevant and distinct models in the Gemini lineup today.
Here’s a breakdown of their key differences:
Comparison of Gemini Models (1.0 vs. 1.5 Pro vs. 1.5 Flash)
| Feature | Gemini 1.0 (Pro/Ultra/Nano) | Gemini 1.5 Pro | Gemini 1.5 Flash |
| --- | --- | --- | --- |
| Release/Announced | December 2023 (Pro/Ultra); Early 2024 (Nano) | February 2024 (Limited preview); May 2024 (Broader availability) | May 2024 (Announced alongside 1.5 Pro's broader release) |
| Ideal Use Cases | | | High-volume API calls, real-time chatbots, dynamic content updates, RAG without deep reasoning, quick summarization |
| Key Differentiator | First publicly available, versatile Gemini family | Unprecedented long-context processing for multimodal data | Blazing speed and cost-efficiency for large-scale applications |
In summary:
Gemini 1.0 established the baseline with strong general-purpose multimodal capabilities.
Gemini 1.5 Pro represents a monumental leap in the context window, allowing it to process and understand vast amounts of information (like entire novels, lengthy codebases, or hours of video) in a single prompt. Its MoE architecture contributes to this capability.
Gemini 1.5 Flash takes the MoE architecture from 1.5 Pro and optimizes it for speed and cost-efficiency, making it ideal for applications requiring high throughput where the deepest reasoning of 1.5 Pro isn’t strictly necessary. It retains the same large context window as 1.5 Pro.
It’s possible that “Gemini 2.0” or “Gemini 2.5” refers to future internal development versions that have not yet been announced publicly. Google frequently iterates and develops models, and future major versions will undoubtedly bring even more advanced capabilities.
As you can see, using only the model's internal knowledge, it does not know about the new Gemini 2.5 model family.