Gemini API: Question Answering using LangChain and Pinecone
Overview
Gemini is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle both text and images as input.
LangChain is a framework designed to make it easier to integrate Large Language Models (LLMs) like Gemini into applications.
Pinecone is a cloud-first vector database that allows users to search across billions of embeddings with ultra-low query latency.
In this notebook, you’ll learn how to create an application that answers questions using data from a website with the help of Gemini, LangChain, and Pinecone.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");
dotenv.config({ path: "../../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env file is two directories up from the notebook, hence we need to use ../../ to go up two directories. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
Now select the model you want to use in this guide, either by selecting one in the list or writing it down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, in particular how to switch thinking off).
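As a minimal sketch, you can store the chosen model name in a constant and reuse it later when initializing the LLM (the constant name and the default value below are just examples):

// Gemini model used throughout this guide; swap in another model name if you prefer.
const MODEL_ID = "gemini-2.5-flash";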
To use Pinecone in your application, you must have an API key. To create an API key you have to set up a Pinecone account. Visit Pinecone’s app page and sign up or log in to your account. Then navigate to the “API Keys” section and copy your API key.
For more detailed instructions on getting the API key, you can read Pinecone’s Quickstart documentation.
Set the environment variable PINECONE_API_KEY to configure Pinecone to use your API key.
const PINECONE_API_KEY = process.env.PINECONE_API_KEY ?? "";
if (!PINECONE_API_KEY) {
  throw new Error("PINECONE_API_KEY is not set in the environment variables");
}
console.log("PINECONE_API_KEY is set in the environment variables");
PINECONE_API_KEY is set in the environment variables
Basic steps
LLMs are trained offline on a large corpus of public data. Hence they cannot answer questions based on custom or private data accurately without additional context.
If you want to make use of LLMs to answer questions based on private data, you have to provide the relevant documents as context alongside your prompt. This approach is called Retrieval Augmented Generation (RAG).
You will use this approach to create a question-answering assistant using the Gemini text model integrated through LangChain. The assistant is expected to answer questions about the Gemini model. To make this possible you will add more context to the assistant using data from a website.
In this tutorial, you’ll implement the two main components of a RAG-based architecture:
Retriever
Based on the user’s query, the retriever retrieves relevant snippets that add context from the document. In this tutorial, the document is the website data. The relevant snippets are passed as context to the next stage - “Generator”.
Generator
The relevant snippets from the website data are passed to the LLM along with the user’s query to generate accurate answers.
You’ll learn more about these stages in the upcoming sections while implementing the application.
Retriever
In this stage, you will perform the following steps:
Read and parse the website data using LangChain.
Create embeddings of the website data.
Embeddings are numerical representations (vectors) of text. Hence, text with similar meaning will have similar embedding vectors. You’ll make use of Gemini’s embedding model to create the embedding vectors of the website data.
Store the embeddings in Pinecone’s vector store.
Pinecone is a vector database. The Pinecone vector store helps in the efficient retrieval of similar vectors. Thus, to add context to the prompt for the LLM, embeddings of the text relevant to the user’s question can be retrieved easily using Pinecone.
Create a Retriever from the Pinecone vector store.
The retriever will be used to pass relevant website embeddings to the LLM along with user queries.
Read and parse the website data
LangChain provides a wide variety of document loaders. To read the website data as a document, you will use the WebBaseLoader from LangChain.
To know more about how to read and parse input data from different sources using the document loaders of LangChain, read LangChain’s document loaders guide.
If you only want to select a specific portion of the website data to add context to the prompt, you can use regex, text slicing, or text splitting.
In this example, you’ll use the split() function to extract the required portion of the text. The extracted text should be converted back to LangChain’s Document format.
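Here is a minimal sketch of this step, assuming the JS/TS web loader CheerioWebBaseLoader (the TypeScript counterpart of the WebBaseLoader mentioned above); the URL and the split() marker are only examples:

import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { Document } from "@langchain/core/documents";

// Load the website data; the URL is an example.
const loader = new CheerioWebBaseLoader(
  "https://blog.google/technology/ai/google-gemini-ai/"
);
const docs = await loader.load();

// Keep only the portion of the page after an example marker string, then wrap
// the extracted text back into LangChain's Document format.
const text_content = docs[0].pageContent;
const final_text = text_content.split("code, audio, image and video.")[1] ?? text_content;
const processed_docs = [new Document({ pageContent: final_text })];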
To create the embeddings from the website data, you’ll use a Gemini embedding model such as text-embedding-004, which supports creating text embeddings.
To use this embedding model, you have to import GoogleGenerativeAIEmbeddings from LangChain. To know more about the embedding model, read Google AI’s language documentation.
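A minimal initialization sketch, assuming the @langchain/google-genai package and the embedding model named above:

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

// Gemini embedding model used to embed the website data.
const gemini_embeddings = new GoogleGenerativeAIEmbeddings({
  apiKey: GEMINI_API_KEY,
  model: "text-embedding-004",
});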
To create a Pinecone vector database, first, you have to initialize your Pinecone client connection using the API key you set previously.
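For example, with the official Pinecone Node.js client (@pinecone-database/pinecone):

import { Pinecone } from "@pinecone-database/pinecone";

// Initialize the Pinecone client with the API key loaded earlier.
const pinecone = new Pinecone({ apiKey: PINECONE_API_KEY });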
In Pinecone, vector embeddings have to be stored in indexes. An index represents the vector data’s top-level organizational unit. The vectors in any index must have the same dimensionality and distance metric for calculating similarity. You can read more about indexes in Pinecone’s Indexes documentation.
First, you’ll create an index using Pinecone’s createIndex function. Pinecone allows you to create two types of indexes: serverless indexes and pod-based indexes. Pinecone’s free starter plan lets you create only one project and one pod-based starter index with sufficient resources to support 100,000 vectors. For this tutorial, you have to create a pod-based starter index. To know more about different indexes and how they can be created, read Pinecone’s create indexes guide.
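A hedged sketch of index creation follows; the index name, environment, and pod type are assumptions you should adapt to your account, and the dimension must match the embedding model’s output size:

// Create a pod-based starter index; skip the error if it already exists.
await pinecone.createIndex({
  name: "langchain-demo",          // example index name
  dimension: 768,                  // must match the embedding model's vector size
  metric: "cosine",
  spec: {
    pod: {
      environment: "gcp-starter",  // example starter environment
      podType: "starter",
      pods: 1,
    },
  },
  suppressConflicts: true,         // do not throw if the index already exists
  waitUntilReady: true,            // wait until the index is ready for upserts
});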
Next, you’ll insert the documents you extracted earlier from the website data into the newly created index using LangChain’s Pinecone.fromDocuments. Under the hood, this function creates embeddings from the documents created by the document loader of LangChain using any specified embedding model and inserts them into the specified index in a Pinecone vector database.
You have to specify the processed_docs you created from the website data using LangChain’s WebBaseLoader and the gemini_embeddings as the embedding model when invoking the fromDocuments function to create the vector database from the website data.
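A sketch of this step, assuming LangChain’s PineconeStore class from @langchain/pinecone (the Pinecone integration referred to above) and the index name used earlier:

import { PineconeStore } from "@langchain/pinecone";

// Embed the processed documents with gemini_embeddings and upsert them into
// the Pinecone index created above.
const pineconeIndex = pinecone.Index("langchain-demo");
const vectorstore = await PineconeStore.fromDocuments(
  processed_docs,
  gemini_embeddings,
  { pineconeIndex }
);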
You’ll now create a retriever that can retrieve website data embeddings from the newly created Pinecone vector store. This retriever can be later used to pass embeddings that provide more context to the LLM for answering user’s queries.
Invoke the asRetriever function of the vector store you initialized in the last step, to create a retriever.
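For example (k, the number of similar chunks returned per query, is an assumed value):

// Create a retriever from the Pinecone vector store.
const retriever = vectorstore.asRetriever({ k: 5 });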
Generator
The Generator prompts the LLM for an answer when the user asks a question. The retriever you created in the previous stage from the Pinecone vector store will be used to pass relevant embeddings from the website data to the LLM to provide more context to the user’s query.
You’ll perform the following steps in this stage:
Chain together the following:
A retrieval step that extracts the relevant context from the website data using the retriever.
A prompt for answering any question using LangChain.
An LLM model from Gemini for prompting.
Run the created chain with a question as input to prompt the model for an answer.
Initialize Gemini
You must import ChatGoogleGenerativeAI from LangChain to initialize your model. In this example, you will use gemini-2.5-flash, as it supports text generation tasks such as question answering. To know more about the text model, read Google AI’s language documentation.
You can configure the model parameters such as temperature or topP, by passing the appropriate values when initializing the ChatGoogleGenerativeAI LLM. To learn more about the parameters and their uses, read Google AI’s concepts guide.
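A minimal sketch using the MODEL_ID chosen earlier; the temperature and topP values are arbitrary examples:

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

// Gemini chat model used as the generator; tune temperature/topP as needed.
const llm = new ChatGoogleGenerativeAI({
  apiKey: GEMINI_API_KEY,
  model: MODEL_ID,
  temperature: 0.7,
  topP: 0.85,
});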
You’ll use LangChain’s PromptTemplate to generate prompts to the LLM for answering questions.
In the llm_prompt, the variable question will be replaced later by the input question, and the variable context will be replaced by the relevant text from the website retrieved from the Pinecone vector store.
import { PromptTemplate } from "@langchain/core/prompts";

const llm_prompt = PromptTemplate.fromTemplate(`
    You are an assistant for question-answering tasks.
    Use the following context to answer the question.
    If you don't know the answer, just say that you don't know.
    Use five sentences maximum and keep the answer concise.

    Question: {question}

    Context: {context}

    Answer:
    `);

console.log(JSON.stringify(llm_prompt, null, 2));
{
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"prompts",
"prompt",
"PromptTemplate"
],
"kwargs": {
"input_variables": [
"question",
"context"
],
"template_format": "f-string",
"template": "\n You are an assistant for question-answering tasks.\n Use the following context to answer the question.\n If you don't know the answer, just say that you don't know.\n Use five sentences maximum and keep the answer concise.\n\n Question: {question} \n\n Context: {context} \n\n Answer:\n "
}
}
Create a stuff documents chain
LangChain provides Chains for chaining together LLMs with each other or other components for complex applications. You will create a stuff documents chain for this application. A stuff documents chain lets you combine all the relevant documents, insert them into the prompt, and pass that prompt to the LLM.
To learn more about different types of document chains, read LangChain’s chains guide.
The stuff documents chain for this application retrieves the relevant website data and passes it as the context to an LLM prompt along with the input question.
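One way to assemble such a chain with LangChain’s runnable interface is sketched below, assuming the retriever, llm_prompt, and llm objects created above; the retrieved documents are concatenated (“stuffed”) into the {context} slot and the raw question fills {question}:

import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnablePassthrough, RunnableSequence } from "@langchain/core/runnables";
import { formatDocumentsAsString } from "langchain/util/document";

// Stuff-style RAG chain: retrieve documents, stuff them into the prompt's
// context, call the LLM, and parse the reply to a plain string.
const rag_chain = RunnableSequence.from([
  {
    context: retriever.pipe(formatDocumentsAsString),
    question: new RunnablePassthrough(),
  },
  llm_prompt,
  llm,
  new StringOutputParser(),
]);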
You can now query the LLM by passing any question to the invoke() function of the stuff documents chain you created previously.
tslab.display.markdown(await rag_chain.invoke("What is Gemini?"));
Gemini is Google’s largest, most capable, and flexible AI model, designed to run efficiently on various devices from data centers to mobile phones. It is natively multimodal, trained to seamlessly understand and reason about different inputs like text, images, and audio. Its first version, Gemini 1.0, is optimized into three sizes: Ultra, Pro, and Nano, catering to different task complexities. Gemini excels at sophisticated reasoning, advanced coding, and solving complex problems.
Summary
The Gemini API works great with LangChain. The integration is seamless and provides an easy interface for:
loading and splitting website data
creating a Pinecone vector database with embeddings
answering questions based on context from the loaded documents
What’s next?
This notebook showed only one possible use case for LangChain with the Gemini API. You can find many more here.