Gemini API: Question Answering using LangChain and Chroma

Overview

Gemini is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle both text and images as input.

LangChain is a framework designed to make it easier to integrate Large Language Models (LLMs) like Gemini into applications.

Chroma is an open-source embedding database focused on simplicity and developer productivity. Chroma allows users to store embeddings and their metadata, embed documents and queries, and search the embeddings quickly.

In this notebook, you’ll learn how to create an application that answers questions using data from a website with the help of Gemini, LangChain, and Chroma.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm. The examples below also import LangChain's JavaScript packages along with their Cheerio and Chroma peer dependencies, so install those alongside it.

$ npm install @google/genai @langchain/core @langchain/community @langchain/google-genai cheerio chromadb

Setup your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case, the .env file is two directories up from the notebook, hence we need to use ../../ to go up two directories. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

│
├── .env
└── examples
    └── langchain
        └── Gemini_LangChain_QA_Chroma_WebLoad.ipynb

Select a model

Now select the model you want to use in this guide, either by selecting one in the list or writing it down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "gemini-2.5-flash-preview-05-20";

Basic steps

LLMs are trained offline on a large corpus of public data. Hence they cannot answer questions based on custom or private data accurately without additional context.

If you want to make use of LLMs to answer questions based on private data, you have to provide the relevant documents as context alongside your prompt. This approach is called Retrieval Augmented Generation (RAG).

You will use this approach to create a question-answering assistant using the Gemini text model integrated through LangChain. The assistant is expected to answer questions about the Gemini model. To make this possible you will add more context to the assistant using data from a website.

In this tutorial, you'll implement the two main components of a RAG-based architecture:

  1. Retriever
    Based on the user’s query, the retriever retrieves relevant snippets that add context from the document. In this tutorial, the document is the website data. The relevant snippets are passed as context to the next stage - "Generator".

  2. Generator
    The relevant snippets from the website data are passed to the LLM along with the user’s query to generate accurate answers.

You’ll learn more about these stages in the upcoming sections while implementing the application.
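
Before implementing these stages with LangChain, here is a minimal conceptual sketch of how they fit together. The names answerWithRag, Retriever, and Generator are illustrative placeholders, not LangChain APIs; the real retriever and generator are built in the sections below.

// Conceptual sketch only: hypothetical types standing in for the real
// LangChain retriever and Gemini model built later in this notebook.
type Retriever = (question: string) => Promise<string[]>;
type Generator = (prompt: string) => Promise<string>;

async function answerWithRag(question: string, retrieve: Retriever, generate: Generator): Promise<string> {
  // 1. Retriever: fetch snippets relevant to the question.
  const snippets = await retrieve(question);
  // 2. Generator: answer the question with the snippets supplied as context.
  const prompt = `Context:\n${snippets.join("\n\n")}\n\nQuestion: ${question}\nAnswer:`;
  return generate(prompt);
}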

Retriever

In this stage, you will perform the following steps:

  1. Read and parse the website data using LangChain.

  2. Create embeddings of the website data.
    Embeddings are numerical representations (vectors) of text. Hence, text with similar meaning will have similar embedding vectors. You’ll make use of Gemini’s embedding model to create the embedding vectors of the website data.

  3. Store the embeddings in Chroma’s vector store.
    Chroma is a vector database. The Chroma vector store helps in the efficient retrieval of similar vectors. Thus, for adding context to the prompt for the LLM, relevant embeddings of the text matching the user’s question can be retrieved easily using Chroma.

  4. Create a Retriever from the Chroma vector store.
    The retriever will be used to fetch the relevant website snippets that are passed to the LLM along with user queries.

Read and parse the website data

LangChain provides a wide variety of document loaders. To read the website data as a document, you will use the CheerioWebBaseLoader from LangChain's community integrations.

To know more about how to read and parse input data from different sources using the document loaders of LangChain, read LangChain’s document loaders guide.

import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";

const loader = new CheerioWebBaseLoader("https://blog.google/technology/ai/google-gemini-ai/");
const docs = await loader.load();

If you only want to select a specific portion of the website data to add context to the prompt, you can use regex, text slicing, or text splitting.

In this example, you’ll use the split() function to extract the required portion of the text. The extracted text should be converted back to LangChain’s Document format.

import { Document } from "@langchain/core/documents";

const text_content = docs[0].pageContent;
const text_content_1 = text_content.split("code, audio, image and video.")[1];
const final_text = text_content_1.split("Cloud TPU v5p")[0];
const processed_docs = [
  new Document({
    pageContent: final_text,
    metadata: {
      source: "local",
    },
  }),
];
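
If the page doesn't have convenient markers to slice on, text splitting is the more general option mentioned above. As a hedged sketch (assuming the @langchain/textsplitters package is installed; the chunk sizes are illustrative), you could chunk the loaded document instead:

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Split the loaded page into overlapping chunks rather than slicing on markers.
// chunkSize and chunkOverlap are illustrative values; tune them for your data.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 100,
});
const chunked_docs = await splitter.splitDocuments(docs);
console.log("Number of chunks:", chunked_docs.length);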

Initialize Gemini’s embedding model

To create the embeddings from the website data, you'll use Gemini's embedding model, gemini-embedding-001, which supports creating text embeddings.

To use this embedding model, you have to import GoogleGenerativeAIEmbeddings from LangChain. To know more about the embedding model, read Google AI’s language documentation.

import { GoogleGenerativeAIEmbeddings } from "@langchain/google-genai";

const gemini_embeddings = new GoogleGenerativeAIEmbeddings({
  apiKey: GEMINI_API_KEY,
  model: "models/gemini-embedding-001",
});
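
As an optional sanity check (a sketch, not part of the original flow), you can embed a few short texts and compare them with cosine similarity to see that related texts produce closer vectors:

// Optional sanity check: related texts should yield more similar vectors.
const cosine = (a: number[], b: number[]) => {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
};

const [v1, v2, v3] = await Promise.all([
  gemini_embeddings.embedQuery("Gemini is a multimodal AI model."),
  gemini_embeddings.embedQuery("Google's Gemini handles text and images."),
  gemini_embeddings.embedQuery("The weather in Paris is mild in spring."),
]);
console.log("related:", cosine(v1, v2).toFixed(3), "unrelated:", cosine(v1, v3).toFixed(3));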

Store the data using Chroma

To create a Chroma vector database from the website data, you will use the fromDocuments function of Chroma. Under the hood, this function creates embeddings from the documents created by the document loader of LangChain using any specified embedding model and stores them in a Chroma vector database.

You have to specify the processed_docs you created from the website data using LangChain's CheerioWebBaseLoader and the gemini_embeddings as the embedding model when invoking the fromDocuments function to create the vector database from the website data.

const CHROMA_HOST = process.env.CHROMA_HOST ?? "localhost";
const CHROMA_PORT = parseInt(process.env.CHROMA_PORT ?? "8000");
Note

To set up Chroma, you can either use the Chroma Cloud or run a local instance of Chroma. If you want to run a local instance, you can use the Chroma Docker image.

For this particular example, you can run a local instance of Chroma using the following command:

$ pip install chromadb
$ chroma run --host localhost --port 8000
import { Chroma } from "@langchain/community/vectorstores/chroma";

const vectorStore = await Chroma.fromDocuments(processed_docs, gemini_embeddings, {
  clientParams: {
    host: CHROMA_HOST,
    port: CHROMA_PORT,
  },
  collectionName: "gemini-docs-db",
});

Create a retriever using Chroma

You'll now create a retriever that can fetch relevant website snippets from the newly created Chroma vector store. This retriever can later be used to pass relevant context to the LLM when answering user queries.

You can then invoke the asRetriever function of Chroma on the vector store to create a retriever.

const retriever = vectorStore.asRetriever({
  k: 1, // Return top 1 most similar document
  searchType: "similarity",
});
// Workaround: instead of calling the retriever directly, embed the query
// manually and run the vector store's similarity search ourselves.
const wrappedRetriever = {
  invoke: async (query: string) => {
    const vec = await gemini_embeddings.embedQuery(query);
    // @ts-expect-error, Wrap it in an array to match Chroma's number[][] requirement
    return vectorStore.similaritySearchVectorWithScore([vec], 1);
  },
};

// Sanity-check the wrapped retriever with a sample query.
const testDocs = await wrappedRetriever.invoke("MMLU");
console.log("Retriever test - found documents:", testDocs.length);
Retriever test - found documents: 1

Generator

The Generator prompts the LLM for an answer when the user asks a question. The retriever you created in the previous stage from the Chroma vector store will be used to pass relevant snippets from the website data to the LLM, providing more context for the user's query.

You’ll perform the following steps in this stage:

  1. Chain together the following:
    • The retriever, to fetch the relevant website snippets for the user's question.
    • A prompt template for answering the question using LangChain.
    • A Gemini chat model to generate the answer.
  2. Run the created chain with a question as input to prompt the model for an answer.

Initialize Gemini

You must import ChatGoogleGenerativeAI from LangChain to initialize your model. In this example, you will use the Gemini 2.5 Flash model you selected earlier (MODEL_ID), which is well suited to grounded question answering. To know more about the text model, read Google AI's language documentation.

You can configure model parameters such as temperature or topP by passing the appropriate values when initializing the ChatGoogleGenerativeAI LLM. To learn more about the parameters and their uses, read Google AI's concepts guide.

import { ChatGoogleGenerativeAI } from "@langchain/google-genai";

const llm = new ChatGoogleGenerativeAI({
  model: MODEL_ID,
  apiKey: GEMINI_API_KEY,
});
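
For instance, a minimal sketch of passing sampling parameters (the values here are illustrative, not recommendations):

// Illustrative only: tune temperature / topP for your own use case.
const tuned_llm = new ChatGoogleGenerativeAI({
  model: MODEL_ID,
  apiKey: GEMINI_API_KEY,
  temperature: 0.2, // lower values make answers more deterministic
  topP: 0.95, // nucleus sampling cutoff
});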

Create prompt templates

You’ll use LangChain’s PromptTemplate to generate prompts to the LLM for answering questions.

In the llm_prompt, the variable question will be replaced later by the input question, and the variable context will be replaced by the relevant text from the website retrieved from the Chroma vector store.

import { PromptTemplate } from "@langchain/core/prompts";

const llm_prompt = PromptTemplate.fromTemplate(
  `
  You are an assistant for question-answering tasks.
  Use the following context to answer the question.
  If you don't know the answer, just say that you don't know.
  Use five sentences maximum and keep the answer concise.\n
  Question: {question} \n
  Context: {context} \n
  Answer:
  `
);
console.log(JSON.stringify(llm_prompt, null, 2));
{
  "lc": 1,
  "type": "constructor",
  "id": [
    "langchain_core",
    "prompts",
    "prompt",
    "PromptTemplate"
  ],
  "kwargs": {
    "input_variables": [
      "question",
      "context"
    ],
    "template_format": "f-string",
    "template": "\n  You are an assistant for question-answering tasks.\n  Use the following context to answer the question.\n  If you don't know the answer, just say that you don't know.\n  Use five sentences maximum and keep the answer concise.\n\n  Question: {question} \n\n  Context: {context} \n\n  Answer:\n  "
  }
}

Create a stuff documents chain

LangChain provides Chains for chaining together LLMs with each other or other components for complex applications. You will create a stuff documents chain for this application. A stuff documents chain lets you combine all the relevant documents, insert them into the prompt, and pass that prompt to the LLM.

You can create a stuff documents chain using the LangChain Expression Language (LCEL).

To learn more about different types of document chains, read LangChain’s chains guide.

The stuff documents chain for this application retrieves the relevant website data and passes it as the context to an LLM prompt along with the input question.

import { Document } from "@langchain/core/documents";
import { Runnable, RunnablePassthrough, RunnableMap, RunnableSequence } from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";

const formatDocs = (docs: Document[]) => docs.map((doc) => doc.pageContent).join("\n\n");

// @ts-expect-error, missing properties in Runnable interface
const contextRunnable: Runnable = {
  invoke: async (query: string) => {
    const results = await wrappedRetriever.invoke(query);
    const docs = results.map((result) => result[0]);
    return formatDocs(docs);
  },
};

const rag_chain = RunnableSequence.from([
  // Step 1: Map inputs into { context, question }
  new RunnableMap({
    steps: {
      context: contextRunnable,
      question: new RunnablePassthrough(),
    },
  }),
  // Step 2: Apply prompt
  llm_prompt,
  // Step 3: Call LLM
  llm,
  // Step 4: Parse output to string
  new StringOutputParser(),
]);

Prompt the model

You can now query the LLM by passing any question to the invoke() function of the stuff documents chain you created previously.

tslab.display.markdown(await rag_chain.invoke("What is Gemini?"));

Gemini is Google’s largest and most capable AI model, designed to be natively multimodal and understand various inputs like text, images, and audio simultaneously. It is highly flexible, able to run efficiently on everything from data centers to mobile devices. Gemini 1.0, its first version, is optimized in three sizes: Ultra for complex tasks, Pro for a wide range of tasks, and Nano for on-device applications. Its state-of-the-art capabilities significantly enhance how developers and enterprises build and scale with AI. Gemini excels in sophisticated reasoning, coding, and understanding nuanced information across diverse subjects.

Conclusion

That’s it. You have successfully created an LLM application that answers questions using data from a website with the help of Gemini, LangChain, and Chroma.