Gemini API: Code analysis using LangChain

This notebook shows how to use the Gemini API with LangChain for code analysis. The notebook will teach you how to:

  • load and split source files
  • create an in-memory database with embedding information
  • answer questions based on context from the files

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Setup your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case, the .env file is two directories up from the notebook, hence we need to use ../../ to go up two directories. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

│
├── .env
└── examples
    └── langchain
        └── Code_analysis_using_Gemini_LangChain.ipynb

Select a model

Now select the model you want to use in this guide, either by picking one from the list or entering one manually. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly longer to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "gemini-2.5-flash-preview-05-20";
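The notebook pins a fixed MODEL_ID, but you can also enumerate the models available to your key. Here is a minimal sketch using the @google/genai SDK installed earlier; the list call and its async-iterable pager reflect recent SDK versions and may need adjusting for yours:

const genai = require("@google/genai") as typeof import("@google/genai");

// Sketch: list the models visible to this API key so you can pick a MODEL_ID.
// `ai.models.list()` returns an async-iterable pager in recent SDK versions.
const ai = new genai.GoogleGenAI({ apiKey: GEMINI_API_KEY });
const modelPager = await ai.models.list();
for await (const model of modelPager) {
  console.log(model.name);
}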

Prepare the files

First, clone the langchain-google repository. It is the codebase you will analyze in this example.

It contains code integrating the Gemini API, Vertex AI, and other Google products with LangChain.

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");
const child_process = require("child_process") as typeof import("child_process");
const util = require("util") as typeof import("util");

const execAsync = util.promisify(child_process.exec);

const REPO_DIR = path.join("../../assets/langchain", "langchain-google");

async function setupRepository() {
  try {
    if (fs.existsSync(REPO_DIR)) {
      console.log("Repository already exists, skipping clone.");
      return;
    }
    console.log("Cloning repository...");
    await execAsync(`git clone https://github.com/langchain-ai/langchain-google.git ${REPO_DIR}`);
    console.log("Repository cloned successfully");
  } catch (error) {
    console.log("Repository might already exist or clone failed:", error.message);
  }
}
await setupRepository();
Cloning repository...
Repository cloned successfully

This example will focus only on the integration of Gemini API with langchain and ignore the rest of the codebase.

const glob = require("glob") as typeof import("glob");
const textLoader = require("langchain/document_loaders/fs/text") as typeof import("langchain/document_loaders/fs/text");
const textSplitter = require("langchain/text_splitter") as typeof import("langchain/text_splitter");

async function loadDocuments() {
  console.log("Loading documents...");

  const repoPath = path.join(REPO_DIR, "libs/genai/langchain_google_genai");
  const pattern = path.join(repoPath, "**/*.py");

  const files = glob.sync(pattern); // "**" in the pattern already matches recursively
  console.log(`Found ${files.length} Python files`);

  const splitter = textSplitter.RecursiveCharacterTextSplitter.fromLanguage(
    "python", // use Python-aware separators when splitting
    {
      chunkSize: 2000,
      chunkOverlap: 0,
    }
  );

  const docs: Awaited<ReturnType<typeof splitter.splitDocuments>> = [];

  for (const file of files) {
    try {
      // Check if file exists and is readable
      if (fs.existsSync(file)) {
        const loader = new textLoader.TextLoader(file);
        const fileDocuments = await loader.load();
        const splitDocs = await splitter.splitDocuments(fileDocuments);
        docs.push(...splitDocs);
        console.log(`Processed: ${file} (${splitDocs.length} chunks)`);
      }
    } catch (error) {
      console.warn(`Failed to load ${file}:`, error.message);
    }
  }

  console.log(`Total documents loaded: ${docs.length}`);
  return docs;
}

Each file with a matching path is loaded and split by RecursiveCharacterTextSplitter. Specifying that the files are written in Python helps split them along natural boundaries, so the resulting documents are less likely to lack context.

const docs = await loadDocuments();
Loading documents...
Found 11 Python files
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/__init__.py (2 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/_common.py (6 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/_enums.py (1 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/_function_utils.py (16 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/_genai_extension.py (15 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/_image_utils.py (5 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/chat_models.py (59 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/embeddings.py (11 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/genai_aqa.py (3 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/google_vector_store.py (11 chunks)
Processed: ../../assets/langchain/langchain-google/libs/genai/langchain_google_genai/llms.py (5 chunks)
Total documents loaded: 134

The language-aware splitter uses separators common to the chosen language (the supported languages are exposed as the SupportedTextSplitterLanguages literal), which lowers the chances of classes or functions being split in the middle:

textSplitter.RecursiveCharacterTextSplitter.getSeparatorsForLanguage("python");
[ '\nclass ', '\ndef ', '\n\tdef ', '\n\n', '\n', ' ', '' ]
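To see the effect, here is a small illustration (the Python snippet is made up, not repository code): with a tiny chunk size, the Python-aware splitter prefers to break at class and def boundaries rather than mid-function.

const demoSplitter = textSplitter.RecursiveCharacterTextSplitter.fromLanguage("python", {
  chunkSize: 60,
  chunkOverlap: 0,
});

// The splitter breaks at "\ndef " rather than cutting inside a function body.
const demoChunks = await demoSplitter.splitText(
  "class Greeter:\n    def hello(self):\n        return 'hi'\n\ndef farewell():\n    return 'bye'\n"
);
console.log(demoChunks);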

Create the database

The data will be loaded into an in-memory vector store, since the database doesn't need to be persistent in this case and is small enough to fit in memory.

const memory = require("langchain/vectorstores/memory") as typeof import("langchain/vectorstores/memory");
const googleGenerativeAI = require("@langchain/google-genai") as typeof import("@langchain/google-genai");

async function createVectorStore(documents: Awaited<ReturnType<typeof loadDocuments>>) {
  console.log("Creating vector store...");

  const embeddings = new googleGenerativeAI.GoogleGenerativeAIEmbeddings({
    model: "models/text-embedding-004",
    apiKey: GEMINI_API_KEY,
  });

  const vectorStore = await memory.MemoryVectorStore.fromDocuments(documents, embeddings);

  console.log("Vector store created successfully");
  return vectorStore;
}

Everything needed is ready, and now you can create the database. It should not take longer than a few seconds.

const vectorStore = await createVectorStore(docs);
Creating vector store...
Vector store created successfully
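As an optional sanity check (a sketch, not part of the original flow), you can query the store directly and inspect the best-matching chunks along with their similarity scores:

// Sketch: top-3 chunks for a probe query, with cosine-similarity scores.
const probe = await vectorStore.similaritySearchWithScore("How are embeddings computed?", 3);
for (const [doc, score] of probe) {
  console.log(score.toFixed(3), doc.metadata.source);
}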

Question Answering

Set up the document retriever.

const retriever = vectorStore.asRetriever({
  k: 20, // number of documents to return
  searchType: "similarity", // cosine distance
});
const llm = new googleGenerativeAI.ChatGoogleGenerativeAI({
  model: MODEL_ID,
  apiKey: GEMINI_API_KEY,
});

Now you can create a chain for question answering. In this case, the RetrievalQAChain will be used.

If you want the chat option instead, use ConversationalRetrievalQAChain (sketched after the next cell).

const chains = require("langchain/chains") as typeof import("langchain/chains");

// eslint-disable-next-line @typescript-eslint/no-deprecated
const qaChain = chains.RetrievalQAChain.fromLLM(llm, retriever);
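For completeness, here is a minimal sketch of the chat variant mentioned above. Like RetrievalQAChain, ConversationalRetrievalQAChain is deprecated in recent LangChain.js releases in favor of LCEL, but it still works for a quick demo:

// Sketch: a retrieval chain that also accepts chat history.
// eslint-disable-next-line @typescript-eslint/no-deprecated
const chatChain = chains.ConversationalRetrievalQAChain.fromLLM(llm, retriever);

const chatResponse = await chatChain.invoke({
  question: "Which class implements chat models?",
  chat_history: "", // previous turns would be threaded through here
});
console.log(chatResponse.text);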

The chain is ready to answer your questions.

async function callQAChain(prompt: string) {
  try {
    const response = await qaChain.invoke({
      query: prompt,
    });
    tslab.display.markdown(`**Answer:**\n\n${response.text}`);
  } catch (error) {
    console.error("Error calling QA chain:", error);
  }
}
await callQAChain("Show hierarchy for _BaseGoogleGenerativeAI. Do not show content of classes.");

Answer:

BaseModel
  └── _BaseGoogleGenerativeAI
      ├── ChatGoogleGenerativeAI
      └── GoogleGenerativeAI
await callQAChain("What is the return type of embedding models?");

Answer:

The return type of embedding models depends on whether a single text or a list of texts is being embedded:

  • For embedding a single text (e.g., embed_query or aembed_query), the return type is a List[float].
  • For embedding a list of texts (e.g., embed_documents or aembed_documents), the return type is a List[List[float]], where each inner list is the embedding for a corresponding text.
await callQAChain("What classes are related to Attributed Question and Answering.");

Answer:

The classes related to Attributed Question and Answering (AQA) are:

  1. GenAIAqa: This is the main class representing Google’s Attributed Question and Answering service. It’s a RunnableSerializable that takes AqaInput and returns AqaOutput.
  2. AqaInput: A Pydantic model defining the input structure for GenAIAqa.invoke, which includes prompt and source_passages.
  3. AqaOutput: A Pydantic model defining the output structure from GenAIAqa.invoke, which includes the answer, attributed_passages, and answerable_probability.
  4. _AqaModel: An internal wrapper class used by GenAIAqa to interact with the underlying Google Generative Language AQA API.
  5. GroundedAnswer: A dataclass used internally (specifically by _AqaModel) to structure the response received from the generate_answer API call before it’s converted into AqaOutput.
  6. Passage: A dataclass used within GroundedAnswer to represent an individual attributed passage, containing its text and id.
  7. GoogleVectorStore: While not an AQA class itself, it contains an as_aqa method that constructs and returns a Runnable[str, AqaOutput] which includes GenAIAqa, making it a key class for integrating with AQA functionality.
await callQAChain("What are the dependencies of the GenAIAqa class?");

Answer:

The GenAIAqa class has the following dependencies:

  • langchain_core:
    • RunnableSerializable
    • RunnableConfig
    • Document (indirectly, via _toAqaInput used in as_aqa method from GoogleVectorStore)
    • RunnablePassthrough, RunnableLambda (used in as_aqa method)
  • pydantic:
    • BaseModel
    • PrivateAttr
  • google.ai.generativelanguage (aliased as genai):
    • GenerativeServiceClient
    • AnswerStyle (from GenerateAnswerRequest)
    • SafetySetting
  • Internal modules/classes within langchain_google_genai:
    • _genai_extension (aliased as genaix): This module provides functions like build_generative_service, generate_answer, and the GroundedAnswer type.
    • _AqaModel (an internal wrapper class)
    • AqaInput (its input model)
    • AqaOutput (its output model)
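If you also want to know which files grounded an answer, the chain can hand back the retrieved chunks. A minimal sketch rebuilding the chain with the returnSourceDocuments option from the LangChain.js API:

// Sketch: return the retrieved chunks alongside the answer.
// eslint-disable-next-line @typescript-eslint/no-deprecated
const qaChainWithSources = chains.RetrievalQAChain.fromLLM(llm, retriever, {
  returnSourceDocuments: true,
});

const sourced = await qaChainWithSources.invoke({
  query: "What classes are related to Attributed Question and Answering?",
});
console.log(sourced.text);
for (const doc of sourced.sourceDocuments) {
  console.log("-", doc.metadata.source);
}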

Summary

The Gemini API works great with LangChain. The integration is seamless and provides an easy interface for:

  • loading and splitting files
  • creating an In-memory database with embedding information
  • answering questions based on context from files

What’s next?

This notebook showed only one possible use case for LangChain with the Gemini API. You can find many more here.