Getting started with the Gemini API OpenAI compatibility

This example illustrates how to interact with the Gemini API using the OpenAI JS library.

This notebook will walk you through:

  • Text and code generation
  • Multimodal prompting with images and audio
  • Structured outputs
  • Function calling
  • Thinking (reasoning) control
  • Generating and working with embeddings

More details about this OpenAI compatibility can be found in the documentation.

Setup

Install the SDKs

Install the OpenAI JS library, which this notebook uses to call the Gemini API, together with the Google GenAI SDK from npm:

$ npm install openai @google/genai

Set up your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don't accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here's how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env file is one directory up from the notebook, hence we need to use ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

.
├── .env
└── quickstarts
    └── Get_started_OpenAI_Compatibility.ipynb

Initialize SDK Client

To use the OpenAI compatibility layer, you only need to initialize an OpenAI client with your API key (or OAuth if using Vertex AI) and the Gemini API's OpenAI-compatible base URL. The model is then set in each call.

const openai = require("openai") as typeof import("openai");

const ai = new openai.OpenAI({
  apiKey: GEMINI_API_KEY,
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

List the available models

You can start by listing the available models using the OpenAI library.

for await (const model of await ai.models.list()) {
  console.log(`Model ID: ${model.id}`);
}
Model ID: models/embedding-gecko-001
Model ID: models/gemini-1.0-pro-vision-latest
Model ID: models/gemini-pro-vision
Model ID: models/gemini-1.5-pro-latest
Model ID: models/gemini-1.5-pro-002
Model ID: models/gemini-1.5-pro
Model ID: models/gemini-1.5-flash-latest
Model ID: models/gemini-1.5-flash
Model ID: models/gemini-1.5-flash-002
Model ID: models/gemini-1.5-flash-8b
Model ID: models/gemini-1.5-flash-8b-001
Model ID: models/gemini-1.5-flash-8b-latest
Model ID: models/gemini-2.5-pro-preview-03-25
Model ID: models/gemini-2.5-flash-preview-04-17
Model ID: models/gemini-2.5-flash-preview-05-20
Model ID: models/gemini-2.5-flash
Model ID: models/gemini-2.5-flash-preview-04-17-thinking
Model ID: models/gemini-2.5-flash-lite-preview-06-17
Model ID: models/gemini-2.5-pro-preview-05-06
Model ID: models/gemini-2.5-pro-preview-06-05
Model ID: models/gemini-2.5-pro
Model ID: models/gemini-2.0-flash-exp
Model ID: models/gemini-2.0-flash
Model ID: models/gemini-2.0-flash-001
Model ID: models/gemini-2.0-flash-exp-image-generation
Model ID: models/gemini-2.0-flash-lite-001
Model ID: models/gemini-2.0-flash-lite
Model ID: models/gemini-2.0-flash-preview-image-generation
Model ID: models/gemini-2.0-flash-lite-preview-02-05
Model ID: models/gemini-2.0-flash-lite-preview
Model ID: models/gemini-2.0-pro-exp
Model ID: models/gemini-2.0-pro-exp-02-05
Model ID: models/gemini-exp-1206
Model ID: models/gemini-2.0-flash-thinking-exp-01-21
Model ID: models/gemini-2.0-flash-thinking-exp
Model ID: models/gemini-2.0-flash-thinking-exp-1219
Model ID: models/gemini-2.5-flash-preview-tts
Model ID: models/gemini-2.5-pro-preview-tts
Model ID: models/learnlm-2.0-flash-experimental
Model ID: models/gemma-3-1b-it
Model ID: models/gemma-3-4b-it
Model ID: models/gemma-3-12b-it
Model ID: models/gemma-3-27b-it
Model ID: models/gemma-3n-e4b-it
Model ID: models/gemma-3n-e2b-it
Model ID: models/embedding-001
Model ID: models/text-embedding-004
Model ID: models/gemini-embedding-exp-03-07
Model ID: models/gemini-embedding-exp
Model ID: models/aqa
Model ID: models/imagen-3.0-generate-002
Model ID: models/imagen-4.0-generate-preview-06-06
Model ID: models/imagen-4.0-ultra-generate-preview-06-06
Model ID: models/veo-2.0-generate-001
Model ID: models/gemini-2.5-flash-preview-native-audio-dialog
Model ID: models/gemini-2.5-flash-exp-native-audio-thinking-dialog
Model ID: models/gemini-2.0-flash-live-001
Model ID: models/gemini-live-2.5-flash-preview

Define the Gemini model to be used

In this example, you will use the gemini-2.5-flash-preview-05-20 model. For more details about the available models, check the Gemini models page in the Gemini API documentation.

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "gemini-2.5-flash-preview-05-20";

Initial interaction - generate text

For your first request, use the OpenAI SDK to perform text generation with a text prompt.

const textResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is generative AI?" },
  ],
});
tslab.display.markdown(textResponse.choices[0].message.content ?? "");

Generative AI is a type of artificial intelligence that can create new, original content, rather than just analyzing or classifying existing data. Unlike traditional AI that might recognize patterns or make predictions based on input (like classifying an image as a "cat"), generative AI can produce a brand-new image of a cat that has never existed before, write an original story, compose music, or even generate functional code.

Think of it as an AI that has become a "creator" rather than just an "analyzer" or "decider."


How Does it Work (Simply)?

  1. Massive Training Data: Generative AI models are trained on enormous datasets of existing content (e.g., millions of images, billions of text documents, hours of audio).
  2. Learning Patterns and Structures: During training, the AI doesn't just memorize the data; it learns the underlying patterns, relationships, styles, and structures within that data. For example, it learns what makes a sentence grammatically correct, what common objects look like in various settings, or how musical notes combine harmonically.
  3. Generating New Content: Once trained, when given a "prompt" (a text description, an image, or a piece of audio as a starting point), the AI uses its learned understanding to generate something entirely new that aligns with the prompt and the patterns it has observed. It essentially "imagines" new content based on its extensive knowledge.

What Can Generative AI Create? (Examples)

Generative AI is behind many of the impressive breakthroughs we've seen recently:

  • Text:
    • Large Language Models (LLMs) like OpenAI's ChatGPT, Google's Gemini, or Anthropic's Claude can write articles, stories, poems, emails, summaries, code, answer questions, translate languages, and engage in conversations.
  • Images:
    • Tools like DALL-E, Midjourney, and Stable Diffusion can generate realistic images or artistic illustrations from a text description (e.g., "a futuristic city at sunset with flying cars"). They can also modify existing images or create variations.
  • Audio:
    • Generate realistic speech from text (Text-to-Speech), create original music compositions in various styles, or even clone voices.
  • Video:
    • Create short video clips, animate still images, or even generate deepfakes (which raise significant ethical concerns).
  • Code:
    • Generate programming code snippets, complete functions, or even entire software applications based on natural language descriptions.
  • 3D Models:
    • Generate 3D objects, textures, and environments for games, simulations, or design.

Why is it a Big Deal?

  • Democratization of Creativity: It lowers the barrier for individuals to create high-quality content without needing specialized artistic or technical skills.
  • Enhanced Productivity: Automates content creation tasks, saving time and resources for businesses and individuals.
  • Innovation: Opens up entirely new possibilities for design, entertainment, education, and many other fields.
  • Personalization: Can generate highly personalized content tailored to individual preferences.

Challenges and Ethical Considerations:

While powerful, generative AI also presents challenges:

  • Bias: It can perpetuate or amplify biases present in its training data.
  • Misinformation and Deepfakes: The ability to generate realistic fake content (text, images, audio, video) raises concerns about the spread of misinformation and manipulation.
  • Copyright and Ownership: Who owns the content generated by AI? Does it infringe on the copyright of the data it was trained on?
  • Job Displacement: Potential impact on creative industries and jobs.
  • Environmental Impact: Training these massive models requires significant computational power and energy.
  • Hallucinations: LLMs, in particular, can sometimes generate factually incorrect information that sounds plausible.

In summary, generative AI is a revolutionary leap in artificial intelligence, moving from analysis to creation, and fundamentally changing how we interact with and produce digital content.

Generating code

You can work with the Gemini API to generate code for you.

const CODE_PROMPT = `
    Write a C program that takes two IP addresses, representing the start and end of a range
    (e.g., 192.168.1.1 and 192.168.1.254), as input arguments. The program should convert this
    IP address range into the minimal set of CIDR notations that completely cover the given
    range. The output should be a comma-separated list of CIDR blocks.
`;

const codeResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [{ role: "user", content: CODE_PROMPT }],
});
tslab.display.markdown(codeResponse.choices[0].message.content ?? "");

This C program takes two IP addresses (start and end of a range) as command-line arguments. It then calculates and prints the minimal set of CIDR notations that completely cover the given IP address range.

The algorithm works by iteratively finding the largest possible CIDR block that starts at the current IP address and fits entirely within the remaining part of the specified range.

Key Concepts:

  • IP Addresses as Integers: IP addresses are treated as 32-bit unsigned integers. This allows for easy arithmetic operations.
  • Network Byte Order vs. Host Byte Order: IP addresses are typically stored in network byte order. For arithmetic, they need to be converted to host byte order using ntohl() and back to network byte order for inet_ntoa() using htonl().
  • CIDR Block Properties:
    • A /N CIDR block covers 2^(32-N) IP addresses.
    • The starting address of a CIDR block must be "aligned" with its block size. For example, a /24 block (size 256) must start at an address where the last octet is 0. This means start_ip % block_size == 0, or bitwise (start_ip & (block_size - 1)) == 0.
  • Greedy Approach: The program works by repeatedly finding the largest possible CIDR block that begins at the current start_ip and does not exceed the end_ip of the original range. This ensures a minimal set.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h> // For inet_addr, inet_ntoa, ntohl, htonl
#include <stdint.h>    // For uint32_t, uint64_t
#include <stdbool.h>   // For bool type
#include <limits.h>    // For UINT32_MAX

// Function to convert an IP string to a host byte order uint32_t
static uint32_t ip_string_to_uint32(const char *ip_str) {
    struct in_addr sa;
    if (inet_pton(AF_INET, ip_str, &(sa)) != 1) {
        fprintf(stderr, "Error: Invalid IP address format: %s\n", ip_str);
        exit(EXIT_FAILURE);
    }
    return ntohl(sa.s_addr);
}

// Function to convert a host byte order uint32_t to an IP string
// Note: inet_ntoa uses a static buffer, so the return value is valid
// only until the next call. For simple printing, it's fine.
static const char* uint32_to_ip_string(uint32_t ip_val) {
    struct in_addr sa;
    sa.s_addr = htonl(ip_val);
    return inet_ntoa(sa);
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <start_ip> <end_ip>\n", argv[0]);
        fprintf(stderr, "Example: %s 192.168.1.1 192.168.1.254\n", argv[0]);
        return EXIT_FAILURE;
    }

    uint32_t start_ip_val = ip_string_to_uint32(argv[1]);
    uint32_t end_ip_val = ip_string_to_uint32(argv[2]);

    if (start_ip_val > end_ip_val) {
        fprintf(stderr, "Error: Start IP (%s) cannot be greater than End IP (%s).\n", argv[1], argv[2]);
        return EXIT_FAILURE;
    }

    uint32_t current_ip = start_ip_val;
    bool first_cidr = true;

    while (current_ip <= end_ip_val) {
        int best_prefix_len = -1;
        
        // Iterate from the smallest block (longest prefix /32) to largest block (shortest prefix /0)
        // We are looking for the largest possible block that fits.
        // So we iterate prefix_len from 32 down to 0. The first one we find is the largest.
        for (int prefix_len = 32; prefix_len >= 0; prefix_len--) {
            // Calculate the size of the block for this prefix length.
            // Use uint64_t for block_size to avoid overflow when prefix_len is 0 (2^32)
            uint64_t block_size = 1ULL << (32 - prefix_len);

            // 1. Check if the current_ip is aligned for this prefix_len.
            // For a block to start at current_ip with a /prefix_len, current_ip must be a multiple of block_size.
            // This is equivalent to (current_ip & (block_size - 1)) == 0.
            // This bitwise check works correctly even for prefix_len 0 (block_size 2^32),
            // as (2^32 - 1) is 0xFFFFFFFF, so (current_ip & 0xFFFFFFFF) == 0 becomes current_ip == 0.
            if ((current_ip & (block_size - 1)) != 0) {
                continue; // Not aligned for this block size
            }

            // 2. Check if this potential block fits within the remaining target range.
            // The last IP in this potential block is current_ip + block_size - 1.
            // We need: (current_ip + block_size - 1) <= end_ip_val.
            // To prevent overflow when calculating (current_ip + block_size - 1):
            // Check if (block_size - 1) is too large for current_ip + (block_size - 1) to fit in uint32_t.
            // This check (current_ip > UINT32_MAX - (block_size - 1)) is crucial.
            if (block_size - 1 > UINT32_MAX - current_ip) { // check if current_ip + block_size - 1 would overflow
                // If it would overflow, it definitely extends beyond end_ip_val (unless end_ip_val is also near UINT32_MAX)
                // In such a case, this block can't be fully contained if max_ip is UINT32_MAX.
                // It means the block extends beyond the maximum representable IP address.
                // So, it cannot be contained within end_ip_val unless end_ip_val is UINT32_MAX
                // AND the block starts at 0. A simpler check might be just
                // if (current_ip + block_size -1 > end_ip_val)
                // but that can overflow.

                // Safer check: Does `current_ip + (block_size - 1)` extend beyond `end_ip_val`?
                // This is equivalent to `block_size - 1 > end_ip_val - current_ip`.
                // Use uint64_t to prevent overflow for `end_ip_val - current_ip`
                if (block_size > (uint64_t)end_ip_val - current_ip + 1) { // block_size is too large for the remaining range
                    continue;
                }
            } else {
                // Now we can safely calculate block_end as a uint32_t
                uint32_t block_end = current_ip + (uint32_t)(block_size - 1);
                if (block_end > end_ip_val) {
                    continue; // Block extends beyond the end of our target range
                }
            }

            // If we reached here, this prefix_len represents a valid block that
            // starts at current_ip and fits within the remaining range.
            // Since we iterate prefix_len downwards, this is the largest such block.
            best_prefix_len = prefix_len;
            break; // Found the largest block, so break the inner loop
        }

        if (best_prefix_len == -1) {
            // This should ideally not happen if current_ip <= end_ip_val,
            // as a /32 block should always be possible.
            fprintf(stderr, "Internal Error: Could not find a suitable CIDR block for IP %s\n", uint32_to_ip_string(current_ip));
            return EXIT_FAILURE;
        }

        // Print the found CIDR block
        if (!first_cidr) {
            printf(",");
        }
        printf("%s/%d", uint32_to_ip_string(current_ip), best_prefix_len);
        first_cidr = false;

        // Advance current_ip to the IP address immediately after the processed block
        uint64_t processed_block_size = 1ULL << (32 - best_prefix_len);
        if (current_ip > UINT32_MAX - processed_block_size) {
            // This means advancing current_ip would cause an overflow,
            // implying we've covered addresses up to UINT32_MAX.
            break; 
        }
        current_ip += (uint32_t)processed_block_size;
    }
    printf("\n");

    return EXIT_SUCCESS;
}

How to Compile and Run:

  1. Save: Save the code as ip_range_to_cidr.c (or any other .c extension).
  2. Compile: Use a C compiler like GCC: gcc ip_range_to_cidr.c -o ip_range_to_cidr (You might need to include -Wall -Wextra -std=c11 for better warnings and C11 standard compliance, though not strictly necessary for this code to compile.)
  3. Run: ./ip_range_to_cidr 192.168.1.1 192.168.1.254

Example Outputs:

  • Input: 192.168.1.1 192.168.1.254

    Output: 192.168.1.1/32,192.168.1.2/31,192.168.1.4/30,192.168.1.8/29,192.168.1.16/28,192.168.1.32/27,192.168.1.64/26,192.168.1.128/25,192.168.1.252/30,192.168.1.254/31 (This output demonstrates the greedy algorithm finding larger blocks where possible, then breaking down remaining parts into smaller blocks. It covers 192.168.1.1 to 192.168.1.255, excluding 192.168.1.255 which is not included in the range)

  • Input: 192.168.1.0 192.168.1.255

    Output: 192.168.1.0/24

  • Input: 10.0.0.0 10.255.255.255

    Output: 10.0.0.0/8

  • Input: 192.168.0.1 192.168.0.1

    Output: 192.168.0.1/32

  • Input: 0.0.0.0 255.255.255.255

    Output: 0.0.0.0/0

Important Notes:

  • inet_ntoa() is considered deprecated and not thread-safe because it uses a static internal buffer. For more robust applications, inet_ntop() should be used, which requires providing a buffer. For this simple command-line utility, it's generally acceptable.
  • The use of uint64_t for block_size and intermediate calculations involving large IP address ranges (especially for /0 which represents 2^32 addresses) is crucial to prevent integer overflow.
  • Error handling for invalid IP addresses and range order is included.

Multimodal interactions

Gemini models can process different data modalities, such as unstructured files, images, audio, and video, allowing you to experiment with multimodal scenarios where you ask the model to describe, explain, get insights from, or extract information out of the media included in your prompts. In this section you will work through different scenarios with multimedia inputs.

Important

The OpenAI SDK compatibility only supports inline images and audio files. For video support, use the Gemini API's JS SDK.
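If you do need video understanding, a minimal sketch with the Google GenAI SDK installed earlier (@google/genai) might look like the following. This is an illustrative sketch, not part of the original notebook: the video.mp4 path is a placeholder, and larger files may need you to poll the Files API until processing completes before prompting.

import { GoogleGenAI, createUserContent, createPartFromUri } from "@google/genai";

const genai = new GoogleGenAI({ apiKey: GEMINI_API_KEY });

// Upload a local video file (placeholder path) via the Files API.
// For larger videos, poll genai.files.get() until the file state is ACTIVE.
const videoFile = await genai.files.upload({ file: "video.mp4" });

// Reference the uploaded file by URI in a regular generateContent call.
const videoResponse = await genai.models.generateContent({
  model: MODEL_ID,
  contents: createUserContent([
    createPartFromUri(videoFile.uri ?? "", videoFile.mimeType ?? "video/mp4"),
    "Describe this video in a few sentences.",
  ]),
});
console.log(videoResponse.text);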

Working with images (a single image)

You will first download the image you want to work with.

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");

const downloadFile = async (url: string, filePath: string) => {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.statusText}`);
  }
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const buffer = await response.blob();
  const bufferData = Buffer.from(await buffer.arrayBuffer());
  fs.writeFileSync(filePath, bufferData);
};
const IMAGE_URL = "https://storage.googleapis.com/generativeai-downloads/images/Japanese_Bento.png";

const imageFilePath = path.join("../assets/openai_compatibility", "Japanese_Bento.png");
await downloadFile(IMAGE_URL, imageFilePath);

tslab.display.png(fs.readFileSync(imageFilePath));

Now you can encode the image and work with the OpenAI library to interact with the Gemini models.

const base64Image = fs.readFileSync(imageFilePath).toString("base64");

const imageResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe the items on this image. If there is any non-English text, translate it as well",
        },
        {
          type: "image_url",
          image_url: {
            url: `data:image/png;base64,${base64Image}`,
          },
        },
      ],
    },
  ],
});
tslab.display.markdown(imageResponse.choices[0].message.content ?? "");

Here are the descriptions of the items on the image, with translations of the Japanese text:

Row 1 (Top Row):

  1. Item 1 (Left): Matcha Swiss Roll
    • Description: A slice of green tea (matcha) Swiss roll cake. It features a vibrant green cake layer rolled around a swirl of white cream, and appears to have small dark bits (possibly red beans or chocolate) embedded in the cake. It's served on a small white plate.
    • Japanese Text: 抹茶のスイスロール (Matcha no Suisu Rōru)
    • Translation: Matcha Swiss Roll
  2. Item 2 (Middle): Anpan
    • Description: Two anpan, which are sweet bread rolls filled with red bean paste. One bun is whole and sprinkled with seeds (likely sesame or poppy), while the other is cut open to reveal the dark, rich adzuki (red bean) paste filling inside.
    • Japanese Text: あんパン (Anpan)
    • Translation: Anpan (Sweet Red Bean Bun)
  3. Item 3 (Right): Saki Ika
    • Description: A pile of saki ika, which is shredded and dried seasoned squid, a popular snack in Japan. The strips are light reddish-orange and have a slightly translucent appearance. It's presented on a dark, textured plate.
    • Japanese Text: さきイカ (Saki Ika)
    • Translation: Shredded Dried Squid

Row 2 (Middle Row):

  1. Item 4 (Left): Umeboshi
    • Description: A small serving of umeboshi, which are intensely sour and salty pickled Japanese plums. There are several round, wrinkly, reddish-orange plums in a small, dark bowl.
    • Japanese Text: 梅干し (Umeboshi)
    • Translation: Pickled Plums
  2. Item 5 (Middle): Taiyaki
    • Description: Two taiyaki, which are fish-shaped cakes typically filled with red bean paste. They have a golden-brown, waffle-like texture and are molded into the shape of a sea bream. They are served on a dark plate.
    • Japanese Text: たい焼き (Taiyaki)
    • Translation: Taiyaki (Fish-shaped cake)
  3. Item 6 (Right): Azuki Monaka
    • Description: A monaka, a traditional Japanese sweet consisting of adzuki (red bean) paste sandwiched between two thin, crisp wafers. One of the wafers has an intricate, patterned design, and the sweet is shown with the top wafer removed to reveal the dark, chunky red bean paste filling. It's on a dark plate.
    • Japanese Text: あずき最中 (Azuki Monaka)
    • Translation: Azuki (Red Bean) Monaka

Row 3 (Bottom Row):

  1. Item 7 (Left): Onigiri
    • Description: Two onigiri, or Japanese rice balls, shaped into triangles. Each is partially wrapped with a strip of dark green nori (seaweed). They are served on a blue and white patterned plate, accompanied by a few slices of yellow pickled vegetables (likely takuan, pickled daikon radish).
    • Japanese Text: お握り (Onigiri)
    • Translation: Rice Ball
  2. Item 8 (Middle): Sakuramochi
    • Description: Two sakuramochi, which are traditional Japanese sweets made from pink-colored mochi (rice cake) and typically filled with red bean paste. Each is wrapped in a pickled cherry blossom (sakura) leaf. They are presented on a black and red lacquerware plate.
    • Japanese Text: 桜餅 (Sakuramochi)
    • Translation: Cherry Blossom Mochi
  3. Item 9 (Right): Nori Maki Senbei
    • Description: A pile of nori maki senbei, which are Japanese rice crackers. These are small, cylindrical, orange-colored crackers, each individually wrapped with a strip of dark green nori seaweed.
    • Japanese Text: 海苔巻き煎餅 (Nori Maki Senbei)
    • Translation: Nori-Wrapped Rice Crackers

Working with images (multiple images)

You can follow the same process while sending multiple images in the same prompt.

const IMAGE_URLS = [
  "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg",
  "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg",
  "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg",
  "https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg",
];
const imageFilePaths = IMAGE_URLS.map((url, index) => {
  const fileName = `furniture_${index + 1}.jpg`;
  return path.join("../assets/openai_compatibility", fileName);
});

await Promise.all(IMAGE_URLS.map((url, index) => downloadFile(url, imageFilePaths[index])));
imageFilePaths.forEach((filePath) => {
  tslab.display.jpeg(fs.readFileSync(filePath));
});

Now you can encode the images and send them with your prompt.

import OpenAI from "openai";

const imageBase64s = imageFilePaths.map((filePath) => fs.readFileSync(filePath).toString("base64"));

const multiImageResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Describe for what type of living room each of those items are the best match",
        },
        ...imageBase64s.map(
          (base64) =>
            ({
              type: "image_url",
              image_url: {
                url: `data:image/jpeg;base64,${base64}`,
              },
            }) as OpenAI.ChatCompletionContentPartImage
        ),
      ],
    },
  ],
});
tslab.display.markdown(multiImageResponse.choices[0].message.content ?? "");

Here's a description of the type of living room each item would best match, based on its design and aesthetic:


1. Industrial Adjustable Stool (Image 1)

  • Description: This stool features a warm, finished wooden seat and a dark, possibly metal, base with an exposed screw mechanism for height adjustment. Its design is sturdy, utilitarian, and has a raw, functional appeal.
  • Best Match Living Room: This stool is an excellent fit for Industrial, Loft, Urban Chic, or Rustic/Farmhouse living rooms. Its combination of natural wood and dark metal, along with the visible mechanism, aligns perfectly with the exposed brick, concrete, reclaimed wood, and metallic accents often found in industrial designs. It can serve as an extra seat, a unique side table for a drink, or a display stand for a plant or sculpture, adding an authentic, edgy touch.

2. Tufted Upholstered Armchair (Image 2)

  • Description: This armchair is upholstered in a light, possibly cream or off-white fabric, featuring deep button tufting on the back and arms, rolled armrests, and decorative turned legs. It exudes a sense of elegance, comfort, and classic charm.
  • Best Match Living Room: This chair is ideal for Traditional, Glam, Shabby Chic, French Country, or Neoclassical living rooms. Its ornate details, plush upholstery, and classic silhouette speak to a desire for comfort and sophisticated aesthetics. It would complement spaces with intricate mouldings, crystal chandeliers, antique furniture, soft drapery, and a luxurious, inviting atmosphere. It functions beautifully as a statement accent chair or part of a formal seating arrangement.

3. Simple Wooden Bar/Counter Stool (Image 3)

  • Description: This is a tall, simple wooden stool with a square top and straight legs, likely made of light or whitewashed wood. Its design is minimalist, clean-lined, and highly functional without any superfluous decoration.
  • Best Match Living Room: This stool is perfectly suited for Scandinavian, Modern, Minimalist, Contemporary, or Casual Coastal living rooms. Its light wood tone and uncomplicated form align with the Scandinavian emphasis on natural materials, functionality, and bright, airy spaces. In a modern or minimalist setting, it offers practical extra seating or a surface without cluttering the visual space. It could also fit well in an open-concept living room that incorporates a kitchen island or counter, serving as breakfast bar seating.

4. Modern Lounge Chair with Sculptural Base (Image 4)

  • Description: This low-slung lounge chair features a subtly curved, upholstered seat and back in a neutral fabric (like beige or light grey), supported by a distinctive, sculptural wooden base that appears to have a star-like or splayed design, possibly allowing for a swivel. Its design is sleek, sophisticated, and contemporary.
  • Best Match Living Room: This chair is an excellent match for Modern, Mid-Century Modern, Contemporary, or Scandinavian living rooms. Its clean lines, organic curves, and striking yet understated base are hallmarks of these design movements. It's a statement piece that prioritizes both comfort and aesthetic appeal. It would fit well in spaces with minimalist decor, natural textures, geometric patterns, and other iconic modern furniture, serving as a comfortable reading nook chair or a chic accent piece for conversation.

Working with audio files

You can also send audio files in your prompt. Audio data provides richer input than text alone, and can be used for tasks like transcription, or as direct prompting, as with a voice assistant.

First you need to download the audio you want to use.

const AUDIO_URL = "https://storage.googleapis.com/generativeai-downloads/data/Apollo-11_Day-01-Highlights-10s.mp3";

const audioFilePath = path.join("../assets/openai_compatibility", "Apollo-11_Day-01-Highlights-10s.mp3");
await downloadFile(AUDIO_URL, audioFilePath);

tslab.display.html(
  `<audio controls><source src="${audioFilePath}" type="audio/mpeg">Your browser does not support the audio element.</audio>`
);
const audioBase64 = fs.readFileSync(audioFilePath).toString("base64");

const audioResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: "Transcribe this audio file. After transcribing, tell me from what this can be related to.",
        },
        {
          type: "input_audio",
          input_audio: {
            data: audioBase64,
            format: "mp3",
          },
        },
      ],
    },
  ],
});
tslab.display.markdown(audioResponse.choices[0].message.content ?? "");

Here is the transcription of the audio file:

"minus 10, nine, eight. We have a go for main engine start. We have main engine start."


What this can be related to:

This audio can be directly related to a rocket launch or a space mission countdown.

The terminology "minus 10, nine, eight," followed by "go for main engine start," and "main engine start" are classic phrases used in the procedures leading up to and initiating the launch of a rocket or spacecraft.

Structured outputs

The Gemini API allows you to control the way your response is generated via structured outputs. You define the structure you want as a schema and, using the OpenAI library, send it as the response_format parameter.

In this example you will:

  • download a scientific paper
  • extract its information
  • define the structure you want your response in
  • send your request using the response_format parameter

First you need to download the reference paper. You will use the Attention Is All You Need Google paper that introduced the Transformer architecture.

const PAPER_URL = "https://arxiv.org/pdf/1706.03762.pdf";
const paperFilePath = path.join("../assets/openai_compatibility", "Attention_is_All_You_Need.pdf");
await downloadFile(PAPER_URL, paperFilePath);

Look at one of the pages:

const getPage = async (pdfPath: string, pageNumber: number): Promise<Uint8Array> => {
  const { pdf } = await import("pdf-to-img");
  const document = await pdf(pdfPath);
  let pageIndex = 0;
  for await (const page of document) {
    if (pageIndex === pageNumber) {
      return new Uint8Array(page);
    }
    pageIndex++;
  }
  throw new Error(`Page ${pageNumber} not found in the PDF document.`);
};

tslab.display.png(await getPage(paperFilePath, 0));

Now extract the text from the PDF file. You can use the pdf-parse package for this.

const pdf_parse = require("pdf-parse") as typeof import("pdf-parse");

const pdfData = fs.readFileSync(paperFilePath);
const pdfText = await pdf_parse(pdfData);
console.log(pdfText.text.slice(0, 500)); // Display the first 500 characters of the PDF text


Provided proper attribution is provided, Google hereby grants permission to
reproduce the tables and figures in this paper solely for use in journalistic or
scholarly works.
Attention Is All You Need
Ashish Vaswani
∗
Google Brain
avaswani@google.com
Noam Shazeer
∗
Google Brain
noam@google.com
Niki Parmar
∗
Google Research
nikip@google.com
Jakob Uszkoreit
∗
Google Research
usz@google.com
Llion Jones
∗
Google Research
llion@google.com
Aidan N. Gomez
∗ †
University of Toronto
aidan@cs.toronto.edu

Now you will create your reference structure. You define the structure you want your response in and send it using the response_format parameter. In this example, you will define a structure that includes the title, authors, abstract, and keywords of the paper.

const z = require("zod") as typeof import("zod");

const researchPaperExtractionSchema = z.object({
  title: z.string(),
  authors: z.array(z.string()),
  abstract: z.string(),
  keywords: z.array(z.string()),
});
import { zodResponseFormat } from "openai/helpers/zod";

const SYSTEM_PROMPT = `
    As a specialist in knowledge organization and data refinement, your task is to transform
    raw research paper content into a clearly defined structured format. I will provide you
    with the original, free-form text. Your goal is to parse this text, extract the pertinent
    information, and reconstruct it according to the structure outlined below. Feel free to
    use any relevant information from the text to fill in the fields.
`;

const structuredResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    {
      role: "system",
      content: SYSTEM_PROMPT,
    },
    {
      role: "user",
      content: pdfText.text,
    },
  ],
  response_format: zodResponseFormat(researchPaperExtractionSchema, "research_paper_extraction"),
});
console.log(JSON.parse(structuredResponse.choices[0].message.content ?? "{}"));
{
  abstract: 'The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.',
  authors: [
    'Ashish Vaswani',
    'Noam Shazeer',
    'Niki Parmar',
    'Jakob Uszkoreit',
    'Llion Jones',
    'Aidan N. Gomez',
    'Łukasz Kaiser',
    'Illia Polosukhin'
  ],
  keywords: [
    'Transformer',
    'Attention Mechanism',
    'Self-attention',
    'Machine Translation',
    'Sequence Transduction',
    'Neural Networks'
  ],
  title: 'Attention Is All You Need'
}
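If you prefer typed access to the result, the OpenAI SDK also ships a parse helper that pairs with zodResponseFormat. A minimal sketch, assuming the compatibility endpoint accepts the same request shape; depending on your openai-node version, the helper is exposed as ai.beta.chat.completions.parse or ai.chat.completions.parse:

const parsedCompletion = await ai.beta.chat.completions.parse({
  model: MODEL_ID,
  messages: [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: pdfText.text },
  ],
  response_format: zodResponseFormat(researchPaperExtractionSchema, "research_paper_extraction"),
});

// `parsed` is typed from the zod schema (or null if the model output could not be parsed).
const paper = parsedCompletion.choices[0].message.parsed;
console.log(paper?.title, paper?.keywords);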

Given the Gemini API's ability to handle structured outputs, you can work on more complex scenarios too, like using the structured output functionality to help you generate user interfaces.

enum UIType {
  DIV = "div",
  BUTTON = "button",
  HEADER = "header",
  SECTION = "section",
  FIELD = "field",
  FORM = "form",
}

const attributesSchema = z.object({
  name: z.string(),
  value: z.string(),
});

const uiSchema = z.object({
  type: z.nativeEnum(UIType),
  label: z.string(),
  children: z.array(z.string()),
  attributes: z.array(attributesSchema),
});

const uiResponseSchema = z.object({
  ui: uiSchema,
});
const uiResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    {
      role: "system",
      content: "You are a UI generation assistant. Convert the user input into a UI.",
    },
    {
      role: "user",
      content: "Make a User Profile Form including all required attributes",
    },
  ],
  response_format: zodResponseFormat(uiResponseSchema, "ui_response"),
});
console.log(JSON.parse(uiResponse.choices[0].message.content ?? "{}"));
{
  ui: {
    attributes: [],
    children: [
      'First Name',
      'Last Name',
      'Email Address',
      'Phone Number',
      'Street Address',
      'City',
      'State',
      'Zip Code',
      'Save Changes'
    ],
    label: 'User Profile',
    type: 'form'
  }
}

Function calling

The Gemini API's function calling feature allows you to extend the model's capabilities by providing descriptions of external functions or APIs.

For further understanding of how function calling works with Gemini models, check the Gemini API documentation.

import OpenAI from "openai";
import { zodFunction } from "openai/helpers/zod";

const getWeatherArgs = z.object({
  location: z.string().describe("The location for which to get the weather"),
});

const tools: OpenAI.ChatCompletionTool[] = [
  zodFunction({
    name: "get_weather",
    description: "Get the current weather for a given location",
    parameters: getWeatherArgs,
  }),
];
const toolResponse = await ai.chat.completions.create({
  model: MODEL_ID,
  messages: [
    {
      role: "user",
      content: "What's the weather like in Boston?",
    },
  ],
  tools: tools,
});
const toolCall = toolResponse.choices[0].message.tool_calls?.[0];
if (toolCall) {
  console.log(JSON.stringify(toolCall, null, 2));
}
{
  "function": {
    "arguments": "{\"location\":\"Boston\"}",
    "name": "get_weather"
  },
  "id": "",
  "type": "function"
}
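To complete the round trip, you would normally run the requested function yourself and send its result back as a tool message so the model can produce a final answer. A minimal sketch, using a hypothetical getWeather stub that is not part of the original notebook:

// Hypothetical stub standing in for a real weather lookup.
const getWeather = (location: string) => ({ location, forecast: "sunny", temperatureC: 22 });

if (toolCall && toolCall.type === "function") {
  const args = JSON.parse(toolCall.function.arguments) as { location: string };
  const followUpResponse = await ai.chat.completions.create({
    model: MODEL_ID,
    messages: [
      { role: "user", content: "What's the weather like in Boston?" },
      toolResponse.choices[0].message, // the assistant turn containing the tool call
      {
        // Note: as seen above, the compatibility layer may return an empty tool call id.
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(getWeather(args.location)),
      },
    ],
    tools: tools,
  });
  tslab.display.markdown(followUpResponse.choices[0].message.content ?? "");
}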

Thinking

Gemini 2.5 models are trained to think through complex problems, leading to significantly improved reasoning. The Gemini API comes with a "thinking budget" parameter which gives fine-grained control over how much the model will think.

Unlike the Gemini API's numeric thinking budget, the OpenAI compatibility layer offers three levels of thinking control: "low", "medium", and "high", which are mapped to 1K, 8K, and 24K thinking token budgets respectively.

If you want to disable thinking, you can set the reasoning effort to "none", as shown in the sketch after the example below.

const THINKING_MODEL = "gemini-2.5-flash";

const thinkingResponse = await ai.chat.completions.create({
  model: THINKING_MODEL,
  reasoning_effort: "low",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: " What is 45-78+5x13? Double check and explain why your answer is correct." },
  ],
});
tslab.display.markdown(thinkingResponse.choices[0].message.content ?? "");

To solve the expression 45 - 78 + 5 x 13, we need to follow the order of operations, often remembered by the acronym PEMDAS/BODMAS:

  1. Parentheses (or Brackets)
  2. Exponents (or Orders)
  3. Multiplication and Division (from left to right)
  4. Addition and Subtraction (from left to right)

Let's break it down:

Step 1: Perform Multiplication
There is a multiplication operation: 5 x 13
5 x 13 = 65

Now, substitute this value back into the expression: 45 - 78 + 65

Step 2: Perform Addition and Subtraction from Left to Right
First, 45 - 78: 45 - 78 = -33

Now, substitute this value back into the expression: -33 + 65

Finally, perform the addition: -33 + 65 = 32

Answer: The answer is 32.


Double Check and Explanation:

  1. Understanding Order of Operations (PEMDAS/BODMAS): This rule dictates which operations to perform first in a mathematical expression.
    • 45 - 78 + 5 x 13
    • We first look for Parentheses/Brackets (none).
    • Then Exponents/Orders (none).
    • Next, Multiplication/Division. We have 5 x 13.
      • 5 x 13 = 65.
      • The expression becomes 45 - 78 + 65.
    • Finally, Addition/Subtraction. These are done from left to right.
      • First, 45 - 78. This gives -33.
      • The expression is now -33 + 65.
      • Finally, -33 + 65 = 32.
  2. Re-calculation:
    • Start: 45 - 78 + 5 x 13
    • Multiplication: 5 * 13 = 65
    • Substitute: 45 - 78 + 65
    • Subtraction (left to right): 45 - 78 = -33
    • Addition (left to right): -33 + 65 = 32

The steps are consistent and the re-calculation confirms the result. The answer is correct because we strictly adhered to the mathematical order of operations, performing multiplication before addition and subtraction, and then performing addition and subtraction from left to right.
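As noted above, thinking can be disabled by setting the reasoning effort to "none". A minimal sketch; since "none" is a Gemini-specific value, the installed OpenAI SDK's TypeScript types may not include it, hence the cast:

const noThinkingResponse = await ai.chat.completions.create({
  model: THINKING_MODEL,
  // "none" disables thinking on Gemini; cast because the openai types may only list "low" | "medium" | "high".
  reasoning_effort: "none" as unknown as OpenAI.ChatCompletionCreateParams["reasoning_effort"],
  messages: [{ role: "user", content: "What is 45-78+5x13?" }],
});
tslab.display.markdown(noThinkingResponse.choices[0].message.content ?? "");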

Generating and working with embeddings

Text embeddings offer a compressed, vector-based representation of text, typically in a lower-dimensional space. The core principle is that semantically similar texts will have embeddings that are spatially proximate within the embedding vector space. This representation enables solutions to several prevalent NLP challenges, including:

  • Semantic Search: Identifying and ranking texts based on semantic relatedness.
  • Recommendation: Suggesting items whose textual descriptions exhibit semantic similarity to a given input text.
  • Classification: Assigning text to categories based on the semantic similarity between the text and the category's representative text.
  • Clustering: Grouping texts into clusters based on the semantic similarity reflected in their respective embedding vectors.
  • Outlier Detection: Identifying texts that are semantically dissimilar from the majority, as indicated by their distance in the embedding vector space.

For more details about working with the Gemini API and embeddings, check the API documentation.

In this example you will use the gemini-embedding-exp-03-07 model from the Gemini API to generate your embeddings.

const EMBEDDINGS_MODEL = "gemini-embedding-exp-03-07";

const embeddingResponse = await ai.embeddings.create({
  model: EMBEDDINGS_MODEL,
  input: "The quick brown fox jumps over the lazy dog.",
});
console.log("Length of embedding vector: ", embeddingResponse.data[0].embedding.length);
const vector = embeddingResponse.data[0].embedding;
console.log("First 10 elements of the embedding vector: ", vector.slice(0, 10));
Length of embedding vector:  3072
First 10 elements of the embedding vector:  [
  -0.013846100308001041,
  0.016324089840054512,
  -0.0018839917611330748,
  -0.04919512942433357,
  0.009635014459490776,
  0.011142290197312832,
  -0.0064404308795928955,
  -0.0022762261796742678,
  0.014301184564828873,
  0.021092839539051056
]

A simple application of text embeddings is to calculate the similarity between sentences (e.g. product reviews, document contents, etc.). First you will create a group of sentences.

const danfojs = require("danfojs-node") as typeof import("danfojs-node");

const rows = [
  "i really enjoyed the movie last night",
  "so many amazing cinematic scenes yesterday",
  "had a great time writing my Python scripts a few days ago",
  "huge sense of relief when my .py script finally ran without error",
  "O Romeo, Romeo, wherefore art thou Romeo?",
];

const df = new danfojs.DataFrame(
  rows.map((text) => [text]),
  { columns: ["text"] }
);
df.print();
╔════════════╤═══════════════════╗
║            │ text              ║
╟────────────┼───────────────────╢
║ 0          │ i really enjoye…  ║
╟────────────┼───────────────────╢
║ 1          │ so many amazing…  ║
╟────────────┼───────────────────╢
║ 2          │ had a great tim…  ║
╟────────────┼───────────────────╢
║ 3          │ huge sense of r…  ║
╟────────────┼───────────────────╢
║ 4          │ O Romeo, Romeo,…  ║
╚════════════╧═══════════════════╝

Now you can create a function to generate embeddings, apply that function to your dataframe's text column, and save the result into a new column called embedding.

const generateEmbeddings = async (texts: string[]): Promise<number[][]> => {
  const embeddings = await ai.embeddings.create({
    model: EMBEDDINGS_MODEL,
    input: texts,
  });
  return embeddings.data.map((e) => e.embedding);
};

// eslint-disable-next-line @typescript-eslint/no-unsafe-member-access
const embeddings = await generateEmbeddings(df.text.values as string[]);
const embeddingSeries = new danfojs.Series(embeddings);
df.addColumn("embedding", embeddingSeries, { inplace: true });
df.print();
╔════════════╤═══════════════════╤═══════════════════╗
║            │ text              │ embedding         ║
╟────────────┼───────────────────┼───────────────────╢
║ 0          │ i really enjoye…  │ -0.015758849680…  ║
╟────────────┼───────────────────┼───────────────────╢
║ 1          │ so many amazing…  │ -0.012008255347…  ║
╟────────────┼───────────────────┼───────────────────╢
║ 2          │ had a great tim…  │ 0.0065507469698…  ║
╟────────────┼───────────────────┼───────────────────╢
║ 3          │ huge sense of r…  │ -0.008062172681…  ║
╟────────────┼───────────────────┼───────────────────╢
║ 4          │ O Romeo, Romeo,…  │ -0.023231310769…  ║
╚════════════╧═══════════════════╧═══════════════════╝

Now that you have the embeddings representations for all sentences, you can calculate their similarities.

function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, value, index) => sum + value * b[index], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, value) => sum + value * value, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, value) => sum + value * value, 0));
  return dotProduct / (magnitudeA * magnitudeB);
}

const cosineSimilarityArray = (embeddings: number[][]): number[][] => {
  const result: number[][] = [];
  for (let i = 0; i < embeddings.length; i++) {
    result[i] = [];
    for (let j = 0; j < embeddings.length; j++) {
      result[i][j] = cosineSimilarity(embeddings[i], embeddings[j]);
    }
  }
  return result;
};
const cosSimArray = cosineSimilarityArray(embeddings);
// eslint-disable-next-line @typescript-eslint/no-unsafe-member-access
const texts = df.text.values as string[];

const cosSimDf = new danfojs.DataFrame(cosSimArray, {
  columns: texts,
  index: texts,
});
cosSimDf.print();
╔════════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╗
║                    │ i really enjoye…  │ so many amazing…  │ had a great tim…  │ huge sense of r…  │ O Romeo, Romeo,…  ║
╟────────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ i really enjoye…   │ 1                 │ 0.6536475251204…  │ 0.6451563604059…  │ 0.6026055751927…  │ 0.5086438505027…  ║
╟────────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ so many amazing…   │ 0.6536475251204…  │ 0.9999999999999…  │ 0.6495669715220…  │ 0.6111587144454…  │ 0.5282164440831…  ║
╟────────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ had a great tim…   │ 0.6451563604059…  │ 0.6495669715220…  │ 1.0000000000000…  │ 0.8108772516390…  │ 0.4761749114691…  ║
╟────────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ huge sense of r…   │ 0.6026055751927…  │ 0.6111587144454…  │ 0.8108772516390…  │ 1.0000000000000…  │ 0.4621836049587…  ║
╟────────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ O Romeo, Romeo,…   │ 0.5086438505027…  │ 0.5282164440831…  │ 0.4761749114691…  │ 0.4621836049587…  │ 1.0000000000000…  ║
╚════════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╝

You can also plot it for better visualization.

import { createCanvas } from "canvas";

const matrix: number[][] = cosineSimilarityArray(embeddings);
// eslint-disable-next-line @typescript-eslint/no-unsafe-member-access
const labels: string[] = df.text.values as string[];

const size = 600;
const padding = 100;
const cellSize = (size - padding) / matrix.length;

const canvas = createCanvas(size, size);
const ctx = canvas.getContext("2d");

function similarityToColor(value: number) {
  const intensity = Math.floor(255 * value);
  return `rgb(${255 - intensity}, ${255 - intensity}, 255)`; // light to dark blue
}

for (let y = 0; y < matrix.length; y++) {
  for (let x = 0; x < matrix.length; x++) {
    const sim = matrix[y][x];
    ctx.fillStyle = similarityToColor(sim);
    ctx.fillRect(padding + x * cellSize, padding + y * cellSize, cellSize, cellSize);
    ctx.fillStyle = "black";
    ctx.font = "10px Arial";
    ctx.fillText(sim.toFixed(2), padding + x * cellSize + 5, padding + y * cellSize + 15);
  }
}

ctx.font = "10px Arial";
ctx.fillStyle = "black";
ctx.textAlign = "right";

labels.forEach((label, i) => {
  ctx.save();
  ctx.translate(padding - 5, padding + i * cellSize + cellSize / 2);
  ctx.fillText(label.slice(0, 20), 0, 5);
  ctx.restore();
});

labels.forEach((label, i) => {
  ctx.save();
  ctx.translate(padding + i * cellSize + cellSize / 2, padding - 5);
  ctx.rotate(-Math.PI / 2);
  ctx.textAlign = "right";
  ctx.fillText(label.slice(0, 20), 0, 5);
  ctx.restore();
});

tslab.display.html(`<img src="${canvas.toDataURL()}" alt="Cosine Similarity Matrix">`);
Cosine Similarity Matrix
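The same building blocks cover the semantic search use case listed earlier: embed a free-text query and rank the sentences by cosine similarity against it. A minimal sketch reusing generateEmbeddings and cosineSimilarity (the query string is just an illustration):

const query = "debugging a piece of code";
const [queryEmbedding] = await generateEmbeddings([query]);

// Rank the original sentences by similarity to the query, highest first.
const ranked = texts
  .map((text, i) => ({ text, score: cosineSimilarity(queryEmbedding, embeddings[i]) }))
  .sort((a, b) => b.score - a.score);

ranked.forEach(({ text, score }) => console.log(score.toFixed(3), text));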

Next Steps

Do more with Gemini

If you want to use more of the Gemini capabilities, especially the unique ones not available through the OpenAI compatibility layer, you should check out the Google GenAI SDK.

The Cookbook is full of examples of how to use it, but it is recommended to start with the Getting started notebook to get a feel for all the models and SDK capabilities.