You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
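For reference, the .env file contains a single line (replace the placeholder with your actual key):

```shell
GEMINI_API_KEY="<YOUR_API_KEY>"
```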
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({ path: "../.env" });

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env file is one directory up from the notebook, hence the ../ in the path. If the .env file is in the same directory as the notebook, you can omit the path option altogether.
With the new SDK, you only need to initialize a client with your API key (or OAuth when using Vertex AI). The model is now set in each call.
const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });
Select a model
Now select the model you want to use in this guide, either by picking one from the list or writing its name down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, and in particular to learn how to switch thinking off).
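For example, you could pin your choices in two constants and reuse them in every call. The exact IDs below are only examples; substitute whichever models you selected:

```typescript
// Model used for generation calls in this guide (example ID; swap in
// e.g. "gemini-2.5-pro" if you prefer a more capable thinking model).
const MODEL_ID = "gemini-2.5-flash";

// Embedding model used later for keyword correction and search.
const EMBEDDING_MODEL_ID = "text-embedding-004";
```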
const images = fs
  .readdirSync(`${EXTRACT_PATH}/clothes-dataset`)
  .map((file) => path.join(EXTRACT_PATH, "clothes-dataset", file))
  .sort((a, b) => b.localeCompare(a));
console.log(`Found ${images.length} images in the dataset. ${images}`);
Found 10 images in the dataset. ../assets/tag_and_caption_images/clothes-dataset/9.jpg,../assets/tag_and_caption_images/clothes-dataset/8.jpg,../assets/tag_and_caption_images/clothes-dataset/7.jpg,../assets/tag_and_caption_images/clothes-dataset/6.jpg,../assets/tag_and_caption_images/clothes-dataset/5.jpg,../assets/tag_and_caption_images/clothes-dataset/4.jpg,../assets/tag_and_caption_images/clothes-dataset/3.jpg,../assets/tag_and_caption_images/clothes-dataset/2.jpg,../assets/tag_and_caption_images/clothes-dataset/10.jpg,../assets/tag_and_caption_images/clothes-dataset/1.jpg
Generating keywords
You can use the LLM to extract relevant keywords from the images.
Here is a helper function for calling the Gemini API with images. The sleep between calls ensures that the rate quota is not exceeded; refer to our pricing page for the current quotas.
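The helper might look like the following sketch. The name generateKeywords matches the calls below; the sleep duration, the response handling, and the assumption that the client and selected model ID live in `ai` and `MODEL_ID` from earlier cells are all choices you may need to adapt:

```typescript
import * as fs from "fs";

// Provided by earlier notebook cells; declared here only so the sketch
// type-checks on its own.
declare const ai: { models: { generateContent: (req: unknown) => Promise<{ text?: string }> } };
declare const MODEL_ID: string;

// Simple delay helper used for crude rate limiting between API calls.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: sends one image plus a text prompt to Gemini and
// returns the model's text response.
async function generateKeywords(prompt: string, imagePath: string): Promise<string> {
  const imageData = fs.readFileSync(imagePath).toString("base64");
  const response = await ai.models.generateContent({
    model: MODEL_ID,
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: imageData } },
      { text: prompt },
    ],
  });
  await sleep(4000); // stay under the requests-per-minute quota
  return response.text ?? "";
}
```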
Go ahead and define a prompt that will help generate keywords describing clothing. The following prompt uses few-shot prompting to prime the LLM with examples of how these keywords should be generated and which ones are valid.
const KEYWORD_PROMPT = `
  You are an expert in clothing that specializes in tagging images of clothes, shoes, and accessories.
  Your job is to extract all relevant keywords from a photo that will help describe an item.
  You are going to see an image, extract only the keywords for the clothing, and try to provide as many keywords as possible.
  Allowed keywords: ${keywords.join(", ")}.
  Extract tags only when it is obvious that it describes the main item in the image.
  Return the keywords as a list of strings:
  example1: ["blue", "shoes", "denim"]
  example2: ["sport", "skirt", "cotton", "blue", "red"]
`;
Generate keywords for each of the images.
for (const image of images.slice(0, 5)) {
  const keywords = await generateKeywords(KEYWORD_PROMPT, image);
  tslab.display.jpeg(fs.readFileSync(image));
  console.log(`Keywords for ${path.basename(image)}: ${keywords}`);
}
Keywords for 9.jpg: ["shorts", "denim", "blue", "casual", "summer", "spring", "cotton"]
Keywords for 8.jpg: ["blue", "suit", "shirt", "men", "elegant", "white", "black"]
Keywords for 7.jpg: ["suit", "men", "elegant", "blue", "black", "silk"]
Keywords for 6.jpg: ["T-shirt", "dress", "casual"]
Keywords for 5.jpg: ["dress", "women", "elegant", "red", "spring", "polyester"]
Keyword correction and deduplication
Unfortunately, despite being given a list of allowed keywords, the model can, at least in theory, return an invalid one. It may be a near-synonym of a valid keyword, e.g. “denim” instead of “jeans”, or be completely unrelated to any keyword from the list.
To address these issues, you can use embeddings to map the keywords to predefined ones and remove unrelated ones.
For demonstration purposes, define a function that assesses the similarity between two embedding vectors. In this case you will use cosine similarity, but other measures, such as the dot product, work too.
function cosineSimilarity(array1: number[], array2: number[]): number {
  const dotProduct = array1.reduce((sum, value, index) => sum + value * array2[index], 0);
  const norm1 = Math.sqrt(array1.reduce((sum, value) => sum + value * value, 0));
  const norm2 = Math.sqrt(array2.reduce((sum, value) => sum + value * value, 0));
  return dotProduct / (norm1 * norm2);
}
Next, define a function that replaces a keyword with the most similar word in the keyword dataframe you created previously.
Note that the threshold here is chosen arbitrarily; it may require tweaking depending on your use case and dataset.
/* eslint-disable @typescript-eslint/no-unsafe-member-access, @typescript-eslint/no-unsafe-call */
import * as danfo from "danfojs-node";

async function replaceWordWithMostSimilar(
  keyword: string,
  keywordsDf: danfo.DataFrame,
  threshold = 0.7
): Promise<string | null> {
  // No need for embeddings if the keyword is valid.
  if (keywordsDf.keyword.values.includes(keyword)) {
    return keyword;
  }
  const embedding = await ai.models.embedContent({
    model: EMBEDDING_MODEL_ID,
    contents: [keyword],
    config: {
      taskType: "semantic_similarity",
    },
  });
  const similarities = keywordsDf.embedding.values.map((rowEmbedding: string) =>
    cosineSimilarity(embedding.embeddings[0].values!, rowEmbedding.split(",").map(Number))
  ) as number[];
  const mostSimilarKeywordIndex = similarities.indexOf(Math.max(...similarities));
  if (similarities[mostSimilarKeywordIndex] < threshold) {
    return null;
  }
  return keywordsDf.loc({
    rows: [mostSimilarKeywordIndex],
    columns: ["keyword"],
  }).keyword.values[0] as string;
}
Here is an example of how these keywords can be mapped to a keyword with the closest meaning.
for (const word of ["purple","tank top","everyday"]) {const similarWord =awaitreplaceWordWithMostSimilar(word, keywordsDf);console.log(`${word} -> ${similarWord}`);}
purple -> violet
tank top -> T-shirt
everyday -> casual
You can now either keep the words that do not fit the predefined categories or delete them. In this scenario, all words without a suitable replacement will be omitted.
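A small helper can then apply this mapping to a whole keyword list, dropping the unmapped words and any duplicates the replacement introduces. This is a sketch; the function name cleanKeywords is an assumption, and the mapper parameter is where you would plug in the replacement function defined above:

```typescript
// Maps every raw keyword to its closest predefined keyword, drops the ones
// with no match (null), and deduplicates the result while preserving order.
async function cleanKeywords(
  rawKeywords: string[],
  mapWord: (word: string) => Promise<string | null>
): Promise<string[]> {
  const mapped: string[] = [];
  for (const word of rawKeywords) {
    const replacement = await mapWord(word);
    if (replacement !== null && !mapped.includes(replacement)) {
      mapped.push(replacement);
    }
  }
  return mapped;
}
```

In the notebook you would call it as `cleanKeywords(words, (w) => replaceWordWithMostSimilar(w, keywordsDf))`.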
const CAPTION_PROMPT = `
  You are an expert in clothing that specializes in describing images of clothes, shoes and accessories.
  Your job is to extract information from a photo that will help describe an item.
  You are going to see an image, focus only on the piece of clothing, ignore surroundings.
  Be specific, but stay concise, the description should only be one sentence long.
  Most important aspects are color, type of clothing, material, style and who it is meant for.
  If you are not sure about a part of the image, ignore it.
`;
for (const image of images.slice(0, 5)) {
  const caption = await generateKeywords(CAPTION_PROMPT, image);
  tslab.display.jpeg(fs.readFileSync(image));
  console.log(`Caption for ${path.basename(image)}: ${caption}`);
}
Caption for 9.jpg: A pair of distressed men's medium blue denim shorts featuring ripped details and a faded wash.
Caption for 8.jpg: A blue single-breasted men's suit jacket is styled with a white dress shirt, a black tie, and a white pocket square.
Caption for 7.jpg: A men's blue single-breasted tuxedo jacket features black satin shawl lapels, a one-button closure, and black-trimmed flap pockets.
Caption for 6.jpg: An oversized dusty pink cotton jersey t-shirt dress features a classic crew neck and wide, elbow-length sleeves.
Caption for 5.jpg: A women's deep magenta, flowy tunic dress with a V-neck, short sleeves, and elegant draped detailing.
Searching for specific clothes
Preparing our dataset
First, you need to generate a caption and keywords for every image. Then, you will compute embeddings for them, which will later be used to compare the images in the search dataset against other descriptions and images.
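Once every image has a caption embedding, the search itself reduces to ranking candidates against a query embedding, using the same cosine similarity defined earlier. Here is a sketch of that ranking step; the function name and parameters are assumptions (the embeddings themselves would come from ai.models.embedContent, as in the keyword-correction code above):

```typescript
// Ranks candidate embeddings against a query embedding by cosine similarity
// and returns the indices of the top `k` matches, best first.
function rankBySimilarity(query: number[], candidates: number[][], k = 3): number[] {
  const cosine = (a: number[], b: number[]): number => {
    const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
    const norm = (v: number[]) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
    return dot / (norm(a) * norm(b));
  };
  return candidates
    .map((c, i) => ({ i, score: cosine(query, c) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.i);
}
```

The returned indices can then be used to look up the matching images and captions in the dataset.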
You have used the Gemini API's JS SDK to tag and caption images of clothing. Using embedding models, you were able to search a database of images for clothing matching a description, or similar to a provided clothing item.