Voice memos

This notebook provides a quick example of how to work with audio and text files in the same prompt. You’ll use the Gemini API to help you generate ideas for your next blog post, based on voice memos you recorded on your phone, and previous articles you’ve written.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Set up your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env

Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");

GEMINI_API_KEY is set in the environment variables

Note

In this case the .env file is one directory up from the notebook, so the path uses ../ to go up one directory. If the .env file is in the same directory as the notebook, you can omit the path option altogether.

│
├── .env
└── examples
    └── Voice_memos.ipynb

Initialize SDK Client

With the new SDK, you only need to initialize a client with your API key (or OAuth if using Vertex AI). The model is now set in each call.

const google = require("@google/genai") as typeof import("@google/genai");

const ai = new google.GoogleGenAI({ apiKey: GEMINI_API_KEY });

Select a model

Now select the model you want to use in this guide, either by selecting one in the list or writing it down. Keep in mind that some models, like the 2.5 ones, are thinking models and thus take slightly more time to respond (cf. the thinking notebook for more details, in particular how to switch the thinking off).
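As a rough sketch of switching thinking off: the SDK accepts a thinkingConfig field inside the config object of a generateContent call, and a thinkingBudget of 0 is the usual way to ask a 2.5 Flash model to skip thinking. The helper name buildConfig and the two interfaces below are hypothetical, and whether a given model honors a zero budget is an assumption to check against the thinking notebook.

```typescript
// Hypothetical types: only the config fields this sketch touches.
interface ThinkingConfig {
  thinkingBudget: number;
}

interface GenerationConfig {
  systemInstruction?: string;
  thinkingConfig?: ThinkingConfig;
}

// Hypothetical helper: builds the `config` object for a generateContent call.
// Assumption: a thinkingBudget of 0 disables thinking on the selected model.
function buildConfig(systemInstruction: string, disableThinking: boolean): GenerationConfig {
  const config: GenerationConfig = { systemInstruction };
  if (disableThinking) {
    config.thinkingConfig = { thinkingBudget: 0 };
  }
  return config;
}
```

The returned object can be passed as the config argument of ai.models.generateContent later in this notebook.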

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "gemini-2.5-flash-preview-05-20";

Upload your audio and text files

const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");

// Download a file from a URL and write it to disk, creating parent directories as needed.
const downloadFile = async (url: string, filePath: string) => {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Failed to download file: ${response.statusText}`);
  }
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  // Read the response body as raw bytes and convert it to a Node.js Buffer.
  const bufferData = Buffer.from(await response.arrayBuffer());
  fs.writeFileSync(filePath, bufferData);
};

const audioFilePath = path.join("../assets/examples/Voice_memos", "Walking_thoughts_3.m4a");
const pdfFilePath1 = path.join("../assets/examples/Voice_memos", "A_Possible_Future_for_Online_Content.pdf");
const pdfFilePath2 = path.join("../assets/examples/Voice_memos", "Unanswered_Questions_and_Endless_Possibilities.pdf");

const audioFileUrl = "https://storage.googleapis.com/generativeai-downloads/data/Walking_thoughts_3.m4a";
const pdfFileUrl1 =
  "https://storage.googleapis.com/generativeai-downloads/data/A_Possible_Future_for_Online_Content.pdf";
const pdfFileUrl2 =
  "https://storage.googleapis.com/generativeai-downloads/data/Unanswered_Questions_and_Endless_Possibilities.pdf";

await downloadFile(audioFileUrl, audioFilePath);
await downloadFile(pdfFileUrl1, pdfFilePath1);
await downloadFile(pdfFileUrl2, pdfFilePath2);

const audioFile = await ai.files.upload({
  file: audioFilePath,
  config: {
    displayName: "Walking thoughts 3",
    mimeType: "audio/m4a",
  },
});
console.log("Audio file uploaded:", audioFile.createTime);

Audio file uploaded: 2025-07-06T19:40:31.408868Z
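Larger audio files may still be processing right after upload. The sketch below polls until the file is ready. It assumes the File API reports a state of PROCESSING, ACTIVE, or FAILED, and that ai.files.get({ name }) returns the current file metadata. waitUntilActive, FileInfo, and the parameter names are hypothetical, and the metadata getter is injected so the logic can be tried without a network call.

```typescript
// Minimal shape of the file metadata this sketch relies on (assumed fields).
interface FileInfo {
  name?: string;
  state?: string; // assumed values: "PROCESSING", "ACTIVE", "FAILED"
}

// Hypothetical helper: repeatedly fetches file metadata until it is ACTIVE.
async function waitUntilActive(
  getFile: () => Promise<FileInfo>,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<FileInfo> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const file = await getFile();
    if (file.state === "ACTIVE") return file;
    if (file.state === "FAILED") {
      throw new Error(`File processing failed: ${file.name ?? "unknown"}`);
    }
    // Still processing: wait before asking again.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for the file to become ACTIVE");
}

// Possible usage in this notebook (not executed here):
// await waitUntilActive(() => ai.files.get({ name: audioFile.name ?? "" }));
```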

Extract text from the PDFs

const pdf_parse = require("pdf-parse") as typeof import("pdf-parse");

const aPossibleFuturePdf = (await pdf_parse(fs.readFileSync(pdfFilePath1))).text;
const unansweredQuestionsPdf = (await pdf_parse(fs.readFileSync(pdfFilePath2))).text;

const textFilePath1 = path.join("../assets/examples/Voice_memos", "A_Possible_Future_for_Online_Content.txt");
const textFilePath2 = path.join("../assets/examples/Voice_memos", "Unanswered_Questions_and_Endless_Possibilities.txt");

fs.writeFileSync(textFilePath1, aPossibleFuturePdf);
fs.writeFileSync(textFilePath2, unansweredQuestionsPdf);

const textFile1 = await ai.files.upload({
  file: textFilePath1,
  config: {
    displayName: "A Possible Future for Online Content",
    mimeType: "text/plain",
  },
});
console.log("Text file 1 uploaded:", textFile1.createTime);

const textFile2 = await ai.files.upload({
  file: textFilePath2,
  config: {
    displayName: "Unanswered Questions and Endless Possibilities",
    mimeType: "text/plain",
  },
});
console.log("Text file 2 uploaded:", textFile2.createTime);

Text file 1 uploaded: 2025-07-06T19:43:44.482321Z
Text file 2 uploaded: 2025-07-06T19:43:45.829594Z
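The two upload calls above differ only in path and display name, so with more example posts a loop may be easier to maintain. Here is a minimal sketch, assuming the display name can be derived from the file name and that paths use forward slashes. buildTextUploadRequests and UploadRequest are hypothetical names; the object shape mirrors the arguments passed to ai.files.upload in this notebook.

```typescript
// Upload parameters in the shape used by ai.files.upload in this notebook.
interface UploadRequest {
  file: string;
  config: { displayName: string; mimeType: string };
}

// Hypothetical helper: derives a display name from each file name
// (extension dropped, underscores turned into spaces) and builds one request per path.
// Assumption: paths use "/" as the separator.
function buildTextUploadRequests(filePaths: string[]): UploadRequest[] {
  return filePaths.map((filePath) => {
    const fileName = filePath.split("/").pop() ?? filePath;
    const displayName = fileName.replace(/\.[^.]+$/, "").replace(/_/g, " ");
    return { file: filePath, config: { displayName, mimeType: "text/plain" } };
  });
}

// Possible usage in this notebook (not executed here):
// for (const request of buildTextUploadRequests([textFilePath1, textFilePath2])) {
//   const uploaded = await ai.files.upload(request);
//   console.log("Uploaded:", uploaded.createTime);
// }
```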

System instructions

Write a detailed system instruction to configure the model.

const SYSTEM_INSTRUCTION = `
  Objective: Transform raw thoughts and ideas into polished, engaging blog posts that capture a writer's unique style and voice.
  Input:
  Example Blog Posts (1-5): A user will provide examples of blog posts that resonate with their desired style and tone. These will guide you in understanding the preferences for word choice, sentence structure, and overall voice.
  Audio Clips: A user will share a selection of brainstorming thoughts and key points through audio recordings. They will talk freely and openly, as if they were explaining their ideas to a friend.
  Output:
  Blog Post Draft: A well-structured first draft of the blog post, suitable for platforms like Substack or LinkedIn.
  The draft will include:
  Clear and engaging writing: You will strive to make the writing clear, concise, and interesting for the target audience.
  Tone and style alignment: The language and style will closely match the examples provided, ensuring consistency with the desired voice.
  Logical flow and structure: The draft will be organized with clear sections based on the content of the post.
  Target word count: Aim for 500-800 words, but this can be adjusted based on user preferences.
  Process:
  Style Analysis: Carefully analyze the example blog posts provided by the user to identify key elements of their preferred style, including:
  Vocabulary and word choice: Formal vs. informal, technical terms, slang, etc.
  Sentence structure and length: Short and impactful vs. longer and descriptive sentences.
  Tone and voice: Humorous, serious, informative, persuasive, etc.
  Audio Transcription and Comprehension: Your audio clips will be transcribed with high accuracy. You will analyze them to extract key ideas, arguments, and supporting points.
  Draft Generation: Using the insights from the audio and the style guidelines from the examples, you will generate a first draft of the blog post. This draft will include all relevant sections with supporting arguments or evidence, and a great ending that ties everything together and makes the reader want to invest in future readings.
`;

Generate Content

const response = await ai.models.generateContent({
  model: MODEL_ID,
  contents: [
    "Draft my next blog post based on my thoughts in this audio file and these two previous blog posts I wrote.",
    google.createPartFromUri(audioFile.uri ?? "", audioFile.mimeType ?? "audio/m4a"),
    google.createPartFromUri(textFile1.uri ?? "", textFile1.mimeType ?? "text/plain"),
    google.createPartFromUri(textFile2.uri ?? "", textFile2.mimeType ?? "text/plain"),
  ],
  config: {
    systemInstruction: SYSTEM_INSTRUCTION,
  },
});
tslab.display.markdown(response.text ?? "");

Here’s a draft of your next blog post, incorporating your thoughts from the audio and aligning with the style and tone of your previous posts:

The Unexpected Value of “Throwaway Work”: Why Writing to Think (and Doing to Learn) is Your Superpower

Early in my career, I spent countless hours crafting detailed visions, elaborate roadmaps, and innovative ideas. I meticulously planned projects, outlining every step and anticipated outcome. There was a certain expectation, a lingering shadow from my academic days, that every effort would culminate in a final, perfect product. You’re given an assignment, you do it, and then you’re graded on it – no take-backs, no discards.

The reality of the professional world hit differently. Many of those carefully constructed plans never saw the light of day. Projects were cancelled late on, visions pivoted entirely, and entire roadmaps ended up in the digital bin. I remember vividly the frustration, the feeling that I was throwing away large chunks of my effort. It felt like a monumental, colossal waste of time. And honestly, it still happens today, not just to me, but to many teams navigating dynamic environments where priorities shift, and markets evolve at breakneck speed.

For a long time, I struggled to reconcile this reality with my ingrained notions of productivity and success. Was I just bad at planning? Was the work itself flawed? It took a significant shift in perspective, largely influenced by the culture of a new team I joined, to truly appreciate the profound value in what I once considered “throwaway work.” This is where the concept of “writing to think” (and by extension, “doing to learn”) transformed my understanding of creation and growth.

Beyond the “Final Product” Mindset

The revelation wasn’t simply that “priorities change” – a common, albeit true, refrain in fast-paced industries. It was about recognizing that the act of producing, whether it’s a written proposal, a prototype, or a detailed plan, is inherently valuable, regardless of its ultimate fate. It’s not just about the output; it’s about the process.

This process is where your skills are honed. It’s where you solidify your understanding, clarify your thoughts, and uncover new insights. Each draft, each discarded idea, each project that doesn’t go anywhere, is a crucial step in making you better at what you do. It’s a fundamental part of learning and growth, especially when it comes to the ideation and strategic thinking side of things.

Embracing the “Right to Think”

The culture of “writing to think” encourages a different approach:

Write More, Write Earlier, Write Often: Don’t wait for perfection. Get your ideas down, even if they’re half-baked. The volume of early creation accelerates the refinement process.
Write Without the Intention of Finality: The goal isn’t necessarily a polished deliverable from day one. It’s about using writing (or building, or designing) as a tool for exploration and discovery.
Be Willing to Scrap and Move On: Once a piece of work has served its purpose – whether that’s clarifying a concept, testing a hypothesis, or simply moving your thinking forward – be prepared to discard it. Its value isn’t tied to its existence as a final product, but to the learning it facilitated.

This approach aligns perfectly with iterative processes and the philosophy of “doing by learning.” It’s a powerful framing for how we engage with creative and strategic work. It’s about leveraging every effort as a learning opportunity, rather than seeing only the fully deployed project as a success.

Why There’s No Such Thing as “Throwaway Work”

In this model, the very idea of “throwaway work” ceases to exist. Every line of text, every brainstormed idea, every initial sketch, contributes to your development. It’s all part of the continuous loop of honing your skills and getting better over time.

This reframe is incredibly empowering. It liberates us from the pressure of perfection on every single output and allows for greater experimentation and agility. It acknowledges that the journey of creation, even when fraught with pivots and discards, is precisely what builds expertise and resilience.

In a world increasingly shaped by rapid technological advancements, especially with the rise of Generative AI, this mindset becomes even more critical. The ability to iterate quickly, to produce and refine ideas without the burden of perceived failure, and to learn from every attempt—even those that don’t make it to launch—is paramount. It fosters the fluid and flexible content ecosystem I often discuss, allowing us to capture nascent thoughts and rapidly transform them, knowing that each step, even if it leads to a dead end, pushes our capabilities forward.

So, the next time you find yourself scrapping a project or archiving a well-thought-out plan, remember: it wasn’t a waste of time. It was an investment in your growth, a step in honing your unique skills, and an essential part of your journey to becoming better at what you do.

Learning more

Learn more about the File API with the quickstart.
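Uploaded files are removed automatically after a retention period (48 hours at the time of writing, as far as I know), but you may prefer to delete them once the draft is generated. The sketch below assumes ai.files.delete({ name }) is available on the client. deleteUploadedFiles and FileDeleter are hypothetical names, and the deleter is passed in so the loop can be exercised without calling the API.

```typescript
// Minimal view of the client method this sketch uses
// (assumed to be compatible with the SDK's ai.files.delete).
interface FileDeleter {
  delete(params: { name: string }): Promise<unknown>;
}

// Hypothetical helper: deletes every uploaded file that has a resource name
// and returns the names that were removed.
async function deleteUploadedFiles(
  files: FileDeleter,
  uploaded: Array<{ name?: string }>,
): Promise<string[]> {
  const deleted: string[] = [];
  for (const file of uploaded) {
    if (!file.name) continue; // nothing to delete without a resource name
    await files.delete({ name: file.name });
    deleted.push(file.name);
  }
  return deleted;
}

// Possible usage in this notebook (not executed here):
// await deleteUploadedFiles(ai.files, [audioFile, textFile1, textFile2]);
```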