Gemini API: Chat with SQL using LangChain

Reading an SQL database can be challenging for humans. However, with accurate prompts, Gemini models can generate answers based on the data. Through the use of the Gemini API, you will be able retrieve necessary information by chatting with a SQL database.

Setup

Install the Google GenAI SDK

Install the Google GenAI SDK from npm.

$ npm install @google/genai

Setup your API key

You can create your API key using Google AI Studio with a single click.

Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.

Here’s how to set it up in a .env file:

$ touch .env
$ echo "GEMINI_API_KEY=<YOUR_API_KEY>" >> .env
Tip

Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:

$ export GEMINI_API_KEY="<YOUR_API_KEY>"

Load the API key

To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.

$ npm install dotenv

Then, we can load the API key in our code:

const dotenv = require("dotenv") as typeof import("dotenv");

dotenv.config({
  path: "../../.env",
});

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";
if (!GEMINI_API_KEY) {
  throw new Error("GEMINI_API_KEY is not set in the environment variables");
}
console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note

In our particular case the .env is is two directories up from the notebook, hence we need to use ../../ to go up two directories. If the .env file is in the same directory as the notebook, you can omit it altogether.

│
├── .env
└── examples
    └── langchain
        └── Chat_with_SQL_using_langchain.ipynb

Select a model

Now select the model you want to use in this guide, either by selecting one in the list or writing it down. Keep in mind that some models, like the 2.5 ones are thinking models and thus take slightly more time to respond (cf. thinking notebook for more details and in particular learn how to switch the thiking off).

const tslab = require("tslab") as typeof import("tslab");

const MODEL_ID = "gemini-2.5-flash-preview-05-20";

Setting up the database

To query a database, you first need to set one up.

  1. Load the California Housing Dataset: Load the dataset from scikit and extract it into a DataFrame.
const danfojs = require("danfojs-node") as typeof import("danfojs-node");

const DATASET_URL =
  "https://raw.githubusercontent.com/javascriptdata/scikit.js/refs/heads/main/docs/static/data/california_housing.csv";

console.log("Loading California housing dataset from:", DATASET_URL);

const california_housing = await danfojs.readCSV(DATASET_URL);
california_housing.head().print();
Loading California housing dataset from: https://raw.githubusercontent.com/javascriptdata/scikit.js/refs/heads/main/docs/static/data/california_housing.csv
╔════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╤═══════════════════╗
║            │ MedInc            │ HouseAge          │ AveRooms          │ AveBedrms         │ Population        │ AveOccup          │ Latitude          │ Longitude         │ MedHouseVal       ║
╟────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ 0          │ 8.3252            │ 41                │ 6.9841269841269…  │ 1.0238095238095…  │ 322               │ 2.5555555555555…  │ 37.88             │ -122.23           │ 4.526             ║
╟────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ 1          │ 8.3014            │ 21                │ 6.2381370826010…  │ 0.9718804920913…  │ 2401              │ 2.1098418277680…  │ 37.86             │ -122.22           │ 3.585             ║
╟────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ 2          │ 7.2574            │ 52                │ 8.2881355932203…  │ 1.0734463276836…  │ 496               │ 2.8022598870056…  │ 37.85             │ -122.24           │ 3.521             ║
╟────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ 3          │ 5.6431            │ 52                │ 5.8173515981735…  │ 1.0730593607305…  │ 558               │ 2.5479452054794…  │ 37.85             │ -122.25           │ 3.413             ║
╟────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────┼───────────────────╢
║ 4          │ 3.8462            │ 52                │ 6.2818532818532…  │ 1.0810810810810…  │ 565               │ 2.1814671814671…  │ 37.85             │ -122.25           │ 3.422             ║
╚════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╧═══════════════════╝
  1. Connect to the SQLite database: The database will be stored in the specified file.
const fs = require("fs") as typeof import("fs");
const path = require("path") as typeof import("path");
// eslint-disable-next-line @typescript-eslint/no-unsafe-call, @typescript-eslint/no-unsafe-member-access
const sqlite3 = require("sqlite3").verbose() as typeof import("sqlite3");

const SQLITE_TABLE_QUERY = `
CREATE TABLE IF NOT EXISTS california_housing (
  MedInc REAL,
  HouseAge INTEGER,
  AveRooms REAL,
  AveBedrms REAL,
  Population INTEGER,
  AveOccup REAL,
  Latitude REAL,
  Longitude REAL,
  MedHouseVal REAL
);
`;
const SQLITE_INSERT_QUERY = `INSERT INTO california_housing (
  MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude, MedHouseVal
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?);`;
const SQLITE_FILE = "../../assets/langchain/california_housing.db";

type SQLiteDatabaseType = InstanceType<typeof sqlite3.Database>;

interface InsertOptions {
  batchSize?: number;
  useTransaction?: boolean;
  logProgress?: boolean;
}

class SQLiteHandler {
  private dbFile: string;
  private db: SQLiteDatabaseType;
  private isConnected = false;

  constructor(dbFile: string) {
    this.dbFile = dbFile;

    const dir = path.dirname(dbFile);
    if (!fs.existsSync(dir)) {
      fs.mkdirSync(dir, { recursive: true });
    }

    this.db = new sqlite3.Database(this.dbFile, (err) => {
      if (err) {
        console.error("Error opening database:", err.message);
        this.isConnected = false;
      } else {
        console.log(`Connected to ${this.dbFile}`);
        this.isConnected = true;
      }
    });
  }

  public isReady(): boolean {
    return this.isConnected;
  }

  public waitForConnection(): Promise<void> {
    return new Promise((resolve, reject) => {
      if (this.isConnected) {
        resolve();
        return;
      }

      const checkConnection = () => {
        if (this.isConnected) {
          resolve();
        } else {
          setTimeout(checkConnection, 100);
        }
      };

      checkConnection();
    });
  }

  public async createTable(query: string): Promise<void> {
    await this.waitForConnection();

    return new Promise((resolve, reject) => {
      this.db.run(query, (err) => {
        if (err) {
          console.error("Error creating table:", err.message);
          reject(err);
        } else {
          console.log("Table created successfully");
          resolve();
        }
      });
    });
  }

  public async insertRows(insertQuery: string, rows: unknown[][], options: InsertOptions = {}): Promise<void> {
    const { batchSize = 1000, useTransaction = true, logProgress = true } = options;

    await this.waitForConnection();

    if (rows.length === 0) {
      console.log("No rows to insert");
      return;
    }

    if (batchSize >= rows.length || !useTransaction) {
      await this.insertBatch(insertQuery, rows, useTransaction, logProgress);
    } else {
      await this.insertRowsBatched(insertQuery, rows, batchSize, logProgress);
    }
  }

  private async insertBatch(
    insertQuery: string,
    batch: unknown[][],
    useTransaction = true,
    logProgress = true
  ): Promise<void> {
    return new Promise((resolve, reject) => {
      const startTime = Date.now();

      if (logProgress) {
        console.log(`Inserting ${batch.length} rows...`);
      }

      const executeInserts = () => {
        const stmt = this.db.prepare(insertQuery, (err) => {
          if (err) {
            console.error("Error preparing statement:", err.message);
            if (useTransaction) this.db.run("ROLLBACK");
            reject(err);
            return;
          }
        });

        let completed = 0;
        let hasError = false;

        batch.forEach((row, index) => {
          if (hasError) return;

          stmt.run(row, (err) => {
            if (hasError) return;

            if (err) {
              hasError = true;
              console.error(`Error inserting row ${index}:`, err.message);
              stmt.finalize();
              if (useTransaction) this.db.run("ROLLBACK");
              reject(err);
              return;
            }

            completed++;

            if (
              logProgress &&
              (completed % Math.max(1, Math.floor(batch.length / 10)) === 0 || completed % 1000 === 0)
            ) {
              console.log(
                `Progress: ${completed}/${batch.length} rows (${Math.round((completed / batch.length) * 100)}%)`
              );
            }

            if (completed === batch.length) {
              stmt.finalize((finalizeErr) => {
                // eslint-disable-next-line @typescript-eslint/no-unnecessary-condition
                if (finalizeErr) {
                  console.error("Error finalizing statement:", finalizeErr.message);
                  if (useTransaction) this.db.run("ROLLBACK");
                  reject(finalizeErr);
                  return;
                }

                const finishTransaction = () => {
                  const duration = Date.now() - startTime;
                  if (logProgress) {
                    console.log(`Successfully inserted ${batch.length} rows in ${duration}ms`);
                  }
                  resolve();
                };

                if (useTransaction) {
                  this.db.run("COMMIT", (commitErr) => {
                    if (commitErr) {
                      console.error("Error committing transaction:", commitErr.message);
                      reject(commitErr);
                      return;
                    }
                    finishTransaction();
                  });
                } else {
                  finishTransaction();
                }
              });
            }
          });
        });
      };

      if (useTransaction) {
        this.db.serialize(() => {
          this.db.run("BEGIN TRANSACTION", (err) => {
            if (err) {
              console.error("Error starting transaction:", err.message);
              reject(err);
              return;
            }
            executeInserts();
          });
        });
      } else {
        executeInserts();
      }
    });
  }

  private async insertRowsBatched(
    insertQuery: string,
    rows: unknown[][],
    batchSize: number,
    logProgress: boolean
  ): Promise<void> {
    const totalBatches = Math.ceil(rows.length / batchSize);

    if (logProgress) {
      console.log(`Processing ${rows.length} rows in ${totalBatches} batches of ${batchSize}`);
    }

    for (let i = 0; i < rows.length; i += batchSize) {
      const batch = rows.slice(i, i + batchSize);
      const batchNumber = Math.floor(i / batchSize) + 1;

      if (logProgress) {
        console.log(`Processing batch ${batchNumber}/${totalBatches}...`);
      }

      await this.insertBatch(insertQuery, batch, true, false);

      if (logProgress) {
        console.log(`Completed batch ${batchNumber}/${totalBatches}`);
      }
    }

    if (logProgress) {
      console.log(`All ${rows.length} rows inserted successfully`);
    }
  }

  public async query(sql: string, params: unknown[] = []): Promise<unknown[]> {
    await this.waitForConnection();

    return new Promise((resolve, reject) => {
      this.db.all(sql, params, (err, rows) => {
        if (err) {
          console.error("Error executing query:", err.message);
          reject(err);
        } else {
          resolve(rows);
        }
      });
    });
  }

  public async run(sql: string, params: unknown[] = []): Promise<{ lastID: number; changes: number }> {
    await this.waitForConnection();

    return new Promise((resolve, reject) => {
      this.db.run(sql, params, function (err) {
        if (err) {
          console.error("Error executing statement:", err.message);
          reject(err);
        } else {
          resolve({ lastID: this.lastID, changes: this.changes });
        }
      });
    });
  }

  public async getTableInfo(table: string): Promise<unknown[]> {
    return this.query(`PRAGMA table_info(${table})`);
  }

  public async getTables(): Promise<string[]> {
    const result = await this.query(`
      SELECT name FROM sqlite_master 
      WHERE type='table' AND name NOT LIKE 'sqlite_%'
    `);
    return result.map((row) => row.name as string);
  }

  public async getRowCount(table: string): Promise<number> {
    const result = await this.query(`SELECT COUNT(*) as count FROM ${table}`);
    return result[0].count as number;
  }

  public async tableExists(table: string): Promise<boolean> {
    const result = await this.query(
      `
      SELECT name FROM sqlite_master 
      WHERE type='table' AND name=?
    `,
      [table]
    );
    return result.length > 0;
  }

  public async close(): Promise<void> {
    if (!this.isConnected) {
      console.log("Database already closed");
      return;
    }

    await new Promise((resolve, reject) => {
      this.db.close((err) => {
        if (err) {
          console.error("Error closing database:", err.message);
          reject(err);
        } else {
          console.log("Database connection closed.");
          this.isConnected = false;
          resolve();
        }
      });
    });
  }
}
const handler = new SQLiteHandler(SQLITE_FILE);

await handler.createTable(SQLITE_TABLE_QUERY);
await handler.insertRows(SQLITE_INSERT_QUERY, california_housing.values as unknown[][], {
  batchSize: 2000,
  useTransaction: true,
  logProgress: true,
});
Connected to ../../assets/langchain/california_housing.db
Table created successfully
Processing 20640 rows in 11 batches of 2000
Processing batch 1/11...
Completed batch 1/11
Processing batch 2/11...
Completed batch 2/11
Processing batch 3/11...
Completed batch 3/11
Processing batch 4/11...
Completed batch 4/11
Processing batch 5/11...
Completed batch 5/11
Processing batch 6/11...
Completed batch 6/11
Processing batch 7/11...
Completed batch 7/11
Processing batch 8/11...
Completed batch 8/11
Processing batch 9/11...
Completed batch 9/11
Processing batch 10/11...
Completed batch 10/11
Processing batch 11/11...
Completed batch 11/11
All 20640 rows inserted successfully

Question to query

With the database connection established, the SQLiteHandler object now contains information about our database, which the model can access.

You can now start asking the LLM to generate queries.

console.log("Table Info:", await handler.getTableInfo("california_housing"));
Table Info: [
  {
    cid: 0,
    name: 'MedInc',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 1,
    name: 'HouseAge',
    type: 'INTEGER',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 2,
    name: 'AveRooms',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 3,
    name: 'AveBedrms',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 4,
    name: 'Population',
    type: 'INTEGER',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 5,
    name: 'AveOccup',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 6,
    name: 'Latitude',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 7,
    name: 'Longitude',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  },
  {
    cid: 8,
    name: 'MedHouseVal',
    type: 'REAL',
    notnull: 0,
    dflt_value: null,
    pk: 0
  }
]
const googleGenerativeAI = require("@langchain/google-genai") as typeof import("@langchain/google-genai");
const sqlDatabase = require("langchain/sql_db") as typeof import("langchain/sql_db");
const sqlChain = require("langchain/chains/sql_db") as typeof import("langchain/chains/sql_db");
const typeorm = require("typeorm") as typeof import("typeorm");

async function createLangChainSQLChain(dbPath: string) {
  const llm = new googleGenerativeAI.ChatGoogleGenerativeAI({
    model: MODEL_ID,
    temperature: 0,
    apiKey: GEMINI_API_KEY,
  });

  const datasource = new typeorm.DataSource({
    type: "sqlite",
    database: dbPath,
    synchronize: false,
  });

  await datasource.initialize();

  const db = await sqlDatabase.SqlDatabase.fromDataSourceParams({
    appDataSource: datasource,
    includesTables: ["california_housing"],
  });

  const writeQueryChain = await sqlChain.createSqlQueryChain({
    llm,
    db,
    dialect: "sqlite",
  });

  return {
    chain: writeQueryChain,
    db,
    datasource,
    llm,
  };
}

You use createSqlQueryChain that fits our database. It provides default prompts for various types of SQL including Oracle, Google SQL, MySQL and more.

In this case, default prompt is suitable for the task. However, feel free to experiment with writing this part of our chain yourself to suit your preferences.

const { chain, datasource, db, llm } = await createLangChainSQLChain(SQLITE_FILE);
tslab.display.markdown(chain.steps[2].lc_kwargs.template as string);

You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database. Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (“) to denote them as delimited identifiers. Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.

Use the following format:

Question: “Question here” SQLQuery: “SQL Query to run” SQLResult: “Result of the SQLQuery” Answer: “Final answer here”

Only use the following tables: {table_info}

Question: {input}

const q1 = await chain.invoke({ question: "What is the total population?" });
tslab.display.markdown(q1);
SELECT sum("Population") FROM california_housing
function parseSQLQuery(query: string): string {
  const match = /```sqlite\n(.*?)\n```/s.exec(query);
  return match ? match[1].trim() : "";
}

const q1_result = await db.run(parseSQLQuery(q1));
console.log("SQL Query Result:", q1_result);
SQL Query Result: [{"sum(\"Population\")":29421840}]

Great! The SQL query is correct, but it needs proper formatting before it can be executed directly by the database.

Validating the query

You will pass the output of the previous query to a model that will extract just the SQL query and ensure its validity.

import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence, RunnablePassthrough } from "@langchain/core/runnables";

const validation_prompt = new PromptTemplate({
  inputVariables: ["not_formatted_query"],
  template: `You are going to receive a text that contains a SQL query. Extract that query. Make sure that it is a valid SQL command that can be passed directly to the Database. Avoid using Markdown for this task.

Text: {not_formatted_query}`,
});

const validate_chain = RunnableSequence.from([
  chain,
  {
    not_formatted_query: (input: unknown) => input,
  },
  validation_prompt,
  llm,
  new StringOutputParser(),
]);
const validation_result = await validate_chain.invoke({ question: "What is the total population?" });
tslab.display.markdown(validation_result);

SELECT sum(“Population”) FROM california_housing

Automatic querying

Now, let’s automate the process of querying the database using QuerySQLDataBaseTool. This tool can receive text from previous parts of the chain, execute the query, and return the answer.

import { QuerySqlTool } from "langchain/tools/sql";
import { RunnableSequence } from "@langchain/core/runnables";

const execute_query = new QuerySqlTool(db);
const execute_chain: RunnableSequence<Record<string, unknown>, unknown> = RunnableSequence.from([
  validate_chain,
  execute_query,
]);
const execute_result = await execute_chain.invoke({ question: "What is the total population?" });
tslab.display.markdown(execute_result);

[{“sum("Population")”:29421840}]

Generating answer

You are almost done!

To enhance our output, you’ll use LLM not only to get the number but to get properly formatted and natural language response.

const answer_prompt = PromptTemplate.fromTemplate(`
    You are going to receive a original user question, generated SQL query, and result of said query. You should use this information to answer the original question. Use only information provided to you.

    Original Question: {question}
    SQL Query: {query}
    SQL Result: {result}
    Answer: 
`);

const answer_chain = RunnableSequence.from([
  RunnablePassthrough.assign({ query: validate_chain }).assign({
    result: RunnableSequence.from([(input: unknown) => input.query as string, execute_query]),
  }),
  answer_prompt,
  llm,
  new StringOutputParser(),
]);
const answer_result = await answer_chain.invoke({ question: "What is the total population?" });
tslab.display.markdown(answer_result);

The total population is 29421840.

Next steps

Congratulations! You’ve successfully created a functional chain to interact with SQL. Now, feel free to explore further by asking different questions.