Reading an SQL database can be challenging for humans. However, with accurate prompts, Gemini models can generate answers based on the data. Through the use of the Gemini API, you will be able retrieve necessary information by chatting with a SQL database.
You can create your API key using Google AI Studio with a single click.
Remember to treat your API key like a password. Don’t accidentally save it in a notebook or source file you later commit to GitHub. In this notebook we will be storing the API key in a .env file. You can also set it as an environment variable or use a secret manager.
Another option is to set the API key as an environment variable. You can do this in your terminal with the following command:
$ export GEMINI_API_KEY="<YOUR_API_KEY>"
Load the API key
To load the API key from the .env file, we will use the dotenv package. This package loads environment variables from a .env file into process.env.
$ npm install dotenv
Then, we can load the API key in our code:
const dotenv =require("dotenv") astypeofimport("dotenv");dotenv.config({ path:"../../.env",});const GEMINI_API_KEY =process.env.GEMINI_API_KEY??"";if (!GEMINI_API_KEY) {thrownewError("GEMINI_API_KEY is not set in the environment variables");}console.log("GEMINI_API_KEY is set in the environment variables");
GEMINI_API_KEY is set in the environment variables
Note
In our particular case the .env is is two directories up from the notebook, hence we need to use ../../ to go up two directories. If the .env file is in the same directory as the notebook, you can omit it altogether.
Now select the model you want to use in this guide, either by selecting one in the list or writing it down. Keep in mind that some models, like the 2.5 ones are thinking models and thus take slightly more time to respond (cf. thinking notebook for more details and in particular learn how to switch the thiking off).
You use createSqlQueryChain that fits our database. It provides default prompts for various types of SQL including Oracle, Google SQL, MySQL and more.
In this case, default prompt is suitable for the task. However, feel free to experiment with writing this part of our chain yourself to suit your preferences.
You are a SQLite expert. Given an input question, first create a syntactically correct SQLite query to run, then look at the results of the query and return the answer to the input question. Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per SQLite. You can order the results to return the most informative data in the database. Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in double quotes (“) to denote them as delimited identifiers. Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: “Question here” SQLQuery: “SQL Query to run” SQLResult: “Result of the SQLQuery” Answer: “Final answer here”
Only use the following tables: {table_info}
Question: {input}
const q1 =await chain.invoke({ question:"What is the total population?" });tslab.display.markdown(q1);
SELECT sum("Population") FROM california_housing
functionparseSQLQuery(query:string):string {const match =/```sqlite\n(.*?)\n```/s.exec(query);return match ? match[1].trim() :"";}const q1_result =await db.run(parseSQLQuery(q1));console.log("SQL Query Result:", q1_result);
Great! The SQL query is correct, but it needs proper formatting before it can be executed directly by the database.
Validating the query
You will pass the output of the previous query to a model that will extract just the SQL query and ensure its validity.
import { PromptTemplate } from"@langchain/core/prompts";import { StringOutputParser } from"@langchain/core/output_parsers";import { RunnableSequence, RunnablePassthrough } from"@langchain/core/runnables";const validation_prompt =newPromptTemplate({ inputVariables: ["not_formatted_query"], template:`You are going to receive a text that contains a SQL query. Extract that query. Make sure that it is a valid SQL command that can be passed directly to the Database. Avoid using Markdown for this task.Text: {not_formatted_query}`,});const validate_chain = RunnableSequence.from([ chain, { not_formatted_query: (input:unknown) => input, }, validation_prompt, llm,newStringOutputParser(),]);const validation_result =await validate_chain.invoke({ question:"What is the total population?" });tslab.display.markdown(validation_result);
SELECT sum(“Population”) FROM california_housing
Automatic querying
Now, let’s automate the process of querying the database using QuerySQLDataBaseTool. This tool can receive text from previous parts of the chain, execute the query, and return the answer.
import { QuerySqlTool } from"langchain/tools/sql";import { RunnableSequence } from"@langchain/core/runnables";const execute_query =newQuerySqlTool(db);const execute_chain: RunnableSequence<Record<string,unknown>,unknown>= RunnableSequence.from([ validate_chain, execute_query,]);const execute_result =await execute_chain.invoke({ question:"What is the total population?" });tslab.display.markdown(execute_result);
[{“sum("Population")”:29421840}]
Generating answer
You are almost done!
To enhance our output, you’ll use LLM not only to get the number but to get properly formatted and natural language response.
const answer_prompt = PromptTemplate.fromTemplate(` You are going to receive a original user question, generated SQL query, and result of said query. You should use this information to answer the original question. Use only information provided to you. Original Question: {question} SQL Query: {query} SQL Result: {result} Answer: `);const answer_chain = RunnableSequence.from([ RunnablePassthrough.assign({ query: validate_chain }).assign({ result: RunnableSequence.from([(input:unknown) => input.queryasstring, execute_query]), }), answer_prompt, llm,newStringOutputParser(),]);const answer_result =await answer_chain.invoke({ question:"What is the total population?" });tslab.display.markdown(answer_result);
The total population is 29421840.
Next steps
Congratulations! You’ve successfully created a functional chain to interact with SQL. Now, feel free to explore further by asking different questions.