Dec 04, 2024
Vector databases enable efficient storage and search of vector data. They are essential to developing and maintaining AI applications using Large Language Models (LLMs).
With some help from the pgvector extension, you can leverage PostgreSQL as a vector database to store and query OpenAI embeddings. OpenAI embeddings are a type of data representation (in the shape of vectors, i.e., lists of numbers) used to measure the similarity of text strings for OpenAI’s models.
In this article, we work through the example of creating a chatbot to answer questions about Timescale. The chatbot will answer questions using content from the Timescale Developer Q&A blog posts. This example will illustrate the key concepts for creating, storing, and querying OpenAI embeddings with PostgreSQL and pgvector.
This example has three parts:

- Creating OpenAI embeddings for the blog content
- Storing the embeddings in PostgreSQL using pgvector
- Querying the embeddings to retrieve relevant context and answer user questions with an LLM
One could think of this as a “hello world” tutorial for building a chatbot that can reference a company knowledge base or developer docs.
Foundational models of AI (e.g., GPT-3 or GPT-4) may be missing some information needed to give a good answer to certain specific questions. That’s because relevant information was not in the dataset used to train the model. (For example, the information is stored in private documents or only became available recently.) This lack of data may make these models unsuitable as a chatbot for specific information banks.
Retrieval-augmented generation (RAG) gives a simple solution: provide additional context to the foundational model in the prompt. For example, if the model has no data on cronuts and you ask, “What is a cronut?” you may get an improper answer. In this case, you can transform the prompt by adding context: “A cronut resembles a doughnut and is made from croissant-like dough filled with flavored cream and fried in grapeseed oil. What is a cronut?”
The foundational model can then use its knowledge of doughnuts and croissants to wax eloquent about cronuts. This technique is powerful: it allows you to “teach” foundational models about things only you know about and use that knowledge to create a ChatGPT++ experience for your users!
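To make the idea concrete, here is a minimal sketch of prompt augmentation in Python. The context and question strings are just the cronut example from above; in a real application, the context would come from a similarity search over your own data.

# A minimal sketch of retrieval-augmented prompting.
# In practice, the context comes from a similarity search over your own library.
context = (
    "A cronut resembles a doughnut and is made from croissant-like dough "
    "filled with flavored cream and fried in grapeseed oil."
)
question = "What is a cronut?"

# The augmented prompt simply places the retrieved context before the question.
augmented_prompt = f"{context}\n\nQuestion: {question}"
print(augmented_prompt)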
But what context do you provide to the model? If you have a library of information, how do you know what’s relevant to a given question? That is what embeddings are for. OpenAI embeddings are a mathematical representation of the semantic meaning of a piece of text that allows for similarity search.
With this representation, if you get a user question and calculate its embedding, you can use a similarity search against data embeddings in your library to find the most relevant information. But that requires having an embedding representation of your library.
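To build intuition for what this similarity search looks like, here is a toy sketch using cosine similarity with NumPy. The three-dimensional vectors are made up for illustration only; real OpenAI embeddings have 1,536 dimensions.

import numpy as np

# Toy 3-D "embeddings" purely for illustration; real embeddings are 1,536-dimensional
library = {
    "post about IoT sensors": np.array([0.9, 0.1, 0.0]),
    "post about billing": np.array([0.1, 0.8, 0.3]),
}
question_embedding = np.array([0.85, 0.15, 0.05])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank library entries by similarity to the question embedding (most similar first)
ranked = sorted(library.items(), key=lambda kv: cosine_similarity(question_embedding, kv[1]), reverse=True)
for name, vec in ranked:
    print(name, round(cosine_similarity(question_embedding, vec), 3))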
This post is a guide to creating, storing, and querying OpenAI vector embeddings using pgvector, the extension that turns PostgreSQL into a vector database.
Pgvector is an open-source extension for PostgreSQL that enables storing and searching over machine learning-generated embeddings. It provides different capabilities that let users identify exact and approximate nearest neighbors. Pgvector is designed to work seamlessly with other PostgreSQL features, including indexing and querying.
Now we’re ready to start building our chatbot! First, install the required Python packages:
pip install -r requirements.txt
Import all the packages we will be using:
import openai
import os
import pandas as pd
import numpy as np
import json
import tiktoken
import psycopg2
import ast
import pgvector
import math
from psycopg2.extras import execute_values
from pgvector.psycopg2 import register_vector
You’ll need to sign up for an OpenAI Developer Account and create an OpenAI API key. We recommend getting a paid account to avoid rate limiting and setting a spending cap so that you avoid any billing surprises.
Once you have an OpenAI API key, it’s a best practice to store it as an environment variable and then have your Python program read it.
#First, run export OPENAI_API_KEY=sk-YOUR_OPENAI_API_KEY...
# Get openAI api key by reading local .env file
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai.api_key = os.environ['OPENAI_API_KEY']
Embeddings measure how related text strings are. First, we'll use the OpenAI API to create embeddings for the text we want the LLM to answer questions about.
In this example, we'll use content from the Timescale blog, specifically from the Developer Q&A section, which features posts by Timescale users talking about their real-world use cases.
You can replace this blog data with any text you want to embed, such as your own company blog, developer documentation, internal knowledge base, or any other information you’d like to have a “ChatGPT-like” experience over.
# Load your CSV file into a pandas DataFrame
df = pd.read_csv('blog_posts_data.csv')
df.head()
The output looks like this:
| | Title | Content | URL |
|---|---|---|---|
| 0 | How to Build a Weather Station With Elixir, Ne... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/how-to-build-a-... |
| 1 | CloudQuery on Using PostgreSQL for Cloud Asset... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/cloudquery-on-u... |
| 2 | How a Data Scientist Is Building a Time-Series... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/how-a-data-scie... |
| 3 | How Conserv Safeguards History: Building an En... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/how-conserv-saf... |
| 4 | How Messari Uses Data to Open the Cryptoeconom... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/how-messari-use... |
It's usually a good idea to calculate how much creating embeddings for your selected content will cost. We provide a number of helper functions to calculate a cost estimate before creating the embeddings to help us avoid surprises.
For OpenAI, you are charged on a per-token basis for the embeddings you create. The total cost for the blog posts we want to embed will be less than $0.01, thanks to OpenAI’s small text embedding model, text-embedding-3-small. This model boasts not only stronger performance but also a 5X cost reduction compared to its predecessor, text-embedding-ada-002.
What is a token? Tokens are common sequences of characters found in text. Roughly speaking, a token is three-quarters (¾) of a word. Large language models, like GPT-3 and GPT-4 made by OpenAI, are trained to understand the statistical relationships between tokens and predict the next token in a sequence. Learn more about tokens with OpenAI’s Tokenizer tool.
# Helper functions to help us create the embeddings

# Helper function: calculate number of tokens in a text string
def num_tokens_from_string(string: str, encoding_name="cl100k_base") -> int:
    if not string:
        return 0
    # Returns the number of tokens in a text string
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens

# Helper function: calculate length of essay in words
def get_essay_length(essay):
    word_list = essay.split()
    num_words = len(word_list)
    return num_words

# Helper function: calculate cost of embedding num_tokens
# Assumes we're using the text-embedding-3-small model ($0.02 per 1M tokens at the time of writing)
# See https://openai.com/pricing for current rates
def get_embedding_cost(num_tokens):
    return num_tokens / 1000 * 0.00002

# Helper function: calculate total cost of embedding all content in the dataframe
def get_total_embeddings_cost():
    total_tokens = 0
    for i in range(len(df.index)):
        text = df['content'][i]
        token_len = num_tokens_from_string(text)
        total_tokens = total_tokens + token_len
    total_cost = get_embedding_cost(total_tokens)
    return total_cost

# Quick check on total token count for price estimation
total_cost = get_total_embeddings_cost()
print("estimated price to embed this content = $" + str(total_cost))
The OpenAI API has a limit on the maximum number of tokens it can create an embedding for in a single request: 8,191 to be specific.
To get around this limit, we'll break up our text into smaller chunks. Generally, it's a best practice to “chunk” the documents you want to create embeddings for into groups of a roughly fixed token size.
The precise number of tokens to include in a chunk depends on your use case and your model’s context window—the number of input tokens it can handle in a prompt.
For our purposes, we'll aim for chunks of around 512 tokens each. Chunking text is a complex topic worthy of its own blog post; below, we illustrate a simple method that we found works well. If you want to read about other approaches, we recommend this section of the LangChain docs.
Note: If you prefer to skip this step, you can use the provided file: blog_data_and_embeddings.csv, which contains the data and embeddings that you'll generate in this step.
The code below creates a new list of our blog content while retaining the metadata associated with the text, such as the blog title and URL that the text is associated with.
# Create new list with small content chunks to not hit max token limits
# Note: the maximum number of tokens for a single request is 8191
# https://platform.openai.com/docs/guides/embeddings/embedding-models

# List for chunked content and embeddings
new_list = []
# Split up the text into chunks of around 512 tokens
for i in range(len(df.index)):
    text = df['content'][i]
    token_len = num_tokens_from_string(text)
    if token_len <= 512:
        new_list.append([df['title'][i], df['content'][i], df['url'][i], token_len])
    else:
        # Add content to the new list in chunks
        start = 0
        ideal_token_size = 512
        # 1 token ~ 3/4 of a word
        ideal_size = int(ideal_token_size // (4 / 3))
        end = ideal_size
        # Split text by spaces into words
        words = text.split()
        # Remove empty strings
        words = [x for x in words if x != ' ']
        total_words = len(words)
        # Calculate the number of chunks needed
        chunks = total_words // ideal_size
        if total_words % ideal_size != 0:
            chunks += 1
        new_content = []
        for j in range(chunks):
            if end > total_words:
                end = total_words
            new_content = words[start:end]
            new_content_string = ' '.join(new_content)
            new_content_token_len = num_tokens_from_string(new_content_string)
            if new_content_token_len > 0:
                new_list.append([df['title'][i], new_content_string, df['url'][i], new_content_token_len])
            start += ideal_size
            end += ideal_size
Now that our text is split into appropriately sized chunks, we can create embeddings for each chunk of text using the OpenAI API.
We’ll use this helper function to create embeddings for a piece of text:
openai_client = openai.OpenAI()

# Helper function: get an embedding for a piece of text
def get_embeddings(text):
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text.replace("\n", " ")
    )
    return response.data[0].embedding
And then create embeddings for each chunk of content:
# Create embeddings for each piece of content
for i in range(len(new_list)):
    text = new_list[i][1]
    embedding = get_embeddings(text)
    new_list[i].append(embedding)

# Create a new dataframe from the list
df_new = pd.DataFrame(new_list, columns=['title', 'content', 'url', 'tokens', 'embeddings'])
df_new.head()
The new data frame should look like this:
| | Title | Content | URL | Tokens | Embeddings |
|---|---|---|---|---|---|
| 0 | How to Build a Weather Station With Elixir, Ne... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/how-to-build-a-... | 501 | [0.021440856158733368, 0.02200360782444477, -0... |
| 1 | How to Build a Weather Station With Elixir, Ne... | capture weather and environmental data. In all... | https://www.timescale.com/blog/how-to-build-a-... | 512 | [0.016165969893336296, 0.011341351084411144, 0... |
| 2 | How to Build a Weather Station With Elixir, Ne... | command in their database migration:SELECT cre... | https://www.timescale.com/blog/how-to-build-a-... | 374 | [0.022517921403050423, -0.0019158280920237303,... |
| 3 | CloudQuery on Using PostgreSQL for Cloud Asset... | This is an installment of our “Community Membe... | https://www.timescale.com/blog/cloudquery-on-u... | 519 | [0.009028822183609009, -0.005185891408473253, ... |
| 4 | CloudQuery on Using PostgreSQL for Cloud Asset... | Architecture with CloudQuery SDK- Writing plug... | https://www.timescale.com/blog/cloudquery-on-u... | 511 | [0.02050386555492878, 0.010169642977416515, 0.... |
As an optional but recommended step, you can save the original blog content along with the associated embeddings to a CSV file. That way, you won't have to recreate the embeddings if you want to reuse them in another project.
# Save the dataframe with embeddings as a CSV file
df_new.to_csv('blog_data_and_embeddings.csv', index=False)
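If you reload this CSV later, note that pandas reads the embeddings column back as strings. Here is a small sketch of converting them back into lists of floats (this is where the ast import from earlier comes in handy; df_loaded is just an illustrative variable name):

import ast
import pandas as pd

# Reload the saved CSV; the embeddings column comes back as strings like "[0.0214, ...]"
df_loaded = pd.read_csv('blog_data_and_embeddings.csv')

# Convert each embedding string back into a Python list of floats
df_loaded['embeddings'] = df_loaded['embeddings'].apply(ast.literal_eval)
print(len(df_loaded['embeddings'][0]))  # 1536 dimensions for text-embedding-3-small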
In the section above, we showed how to manually create and manage embeddings in your own data pipeline – chunking content, calling the OpenAI API, and storing the results. While this approach helps you understand the fundamentals, in production, you may want to automate this process completely. Let’s look at how pgai Vectorizer can handle this entire pipeline for you!
Managing embeddings in production involves several challenges: keeping embeddings in sync with changing content, handling API failures, and optimally chunking text.
pgai Vectorizer automates this entire process directly in PostgreSQL - similar to how PostgreSQL automatically maintains indexes for your tables.
The setup process differs depending on whether you’re using Timescale Cloud or hosting PostgreSQL yourself.
On Timescale Cloud
-- 1. Store your OpenAI API key securely in Timescale Cloud
-- 2. Navigate to Project Settings > AI Model API Keys in the Timescale Console
-- 3. The key is stored securely and not in your database
-- 4. Create the extensions
CREATE EXTENSION IF NOT EXISTS ai;
For self-hosted PostgreSQL
export OPENAI_API_KEY="your-api-key-here"
# Start the vectorizer worker
vectorizer-worker --connection="postgres://user:password@host:port/dbname"
Instead of manually creating embeddings using Python, you can define a vectorizer that automatically generates and maintains embeddings for your content:
SELECT ai.create_vectorizer(
'blog_posts'::regclass,
destination => 'blog_embeddings',
embedding => ai.embedding_openai('text-embedding-3-small', 768),
chunking => ai.chunking_recursive_character_text_splitter('content'),
-- Pro tip: Add blog title as context to each chunk
formatting => ai.formatting_python_template('$title: $chunk')
);
This single SQL command:

- Sets up blog_embeddings as the destination for the generated embeddings
- Chunks the content column and formats each chunk with its blog title for extra context
- Generates 768-dimensional embeddings with OpenAI's text-embedding-3-small model
- Keeps the embeddings in sync automatically as rows in blog_posts change
You can then search your content the same way as before:
SELECT
chunk,
embedding <=> ai.openai_embed('text-embedding-3-small', 'How is Timescale used in IoT?') as distance
FROM blog_embeddings
ORDER BY distance
LIMIT 3;
Vectorizer runs automatically every five minutes on Timescale Cloud, handling retries and keeping your embeddings up to date. For more details on setup and advanced features like monitoring the Vectorizer, see our pgai Vectorizer documentation.
The accuracy and cost of your RAG application depend heavily on implementation choices, from embedding model selection to chunking strategy.
Here are more blog posts to help you build effective RAG applications with PostgreSQL:
Now that we have created embedding vectors for our blog content, the next step is to store the embedding vectors in a vector database to help us perform a fast search over many vectors.
A vector database is a database that can handle vector data. Vector databases are useful for storing embeddings and efficiently searching over them, for example, to find the embeddings most similar to a given query vector.
We’ll use PostgreSQL with the pgvector extension installed as our vector database. Pgvector extends PostgreSQL to handle vector data types and vector similarity search, like nearest neighbor search, which we’ll use to find the k most related embeddings in our database for a given user prompt.
Here are five reasons why PostgreSQL is a good choice for storing and handling vector data:
First, we’ll create a PostgreSQL database. You can create a cloud PostgreSQL database in minutes for free on Timescale or use a local PostgreSQL database for this step.
Once you’ve created your PostgreSQL database, export your connection string as an environment variable, and just like the OpenAI API key, we’ll read it into our Python program from the environment file:
# Timescale database connection string
# Found under "Service URL" of the credential cheat-sheet or "Connection Info" in the Timescale console
# In terminal, run: export TIMESCALE_CONNECTION_STRING=postgres://<fill in here>
connection_string = os.environ['TIMESCALE_CONNECTION_STRING']
We then connect to our database using the popular psycopg2 Python library and install the pgvector and pgvectorscale extensions (the latter adds powerful filtering and indexing capabilities) as follows:
# Connect to PostgreSQL database in Timescale using connection string
conn = psycopg2.connect(connection_string)
cur = conn.cursor()
#install pgvector
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
conn.commit()
#install pgvectorscale
cur.execute("CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;")
conn.commit()
Once we’ve installed pgvector, we use the register_vector() command to register the vector type with our connection:
# Register the vector type with psycopg2
register_vector(conn)
Once we’ve connected to the database, let’s create a table that we’ll use to store embeddings along with metadata. Our table will look as follows:
| id | title | url | content | tokens | embedding |
|---|---|---|---|---|---|

- id represents the unique ID of each vector embedding in the table.
- title is the blog title from which the content associated with the embedding is taken.
- url is the blog URL from which the content associated with the embedding is taken.
- content is the actual blog content associated with the embedding.
- tokens is the number of tokens the embedding represents.
- embedding is the vector representation of the content.

One advantage of using PostgreSQL as a vector database is that you can easily store metadata and embedding vectors in the same database. This is helpful for supplying the user with relevant information related to the response they receive, like links to read more or the specific parts of a blog post that are relevant to them.
# Create table to store embeddings and metadata
table_create_command = """
CREATE TABLE embeddings (
id bigserial primary key,
title text,
url text,
content text,
tokens integer,
embedding vector(1536)
);
"""
cur.execute(table_create_command)
cur.close()
conn.commit()
Now that we’ve created the database and created the table to house the embeddings and metadata, the final step is to insert the embedding vectors into the database.
For this step, it’s a best practice to batch insert the embeddings rather than insert them one by one.
#Batch insert embeddings and metadata from dataframe into PostgreSQL database
register_vector(conn)
cur = conn.cursor()
# Prepare the list of tuples to insert
data_list = [(row['title'], row['url'], row['content'], int(row['tokens']), np.array(row['embeddings'])) for index, row in df_new.iterrows()]
# Use execute_values to perform batch insertion
execute_values(cur, "INSERT INTO embeddings (title, url, content, tokens, embedding) VALUES %s", data_list)
# Commit after we insert all embeddings
conn.commit()
Let’s sanity check by running some simple queries against our newly inserted data:
cur.execute("SELECT COUNT(*) as cnt FROM embeddings;")
num_records = cur.fetchone()[0]
print("Number of vector records in table: ", num_records,"\n")
# Correct output should be 129
# print the first record in the table, for sanity-checking
cur.execute("SELECT * FROM embeddings LIMIT 1;")
records = cur.fetchall()
print("First record in table: ", records)
In this example, we only have 129 embedding vectors, so searching through all of them is blazingly fast. But for larger datasets, you need to create indexes to speed up searching for similar embeddings, so we include the code to build the index for illustrative purposes.
While pgvector supports the IVFFLAT and HNSW index types for approximate nearest neighbor (ANN) search, pgvectorscale offers a more cost-efficient and powerful index type for pgvector data: StreamingDiskANN, which we use here.
You always want to build this index after you have inserted the data, as the index needs to discover clusters in your data to be effective, and it does this only when first building the index.
The StreamingDiskANN index has tunable parameters depending on your goal, whether it is changing indexing operations or querying operations. In our case, we use the default values of the parameters. You can read more about tuning here.
# Create an index on the data for faster retrieval
cur.execute('CREATE INDEX embedding_idx ON embeddings USING diskann (embedding);')
conn.commit()
Given a user question, we’ll perform the following steps to use information stored in the vector database to answer it using Retrieval Augmented Generation:

1. Create an embedding vector for the user question.
2. Find the k nearest neighbors to the question embedding among the embedding vectors representing the blog content. In our example, we’ll use k=3, finding the three most similar embedding vectors and their associated content.
3. Pass the retrieved content to the model as additional context, along with the original question, and return the model’s response to the user.

First, we’ll define a sample question that a user might want to answer about the blog posts stored in the database.
# Question about Timescale we want the model to answer
input = "How is Timescale used in IoT?"
Since Timescale is popular for IoT sensor data, a user might want to learn specifics about how they can leverage it for that use case.
Here’s the function we use to find the three nearest neighbors to the user question. Note that it uses pgvector’s <=> operator, which computes the cosine distance between two embedding vectors (1 minus their cosine similarity, so smaller values mean more similar vectors).
# Helper function: get the top 3 most similar documents from the database
def get_top3_similar_docs(query_embedding, conn):
    embedding_array = np.array(query_embedding)
    # Register pgvector extension
    register_vector(conn)
    cur = conn.cursor()
    # Get the top 3 most similar documents using the KNN <=> operator
    cur.execute("SELECT content FROM embeddings ORDER BY embedding <=> %s LIMIT 3", (embedding_array,))
    top3_docs = cur.fetchall()
    return top3_docs
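As a quick sanity check on the distance semantics, the toy sketch below (with made-up two-dimensional vectors) shows that the cosine distance returned by the <=> operator is 1 minus the cosine similarity, so smaller distances mean more similar vectors:

import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.7, 0.7])

cosine_similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
cosine_distance = 1.0 - cosine_similarity  # what pgvector's <=> operator returns

print(round(cosine_similarity, 3), round(cosine_distance, 3))  # ~0.707 and ~0.293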
We define a helper function to get a completion response from an OpenAI model, and we reuse the previously defined get_embeddings helper to create an embedding for the user question. We use GPT-4o, but you can use any other model from OpenAI.
We also specify a number of parameters, such as the maximum number of tokens in the model response and the model temperature, which controls the randomness of the output; you can modify these to your liking:
# Helper function: get a chat completion from the OpenAI API
# Note: we're using the gpt-4o model by default
def get_completion_from_messages(messages, model="gpt-4o", temperature=0, max_tokens=1000):
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
We’ll define a function to process the user input: it retrieves the most similar documents from our database and passes the user input, along with the relevant retrieved context, to the OpenAI model, which returns a completion response.
Note that we modify the system prompt as well in order to influence the tone of the model’s response.
We pass to the model the content associated with the three most similar embeddings to the user input using the assistant role. You can also append the additional context to the user message.
# Function to process input with retrieval of most similar documents from the database
def process_input_with_retrieval(user_input):
    delimiter = "```"

    # Step 1: Get documents related to the user input from the database
    related_docs = get_top3_similar_docs(get_embeddings(user_input), conn)

    # Step 2: Get completion from the OpenAI API
    # Set the system message to help set an appropriate tone and context for the model
    system_message = f"""
    You are a friendly chatbot. \
    You can answer questions about timescaledb, its features and its use cases. \
    You respond in a concise, technically credible tone. \
    """

    # Prepare messages to pass to the model
    # We use a delimiter to help the model understand where the user_input starts and ends
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": f"{delimiter}{user_input}{delimiter}"},
        {"role": "assistant", "content": f"Relevant Timescale case studies information: \n {related_docs[0][0]} \n {related_docs[1][0]} {related_docs[2][0]}"}
    ]

    final_response = get_completion_from_messages(messages)
    return final_response
Let’s see an example of the model’s output to our original input question:
input = "How is Timescale used in IoT?"
response = process_input_with_retrieval(input)
print(input)
print(response)
Here’s the model’s response:
User input: How is Timescale used in IoT?
Model Response:
TimescaleDB is widely used in IoT applications due to its ability to efficiently handle time-series data, which is a common requirement in IoT systems. Here are some ways TimescaleDB is utilized in IoT:
1. **Data Ingestion and Storage**: IoT devices generate large volumes of time-stamped data. TimescaleDB's architecture is optimized for ingesting and storing this data efficiently, allowing for high write throughput and efficient storage through features like data compression.
2. **Real-time Analytics**: TimescaleDB supports real-time analytics, enabling IoT applications to process and analyze data as it arrives. This is crucial for applications that require immediate insights, such as monitoring systems or predictive maintenance.
3. **Historical Data Analysis**: With its support for continuous aggregates, TimescaleDB allows for efficient querying of historical data. This is useful for trend analysis, reporting, and long-term data storage, which are common in IoT use cases.
4. **Integration with IoT Platforms**: TimescaleDB can be integrated with various IoT platforms and tools, such as AWS IoT, MQTT, and Grafana, to manage, visualize, and analyze IoT data seamlessly.
5. **Scalability**: As IoT deployments grow, the volume of data can increase significantly. TimescaleDB's scalability features, such as partitioning and distributed hypertables, help manage this growth effectively.
Overall, TimescaleDB provides a robust solution for managing the unique challenges of IoT data, combining the benefits of a relational database with time-series capabilities.
We can also ask the model questions about specific documents in the database, in this case about specific Timescale users who have spoken about their experience in a blog post:
input_2 = "Tell me about Edeva and Hopara. How do they use Timescale?"
response_2 = process_input_with_retrieval(input_2)
print(input_2)
print(response_2)
Here’s the model output for the input question above; notice how it uses specific details from the blog posts.
User input: Tell me about Edeva and Hopara. How do they use Timescale?
Model Response:
### Edeva
**Use Case:**
Edeva leverages TimescaleDB to manage and analyze data from their smart city systems, particularly focusing on their Actibump dynamic speed bump. The data collected includes:
1. **IoT Event Data:** Information such as vehicle speeds and environmental conditions.
2. **Alarm Data:** Alerts for sensor malfunctions or other issues.
3. **Status Data:** Self-checks and statistical data from IoT devices.
4. **Administrative Data:** Metadata about devices, such as configuration details.
**Key Features Utilized:**
- **Continuous Aggregations:** To speed up queries and make dashboards responsive.
- **Percentile Aggregations:** For calculating accurate percentile values without querying raw data.
- **SQL Compatibility:** Simplifies onboarding for developers familiar with SQL.
**Benefits:**
- **Performance:** Transitioned from sluggish to lightning-fast dashboards.
- **Ease of Use:** Developers could quickly adapt due to SQL familiarity.
- **Scalability:** Efficiently handles large datasets, such as hundreds of millions of records.
### Hopara
**Use Case:**
Hopara uses TimescaleDB to manage and visualize time-series data for their geospatial analytics platform. The platform integrates various data sources to provide insights into spatial and temporal trends.
**Key Features Utilized:**
- **Time-Series Data Management:** Efficiently stores and queries large volumes of time-series data.
- **Geospatial Capabilities:** Leverages PostgreSQL’s PostGIS extension for spatial queries.
- **Continuous Aggregations:** To pre-compute and speed up complex queries.
**Benefits:**
- **Scalability:** Handles large datasets with ease.
- **Performance:** Fast query execution for real-time analytics.
- **Integration:** Seamless integration with existing PostgreSQL tools and extensions.
Both Edeva and Hopara benefit from TimescaleDB’s ability to handle large volumes of time-series data efficiently, providing fast query performance and ease of use through SQL compatibility.
Retrieval Augmented Generation (RAG) is a powerful method of building applications with LLMs that enables you to teach foundation models about things they were not originally trained on, like private documents or recently published information.
This project is an example of how to create, store, and perform similarity search on OpenAI embeddings. We used PostgreSQL + pgvector + pgvectorscale as our vector database to efficiently store and query the embeddings, enabling precise and relevant responses.
And if you’re looking for a production PostgreSQL database for your vector workloads, try Timescale. It’s free for 30 days, no credit card required.
Here are more blog posts about RAG with PostgreSQL and different tools: