Apr 12, 2024
Introducing Timescale Vector, PostgreSQL++ for production AI applications. Timescale Vector enhances pgvector with faster search, higher recall, and more efficient time-based filtering, making PostgreSQL your new go-to vector database. Timescale Vector is available today in early access on Timescale’s cloud data platform. Keep reading to learn why and how we built it. Then take it out for a ride: try Timescale Vector for free today, with a 90-day extended trial.
AI is eating the world. The rise of large language models (LLMs) like GPT-4, Llama 2, and Claude 2 is driving explosive growth in AI applications. Every business is urgently exploring how to use LLMs’ new, game-changing capabilities to better serve their customers—either by building new applications with AI or adding AI capabilities to their existing products.
Vector data is at the heart of this Cambrian explosion of AI applications. In the field of LLMs, vector embeddings are mathematical representations of phrases that capture their semantic meaning as a vector of numerical values. Embeddings give you new superpowers for a key computing application: search.
Embeddings enable search based on semantics (i.e., finding items in your database that are closest in meaning to the query) even if the words used are very different. They differ from lexicographical search, where you search for the use of similar words. (We covered vector embeddings in more detail in a recent blog post on finding nearest neighbors.)
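To make the distinction concrete, here is a toy sketch of semantic ranking in plain Python. The three-dimensional “embeddings” are made up for illustration (real models emit hundreds or thousands of dimensions), but the mechanism—rank by vector similarity, not shared keywords—is the same:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (invented for illustration)
docs = {
    "feline care": [0.9, 0.1, 0.0],
    "cat food":    [0.8, 0.2, 0.1],
    "tax law":     [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # hypothetical embedding of a query like "kitten diet"

# Rank documents by similarity: semantically close items surface first,
# even though the query shares no words with them
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # the cat-related documents rank above "tax law"
```

A lexicographic search for "kitten diet" would match none of these documents; the embedding-based ranking still surfaces the relevant ones.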
In addition to traditional semantic search applications, vector embeddings are crucial for making LLMs work with data the model wasn’t pre-trained on, like private information (e.g., company policies), or pre-trained on in-depth (e.g., your product documentation and capabilities), along with new information that’s emerged in the meantime (e.g., news, chat history).
They are also at the core of generative AI techniques like Retrieval Augmented Generation (RAG) to find relevant information to pass as context to an LLM. Other applications include everything from knowledge base search to classification and giving long-term memory to LLMs and AI agents.
To power next-generation AI systems, developers need to efficiently store and query vectors. There are a myriad of vector databases in the market, with new ones popping up seemingly every week. This leaves developers facing a paradox of choice. Do they adopt new, niche databases built specifically for vector data? Or do they use familiar, general-purpose databases, like PostgreSQL, extended with vector support?
At Timescale, we have seen this dynamic before in other markets, namely time-series data. Despite the existence of niche time-series databases, like InfluxDB and AWS Timestream, millions of developers instead chose a general-purpose database, PostgreSQL, with TimescaleDB, an extension for time series.
PostgreSQL's robustness, familiarity, and ecosystem outweighed switching to a completely new database. Additionally, data does not live in a silo, and the ability to join data of different modalities efficiently is often crucial for enabling interesting applications.
With vector data, developers face a similar choice. And while developer needs for LLM applications are still taking shape, we think PostgreSQL will come out on top and become the foundational database for complex, production-grade AI applications, just as it has been for applications over the past decades.
Niche databases for vector data like Pinecone, Weaviate, Qdrant, and Zilliz benefited from the explosion of interest in AI applications. They come purpose-built for storing and querying vector data at scale—with unique features like indexes for Approximate Nearest Neighbor (ANN) search and hybrid search. But as developers started using them for their AI applications, the significant downsides of building with these databases became clear:
In the words of one developer we interviewed:
"Postgres is more production-ready, more configurable, and more transparent in its operation than almost any other vector store." - Software Engineer at LegalTech startup
PostgreSQL is the most loved database in the world, according to the Stack Overflow 2023 Developer Survey. And for a good reason: it’s been battle-hardened by production use for over three decades, it’s robust and reliable, and it has a rich ecosystem of tools, drivers, and connectors.
And amidst the sea of new, niche vector databases, there is an undeniable appetite for using PostgreSQL as a vector database—look no further than the numerous tutorials, integrations, and tweets about pgvector, the open-source PostgreSQL extension for vector data.
One of the many great features of PostgreSQL is that it is designed to be extensible. These “PostgreSQL extensions” add extra functionality without slowing down or adding complexity to core development and maintenance. It’s what we leveraged for building TimescaleDB and how pgvector came about as well.
While pgvector is a wonderful extension (and is offered as part of Timescale Vector), it is just one piece of the puzzle in providing a production-grade experience for AI application developers on PostgreSQL. After speaking with numerous developers at nimble startups and established industry giants, we saw the need to enhance pgvector to cater to the performance and operational needs of developers building AI applications.
Today, we launch Timescale Vector to enable you, the developer, to build production AI applications at scale with PostgreSQL.
# Create client object
TIME_PARTITION_INTERVAL = timedelta(days=7)
vec = client.Async(TIMESCALE_SERVICE_URL,
                   TABLE_NAME,
                   EMBEDDING_DIMENSIONS,
                   time_partition_interval=TIME_PARTITION_INTERVAL)

# Create table
await vec.create_tables()

# Similarity search with time-based filtering
records_time_filtered = await vec.search(query_embedding,
                                         limit=3,
                                         uuid_time_filter=client.UUIDTimeRange(start_date, end_date))
In addition to unique capabilities for handling vector data at scale, Timescale Vector sits atop Timescale’s production-grade cloud PostgreSQL platform, complete with:
Since we opened up a waitlist for Timescale Vector, we have spoken to numerous developers at companies large and small about their AI applications and use of vector data. We want to publicly thank each and every one of them for informing our roadmap and helping shape the initial product direction.
Here’s what they had to say about Timescale Vector:
Timescale Vector is available today in early access on Timescale, the PostgreSQL cloud platform, for new and existing customers. The easiest way to access Timescale Vector is via the Timescale Vector Python client library, which offers a simple way to integrate PostgreSQL and Timescale Vector into your AI applications.
To get started, create a new database on Timescale, download the .env file with your database credentials, and run:
pip install timescale-vector
Then, see the Timescale Vector docs for instructions or learn the key features of Timescale Vector by following this tutorial.
Try Timescale Vector for free today on Timescale.
Why early access? Our goal is to help developers power production AI applications, and we have a high bar for the quality of such a critical piece of infrastructure as a database. So, during this early access period, we’re inviting developers to test and give feedback on Timescale Vector so that they can shape the future of the product as we continue to improve, building up to a general availability release.
This is the first of several exciting announcements as part of Timescale AI Week. You can find something new to help you build AI applications every day at timescale.com/ai. Read on to learn more about why we built Timescale Vector, our new DiskANN-inspired index, and how it performs against alternatives.
As part of Timescale Vector, we are excited to introduce a new indexing technique to PostgreSQL that improves vector search speed and accuracy.
ANN search on vector data is an active field of research, with new methodologies and algorithms emerging every year. For a glimpse into this evolution, see the proceedings of NeurIPS 2022.
It is exciting to note that PostgreSQL-based vector stores, in particular, are at the forefront of this evolution. We previously discussed how to use pgvector’s IVFFlat index type to speed up search queries for ANN queries, but it has performance limitations at the scale of hundreds of thousands of vectors. Since then, teams across the PostgreSQL community are implementing more index types based on the latest research.
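To see where IVFFlat’s trade-offs come from, here is a minimal, illustrative sketch of the inverted-file idea: cluster vectors into lists around centroids at build time, then scan only the lists nearest the query. The toy data and fixed centroids are invented for illustration; pgvector’s actual implementation differs:

```python
from math import dist  # Euclidean distance, Python 3.8+

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid's list."""
    lists = {i: [] for i in range(len(centroids))}
    for v in vectors:
        nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
        lists[nearest].append(v)
    return lists

def ivf_search(query, centroids, lists, k=1, probes=1):
    """Scan only the `probes` closest lists instead of every vector."""
    order = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))
    candidates = [v for i in order[:probes] for v in lists[i]]
    return sorted(candidates, key=lambda v: dist(query, v))[:k]

# Toy 2-D data with two obvious clusters; centroids fixed for illustration
vectors = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
centroids = [(0.15, 0.15), (0.95, 0.85)]
lists = build_ivf(vectors, centroids)
print(ivf_search((0.95, 0.9), centroids, lists, k=1, probes=1))  # → [(1.0, 0.9)]
```

Probing fewer lists is faster but can miss true neighbors that landed in an unprobed region—the accuracy limitation the graph-based indexes below address.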
For instance, pgvector just released support for HNSW (a graph-based index type for vector data). Neon’s pg_embedding was also recently released with support for HNSW.
And as part of Timescale Vector, we’re announcing support for a new graph-based index, tsv, drawing inspiration from Microsoft’s DiskANN. Its graph-construction approach differs from HNSW, allowing for different trade-offs and unique advantages.
You can create a timescale-vector index on the column containing your vector embeddings using our Python client as follows:
# Create a timescale vector (DiskANN) search index on the embedding column
await vec.create_embedding_index(client.TimescaleVectorIndex())
And using SQL to create the timescale-vector index looks like:
CREATE INDEX ON <table name> USING tsv(<vector_column_name>);
We'll delve deeper into why the algorithms used by this index are particularly advantageous in PostgreSQL. We are staunch proponents of offering diverse options to the PostgreSQL community, leading us to design an index distinct from those already available in the ecosystem.
Our decision to house our index in a standalone extension, separate from pgvector, was influenced by our use of Rust (for context, pgvector is written in C). Yet, we aimed to simplify its adoption by avoiding introducing a new vector data type. Hence, our index works with the vector data type provided by pgvector. This allows users to easily experiment with different index types by creating different indexes on the same table contents.
The timescale-vector DiskANN-inspired index offers several compelling benefits:
Much of the research on graph-based ANN algorithms focuses on graphs designed for in-memory use. In contrast, DiskANN is crafted with SSD optimization in mind. Its unique on-disk layout of graph nodes clusters each node vector with its neighboring links.
This ensures that each node's visit during a search preloads the data required for the subsequent step, aligning seamlessly with Postgres' page caching system to minimize SSD seeks. The absence of graph layers (unlike HNSW) further augments the cache's efficiency, ensuring that only the most frequently accessed nodes are retained in memory.
DiskANN gives users the option to quantize vectors within the index. This process reduces the vector size, consequently shrinking the index size significantly and expediting searches.
While this might entail a marginal decline in query accuracy, PostgreSQL already stores the full-scale vectors in the heap table. This allows for correcting the diminished accuracy from the indexed data using heap data, refining the search results. For instance, when searching for k items, the index can be prompted for 2k items, which can then be re-ranked using heap data to yield the closest k results.
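That over-fetch-and-re-rank correction can be sketched as follows. The `approx_search` callable here is a hypothetical stand-in for a lossy (quantized) index lookup, and the exact distances come from the full-scale vectors:

```python
from math import dist

def rerank_search(query, k, approx_search, exact_vectors):
    """Over-fetch 2k candidates from the approximate index, then
    re-rank them with exact distances to return the closest k."""
    candidate_ids = approx_search(query, 2 * k)  # approximate, quantized distances
    return sorted(candidate_ids,
                  key=lambda i: dist(query, exact_vectors[i]))[:k]

# Toy example: a (hypothetical) lossy index that returns ids in a slightly wrong order
vectors = [(0.0, 0.0), (0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]
lossy = lambda q, n: [2, 1, 0, 3][:n]  # quantization scrambled the ranking
print(rerank_search((0.0, 0.05), k=2, approx_search=lossy, exact_vectors=vectors))  # → [0, 1]
```

Even though the lossy index ranked id 2 first, the exact-distance re-rank over the 2k candidates recovers the true two nearest neighbors.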
In our benchmarking, we saw a 10x index size reduction by enabling product quantization, reducing the index size from 7.92 GB to just 790 MB.
To use the timescale-vector index with product quantization (PQ) enabled, you can create the index as follows:
CREATE INDEX ON <table name> USING tsv(<column name>) WITH (use_pq=true);
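For intuition, product quantization itself can be sketched as follows. This is a toy illustration with tiny, pre-chosen per-segment codebooks, not the index’s actual learned quantizer (which, as noted below, uses 256 clusters per segment, so each segment compresses to a single byte):

```python
from math import dist

def pq_encode(vector, seg_len, codebooks):
    """Split the vector into segments; store each segment as the id of its
    nearest per-segment centroid (one byte when there are <=256 centroids)."""
    codes = []
    for s, centroids in enumerate(codebooks):
        seg = vector[s * seg_len:(s + 1) * seg_len]
        codes.append(min(range(len(centroids)), key=lambda i: dist(seg, centroids[i])))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximation by concatenating the chosen centroids."""
    out = []
    for s, c in enumerate(codes):
        out.extend(codebooks[s][c])
    return out

# Toy setup: 4-dim vectors, 2 segments of length 2, 2 centroids per segment
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],  # segment 0 centroids
    [(0.0, 1.0), (1.0, 0.0)],  # segment 1 centroids
]
codes = pq_encode([0.9, 1.1, 0.1, 0.8], seg_len=2, codebooks=codebooks)
print(codes)                        # → [1, 0]
print(pq_decode(codes, codebooks))  # → [1.0, 1.0, 0.0, 1.0]
```

The stored codes are far smaller than the original floats, and the decoded vector is only an approximation—hence the re-ranking step described above.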
Frequently referred to as hybrid search, this feature enables the identification of the k-nearest neighbors to a specific query vector while adhering to certain criteria on another column. For instance, vectors could be classified based on their source (such as internal documentation, external references, blog entries, and forums). A query from an external client might exclude internal documents, whereas others might limit their search to only blog entries and forums.
Timescale Vector already supports hybrid search, but the current version does not optimize the search in all cases. The exciting thing is that the DiskANN authors have outlined a blueprint for a filtered DiskANN, and we plan to add this optimization before Timescale Vector reaches GA.
Performing hybrid search using the Timescale Vector Python client can be done as follows:
records_filtered = await vec.search(query_embedding, filter={"author": "Walter Isaacson"})
In sum, the introduction of the new timescale-vector index type inspired by DiskANN benefits developers in the following ways:
Now for some numbers. Let’s look at how Timescale Vector’s approximate nearest neighbor search performance compares to a specialized vector database, in this case, Weaviate, and existing PostgreSQL search index algorithms.
We compared the approximate nearest neighbor search performance of the following index types:
We tracked a variety of metrics provided by the ANN benchmarking suite, but the ones we’ll cover in-depth are the following:
Note: pgvector is packaged as part of Timescale Vector, so developers have the flexibility to choose the right index type, whether it’s DiskANN, HNSW, or IVFFlat—or opt to use exact KNN search for their use case.
We used the popular set of tools from ANN Benchmarks to benchmark the performance of the different algorithms against each other on the same dataset. We do experiments using two modes: a parallel (--batch) mode, where multiple queries are executed at once, and a single-threaded mode, where queries are executed one at a time.
We benchmarked the performance of the various algorithms on a dataset of one million OpenAI vector embeddings. Each vector has 1,536 dimensions and was created using OpenAI’s text-embedding-ada-002 embedding model. (Fun fact: the dataset is based on embedding content from Wikipedia articles). Shout-out to Kumar Shivendu for making that dataset easily available and adding it to ANN benchmarks.
We used the following setup:
We varied the following parameters over different runs to test their impact on the performance vs. accuracy trade-off. We tried to use parameters already present in ANN Benchmarks (often suggested by the index creators themselves) whenever possible.
Timescale Vector (DiskANN):
- num_neighbors varied between 50 and 64: sets the maximum number of neighbors per node. Defaults to 50. Higher values increase accuracy but make the graph traversal slower.
- search_list_size varied between 10 and 100: this is the S parameter used in the greedy search algorithm during construction. Defaults to 100. Higher values improve graph quality at the cost of slower index builds.
- max_alpha varied between 1.0 and 1.2: max_alpha is the alpha parameter. Defaults to 1.0. Higher values improve graph quality at the cost of slower index builds.
- query_search_list_size varied between 10 and 250: the number of additional candidates considered during the graph search at query time. Defaults to 100. Higher values improve query accuracy while making the query slower.
- num_clusters set to 256: sets the number of clusters (and centroids) used for every segment’s quantizer. Higher values improve accuracy at the cost of slower index builds and queries.

If you want to dive deeper into the timescale-vector indexing algorithm, its parameters, and how it works, we discuss it in more depth in the “Under the Hood” section.
pgvector HNSW:
- m varied between 12 and 36: represents the maximum number of connections per layer. Think of these connections as edges created for each node during graph construction. Increasing m increases accuracy but also increases index build time and size.
- ef_construction varied between 40 and 128: represents the size of the dynamic candidate list for constructing the graph. It influences the trade-off between index quality and construction speed. Increasing ef_construction enables more accurate search results at the expense of lengthier index build times.
- ef_search varied between 10 and 600: represents the size of the dynamic candidate list for search. Increasing ef_search increases accuracy at the expense of speed.

Weaviate HNSW:
- maxConnections varied between 8 and 72: equivalent to m in the pgvector section above.
- efConstruction varied between 64 and 512: equivalent to ef_construction in the pgvector section above.
- ef varied between 16 and 768: equivalent to ef_search in the pgvector section above.

pg_embedding HNSW:
- dims was set to 1,536, the number of dimensions in our data.
- m varied between 12 and 36.
- efconstruction varied between 40 and 128.
- efsearch varied between 10 and 600.

pgvector IVFFlat:
- lists varied from 100 to 4,000: represents the number of clusters created during index building (it’s called lists because each centroid has a list of vectors in its region). Increasing lists reduces the number of vectors in each list, resulting in smaller regions, but it can introduce more errors as some points are excluded.
- probes varied from 1 to 100: represents the number of regions to consider during a query. Increasing probes means more regions can be searched, improving accuracy at the expense of query speed.

In general, we want to evaluate the accuracy versus speed trade-off between the various approximate nearest neighbor algorithms.
Single-threaded experiment
First, let’s take a look at the results of the single-threaded mode experiment, where queries are executed one at a time.
The single-threaded experiment shows that Timescale Vector’s new index type inspired by DiskANN comes out on top in terms of query performance at 99% accuracy. It outperforms Weaviate, a specialized vector database, by 122.05%. It also outperforms all existing PostgreSQL ANN indexes, beating pgvector’s HNSW algorithm by 29.24%, pg_embedding by 257.56%, and pgvector’s IVFFlat by 1,383.26%.
Note that if you’re willing to lower accuracy to less than 99%, say ~95%, you can get much faster query throughput. For example, at 96% accuracy, Timescale Vector’s DiskANN-inspired index can process 425 queries per second, and pgvector HNSW can process 376 queries per second.
We achieved the above results with the following parameters:
Next, let’s look at the results from the parallel (--batch) mode experiments, where multiple queries are executed at once.
Multi-threaded experiment
It is important to ensure that search algorithms scale with the number of CPUs, so let’s look at the results comparing the queries per second of the different vector search indexes when they are run in parallel:
The multi-threaded experiment shows that Timescale Vector’s new index again comes out on top for queries per second at 99% accuracy:
Apart from Weaviate, we didn’t test other specialized databases, but we can get an idea of how this result extends to other specialized vector databases like Qdrant by doing a loose comparison to Qdrant benchmarks, which were also performed using ANN benchmarks on the same dataset of one million OpenAI embeddings, and used the same machine spec (8 CPU, 32 GB RAM) and experiment method (batch queries).
With 1,252.20 queries per second at 99% recall, Timescale Vector surpasses Qdrant’s 354 queries per second by more than 250%. In the future, we plan to test Timescale Vector against more specialized vector databases and benchmark the results.
While not depicted on the graph above, Timescale Vector with PQ enabled reaches 752 queries per second at 99% recall, giving it 106.61% better results than Weaviate, and 177.94% better results than pg_embedding. In terms of overall performance, it would rank 3rd in our benchmark behind Timescale Vector without PQ enabled and pgvector HNSW. These results are all the more impressive considering the index size reduction that enabling PQ yields, as discussed below.
Another interesting result is that pgvector’s HNSW implementation is 232% more performant than Neon’s pg_embedding HNSW implementation and 146% more performant than Weaviate’s HNSW implementation when running queries in parallel. Finally, the graph-based methods (DiskANN or HNSW) all outperform the inverted file method (IVFFlat) by a wide margin.
The good news is—because Timescale Vector includes all of pgvector’s indexes (and other features)—you can create and test both Timescale Vector’s DiskANN and pgvector’s HNSW indexes on your data. Flexibility for the win!
We achieved the above results for the multi-threaded experiment with the following parameters:
Let’s look at the index size for the respective runs that yielded the best performance in the multi-threaded experiment:
Timescale Vector without PQ comes in at 7.9 GB, as do pgvector HNSW, pg_embedding HNSW, and pgvector IVFFlat. However, by enabling PQ, Timescale Vector decreases its space usage for the index by 10x, leading to an index size of just 790 MB.
Note that we uncovered a bug in ANN Benchmarks that incorrectly reported the Weaviate index size in a previous version of this post. ANN Benchmarks reports the index size as psutil.Process().memory_info().rss / 1024 by default, and that setting is not overridden for Weaviate. We’ve not reflected Weaviate on the graph above for that reason.
Now, we compare the build times for the above results:
pgvector’s IVFFlat index comes out on top for index build time thanks to its simpler construction. Looking at the most performant indexes in terms of query performance, Timescale Vector is 75.41% faster to build than pgvector’s HNSW in this case. And Timescale Vector with PQ is 59.91% faster to build than pgvector HNSW.
Many developers we’ve spoken to are willing to trade longer index build times for higher query speed, since building the index is a one-time upfront cost, whereas slow queries are a recurring cost that could affect their users’ experience. Moreover, the Timescale Vector team is working on decreasing index build time by using parallelism in upcoming releases.
If you’re looking for a fun weekend project, we encourage you to replicate these experiments with these parameters to verify the results for yourself using the ANN Benchmarks suite.
Graph-based indexes usually work like this: each vector in a database becomes a vertex in a graph, which is linked to other vertices. When a query arises, the graph is greedily traversed from a starting point, leading us to the nearest neighboring node to the queried vector.
The example above shows that following the path with arrows from the graph entry point leads to the nearest neighbor.
Let's break down how the timescale-vector index works, focusing on the search and construction algorithms. We'll begin with the search, as it plays a pivotal role in the construction process.
The objective of the greedy search is to pinpoint the nodes closest to a specific query vector. This algorithm operates in a loop until it reaches a fixed point. Within each loop iteration, the following happens:
When searching for k-nearest neighbors, you set S as K plus an added buffer B. The larger this buffer, the greater the number of candidates required for loop completion, rendering the search process slower but ensuring higher accuracy. At query time, you set the buffer size using the tsv.query_search_list_size GUC.
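The loop described above can be sketched as a standard greedy best-first traversal with a candidate list of size S = K + B. This is an illustrative toy, not the extension’s actual Rust implementation:

```python
from math import dist

def greedy_search(graph, vectors, entry, query, k, buffer):
    """Best-first graph traversal: repeatedly expand the closest
    unvisited candidate until the top-S list stops improving."""
    s = k + buffer  # search list size: a bigger buffer means higher accuracy, slower search
    candidates = [entry]
    visited = set()
    while True:
        unvisited = [c for c in candidates if c not in visited]
        if not unvisited:
            break  # fixed point reached: all top candidates have been expanded
        node = min(unvisited, key=lambda c: dist(query, vectors[c]))
        visited.add(node)
        # add the node's neighbors, then keep only the S closest candidates
        candidates = sorted(set(candidates) | set(graph[node]),
                            key=lambda c: dist(query, vectors[c]))[:s]
    return candidates[:k]

# Toy 1-D graph: vectors on a line, each node linked to its neighbors
vectors = {i: (float(i),) for i in range(6)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(greedy_search(graph, vectors, entry=0, query=(4.2,), k=1, buffer=2))  # → [4]
```

Starting from node 0, the traversal walks along the line toward the query at 4.2 and terminates once the candidate list stabilizes.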
The construction phase commences with a singular starting vertex, chosen arbitrarily. The graph is built iteratively, adding one node at a time. For each added node, the following happens:
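As a rough illustration of one key step in DiskANN-style (Vamana) construction: after greedy-searching toward the new node, the visited candidates are pruned with the alpha parameter, keeping an edge only if no already-kept closer neighbor “dominates” it within a factor of alpha. This sketch is illustrative, not the extension’s code:

```python
from math import dist

def alpha_prune(node, candidates, vectors, max_neighbors, alpha=1.0):
    """Keep a candidate only if no already-kept neighbor is alpha-times
    closer to it than the new node is; larger alpha keeps longer-range
    edges, improving the graph's navigability."""
    kept = []
    for c in sorted(candidates, key=lambda c: dist(vectors[node], vectors[c])):
        if len(kept) >= max_neighbors:
            break
        if all(alpha * dist(vectors[p], vectors[c]) > dist(vectors[node], vectors[c])
               for p in kept):
            kept.append(c)
    return kept

# Toy 2-D points: candidate 2 sits right next to candidate 1, so it is pruned
# as redundant, and the farther (but differently directed) candidate 3 is kept
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.1, 0.0), 3: (0.0, 2.0)}
print(alpha_prune(0, [1, 2, 3], vectors, max_neighbors=2))  # → [1, 3]
```

This diversity-preserving pruning is what lets the graph stay sparse while remaining easy to navigate from any starting point.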
This is a brief overview of the mechanism of the DiskANN approach. In a future blog post, we’ll dive into the background of how graph indexes work and how the DiskANN approach differs from other algorithms in the space.
A key use case that Timescale Vector uniquely enables is efficient time-based vector search. In many AI applications, time is an important metadata component for vector embeddings.
Documents, content, images, and other sources of embeddings have a time associated with them, and that time is commonly used as a filter for increasing the relevance of embeddings in an ANN search. Time-based vector search enables the retrieval of vectors that are not just similar but also pertinent to a specific time frame, enriching the quality and applicability of search results.
Time-based vector functionality is helpful for AI applications in the following ways:
Yet, traditionally, searching by two components, similarity and time, is challenging for approximate nearest neighbor indexes and makes the similarity-search index less effective, as similarity search alone can often return results outside the time window of interest.
One approach to solving this is partitioning the data by time and creating ANN indexes on each partition individually. Then, during the search, you can do the following:
To solve this problem, Timescale Vector leverages TimescaleDB’s hypertables, which automatically partition vectors and associated metadata by a timestamp. This enables efficient querying on vectors by both similarity to a query vector and time, as partitions not in the time window of the query are ignored, making the search a lot more efficient by filtering out whole swaths of data in one go.
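The partition-pruning idea can be sketched as follows. This toy in-memory version only illustrates the principle; hypertables apply it at the storage layer, with an ANN index per partition:

```python
from datetime import datetime, timedelta
from math import dist

def time_filtered_search(partitions, start, end, query, k):
    """Search only the partitions whose time range overlaps [start, end],
    skipping everything else wholesale."""
    candidates = []
    for (p_start, p_end), rows in partitions.items():
        if p_end < start or p_start > end:
            continue  # whole partition pruned: never scanned at all
        candidates += [(ts, v) for ts, v in rows if start <= ts <= end]
    return sorted(candidates, key=lambda r: dist(query, r[1]))[:k]

# Toy weekly partitions of (timestamp, embedding) rows
week = timedelta(days=7)
t0 = datetime(2023, 8, 1)
partitions = {
    (t0, t0 + week):            [(t0 + timedelta(days=2), (0.1, 0.9))],
    (t0 + week, t0 + 2 * week): [(t0 + timedelta(days=9), (0.2, 0.8))],
}
hits = time_filtered_search(partitions, t0, t0 + week, query=(0.0, 1.0), k=1)
print(hits[0][1])  # → (0.1, 0.9), found without scanning the second week's data
```

Because whole partitions outside the window are skipped before any vector comparison happens, the cost of a time-filtered search scales with the data in the window, not the full table.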
Hear from Web Begole, CTO at MarketReader, who aims to use Timescale Vector’s time-based search functionality to help find news stories related to stock market movements more efficiently:
“Being able to utilize conditions on vectors for similarity, alongside traditional time and value conditions, simplifies our data pipelines and allows us to lean on the strengths of PostgreSQL for searching large datasets very quickly. Timescale Vector allows us to efficiently search our system for news related to assets, minimizing our reliance on tagging by our sources.”
Here’s an example of using semantic search with time filters using the Timescale Vector Python client:
# Define search query
query_string = "What's new with PostgreSQL 16?"
query_embedding = get_embeddings(query_string)

# Time filter variables for query
start_date = datetime(2023, 8, 1, 22, 10, 35)  # Start date = 1 August 2023, 22:10:35
end_date = datetime(2023, 8, 30, 22, 10, 35)   # End date = 30 August 2023, 22:10:35

# Similarity search with time filter
records_time_filtered = await vec.search(query_embedding,
                                         limit=3,
                                         uuid_time_filter=client.UUIDTimeRange(start_date, end_date))
Timescale Vector is available today in early access on Timescale, the PostgreSQL++ cloud platform, for new and existing customers.
Try Timescale Vector for free today on Timescale.