Jun 27, 2024
Introducing pgvectorscale, a new open-source extension that makes PostgreSQL an even better database for AI applications. Pgvectorscale builds upon pgvector to unlock large-scale, high-performance AI use cases previously only achievable with specialized vector databases like Pinecone.
When building an AI application, many developers ask themselves, “Do I need a standalone vector database, or can I just use a general-purpose database I already have and know?”
And while general-purpose databases like PostgreSQL have gained popularity for vector storage and search thanks to their familiarity and extensions like pgvector, the one argument for opting to use a dedicated vector database, like Pinecone, has been the promise of greater performance. The reasoning goes like this: dedicated vector databases have purpose-built data structures and algorithms for storing and searching large volumes of vector data, thus offering better performance and scalability than general-purpose databases with added vector support.
We built pgvectorscale to make PostgreSQL a better database for AI and to challenge the notion that PostgreSQL and pgvector are not performant for vector workloads. Pgvectorscale brings such specialized data structures and algorithms for large-scale vector search and storage to PostgreSQL as an extension, delivering performance comparable to, and often better than, specialized vector databases like Pinecone.
Pgvectorscale is an open-source PostgreSQL extension that builds on pgvector, enabling greater performance and scalability (keep reading for the actual numbers). By using pgvector and pgvectorscale, developers can build more scalable AI applications, benefiting from higher-performance embedding search and cost-efficient storage.
Licensed under the open-source PostgreSQL License, pgvectorscale complements pgvector by leveraging the pgvector data type and distance functions, further enriching the PostgreSQL ecosystem for building AI applications. While pgvector is written in C, the pgvectorscale extension is written in Rust, giving the community a new avenue to contribute to vector support in PostgreSQL.
Pgvectorscale builds on pgvector with two key innovations:

- StreamingDiskANN: a new disk-based vector search index for pgvector, inspired by Microsoft’s DiskANN research.
- Statistical Binary Quantization (SBQ): a novel compression method, developed by researchers at Timescale, that improves on standard binary quantization.
We gave a sneak peek of pgvectorscale to a select group of developers who are building AI applications with PostgreSQL. Here’s what John McBride, Head of Infrastructure at OpenSauced, a company using PostgreSQL to build an AI-enabled analytics platform for open-source projects, had to say:
“Pgvectorscale is a great addition to the PostgreSQL AI ecosystem. The introduction of Statistical Binary Quantization promises lightning performance for vector search and will be valuable as we scale our vector workload.”
Keep reading for an overview of StreamingDiskANN and Statistical Binary Quantization. For an in-depth tour, see our “how we built it” companion post for a technical deep dive into the pgvectorscale innovations.
Before “delving” into pgvectorscale’s StreamingDiskANN index for pgvector and its novel approach to binary quantization, let’s briefly unpack our claim that pgvectorscale helps PostgreSQL achieve performance comparable to, and often better than, specialized vector databases like Pinecone.
To test the performance impact of pgvectorscale, we compared the performance of PostgreSQL with pgvector and pgvectorscale against Pinecone, widely regarded as the market leader for specialized vector databases, on a benchmark using a dataset of 50 million Cohere embeddings (of 768 dimensions each). (We go into detail about the benchmarking methodology and results in this pgvector vs. Pinecone comparison blog post.)
PostgreSQL with pgvector and pgvectorscale outperformed Pinecone’s storage-optimized index (s1) with 28x lower p95 latency and 16x higher query throughput for approximate nearest neighbor queries at 99% recall.
Furthermore, PostgreSQL with pgvectorscale achieves 1.4x lower p95 latency and 1.5x higher query throughput than Pinecone’s performance-optimized index (p2) at 90% recall on the same dataset. The p2 pod index is what Pinecone recommends if you want the best possible performance, and to our surprise pgvectorscale still helped PostgreSQL outperform it!
Aside: For readers wondering, “What about using the p2 index at 99% recall?” We wondered the same thing, but unfortunately, Pinecone doesn’t let you tune your index to control the accuracy/performance trade-off, unlike PostgreSQL, a more flexible and transparent engine that exposes tuning parameters while also setting reasonable defaults. We detail this in our companion benchmark methodology post.
This impressive performance, combined with the trusted reliability and continuous evolution of PostgreSQL, makes it clear: building on PostgreSQL with pgvector and pgvectorscale is the smart choice for developers aiming to create high-performing, scalable AI applications.
The cost benefits are equally compelling. Self-hosting PostgreSQL with pgvector and pgvectorscale is 75-79% cheaper than using Pinecone. Self-hosting PostgreSQL costs approximately $835 per month on AWS EC2, compared to Pinecone’s $3,241 per month for the storage-optimized index (s1) and $3,889 per month for the performance-optimized index (p2).
This result puts to bed the claims that PostgreSQL and pgvector are easy to start with but not scalable or performant for AI applications. With pgvectorscale, developers building GenAI applications can enjoy purpose-built performance for vector search without giving up the benefits of a fully featured PostgreSQL database and ecosystem.
And those benefits are numerous. Choosing a standalone vector database would mean you lose out on the full spectrum of data types, transactional semantics, and operational features that exist in a general-purpose database and are often necessary for deploying production apps.
Here’s an overview of how PostgreSQL provides a superior developer experience to standalone vector databases like Pinecone: battle-tested operational tooling such as pg_stat_statements for query statistics and EXPLAIN plans for debugging slow queries, plus the numerous connectors, libraries, and drivers for every other technology in your AI data stack.
For more details on pgvectorscale performance and our PostgreSQL vs. Pinecone benchmark, see our companion post on in-depth benchmark methodology and results.
(This is not our last release today, so keep your eyes peeled. 👀)
Inspired by Microsoft’s DiskANN (also referred to as Vamana), pgvectorscale adds a third approximate nearest neighbor (ANN) search algorithm, StreamingDiskANN, to pgvector. This comes in addition to pgvector's existing IVFFlat (inverted file flat) and HNSW (hierarchical navigable small world) vector search indexes.
StreamingDiskANN is optimized for storing the index on disk, unlike in-memory indexes such as HNSW, making it more cost-efficient to run and scale as vector workloads grow. Because SSDs are much cheaper than RAM, this vastly decreases the cost of storing and searching large numbers of vectors.
The StreamingDiskANN index uses the pgvector vector data type, so developers already using pgvector don’t need to migrate data or use a different type. All that’s needed is to create a new index on the embeddings column. See a code snippet below:
CREATE EXTENSION vectorscale;
CREATE INDEX document_embedding_idx ON document_embedding
USING diskann(embedding);
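Once the index is built, similarity queries keep using pgvector’s distance operators. As a sketch (the document_embedding table and its columns are illustrative, and $1 stands in for a query embedding from your model), a nearest-neighbor lookup might look like:

```sql
-- Find the 10 documents closest to a query embedding.
-- <=> is pgvector's cosine distance operator; the StreamingDiskANN
-- index accelerates this ORDER BY ... LIMIT pattern.
SELECT id
FROM document_embedding
ORDER BY embedding <=> $1
LIMIT 10;
```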
The Pinecone Graph Algorithm is also based on DiskANN, meaning developers can now use a search algorithm similar to Pinecone’s in PostgreSQL without needing a standalone vector database.
We released an early version of this index to Timescale customers in October 2023. Based on their feedback, we’ve further refined the StreamingDiskANN index and decided to make it open source and free so that all developers using PostgreSQL for AI can benefit from it.
We call pgvectorscale’s index StreamingDiskANN as it supports streaming filtering, which allows for accurate retrieval even when secondary filters are applied during similarity search. This scenario is common in production RAG (retrieval-augmented generation) applications, where, for example, documents are often associated with a set of tags, and you may want to constrain your similarity search by requiring a match of the tags as well as high vector similarity.
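As a sketch of that scenario (the table and tags column are hypothetical), a tag-constrained similarity search in SQL might look like:

```sql
-- Hypothetical schema: each document row carries a tags array.
-- With streaming filtering, the index keeps yielding "next closest"
-- candidates until 10 rows satisfy the tag filter, so the filter
-- causes no loss of retrieval accuracy.
SELECT id
FROM document_embedding
WHERE tags @> ARRAY['finance']
ORDER BY embedding <=> $1
LIMIT 10;
```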
One pitfall of the HNSW index in pgvector is that it retrieves a pre-set number of records (set by the hnsw.ef_search parameter) before applying secondary filters. If the filters exclude all the vectors fetched from the index, HNSW has no way to fetch more candidates, so the search returns incomplete results. This scenario is common when searching through large datasets of vectors.
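To make the pitfall concrete: with pgvector’s HNSW index, a query like the following (names are illustrative) first pulls hnsw.ef_search candidates from the index and only then applies the WHERE clause:

```sql
-- HNSW fetches at most hnsw.ef_search candidates (default 40)
-- before the tag filter runs. If none of those 40 nearest vectors
-- carry the 'finance' tag, the query returns fewer (or zero) rows
-- even though matching documents exist in the table.
SET hnsw.ef_search = 40;

SELECT id
FROM document_embedding
WHERE tags @> ARRAY['finance']
ORDER BY embedding <=> $1
LIMIT 10;
```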
Pgvectorscale’s StreamingDiskANN index has no ef_search-style cutoff. Instead, it uses a streaming model that lets the index continuously retrieve the “next closest” item for a given query, potentially traversing the entire graph. The PostgreSQL execution system keeps asking for the “next closest” item until it has found the LIMIT N items that satisfy the additional filters. This form of post-filtering suffers no accuracy degradation.
In their blog post comparing Pinecone to pgvector, the Pinecone team emphasized the “ef_search” type limitation in pgvector’s HNSW index to highlight a key weakness of pgvector. However, with the introduction of pgvectorscale and the StreamingDiskANN index for pgvector, this weakness no longer exists. This illustrates the power of open-source projects to build upon prior work and move quickly to solve problems for the benefit of the community.
Pgvectorscale’s StreamingDiskANN index includes support for Statistical Binary Quantization (SBQ), a novel binary quantization method developed by researchers at Timescale.
Many vector indexes use compression to reduce the space needed for vector storage and make index traversal faster at the cost of some loss in accuracy. The common algorithms are product quantization (PQ) and binary quantization (BQ). In fact, pgvector’s HNSW index just added BQ in the latest 0.7.0 release (yay!).
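For reference, pgvector 0.7.0’s BQ support can be used via an expression index over binary-quantized vectors (a sketch; the table name and 768-dimension assumption are illustrative):

```sql
-- Index the binary-quantized form of 768-dim embeddings,
-- using Hamming distance on the bit strings (pgvector >= 0.7.0).
CREATE INDEX document_embedding_bq_idx ON document_embedding
USING hnsw ((binary_quantize(embedding)::bit(768)) bit_hamming_ops);

-- Query by quantizing the query vector the same way;
-- <~> is pgvector's Hamming distance operator for bit strings.
SELECT id
FROM document_embedding
ORDER BY binary_quantize(embedding)::bit(768) <~> binary_quantize($1)
LIMIT 10;
```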
Statistical Binary Quantization improves accuracy over traditional quantization methods, providing a better accuracy vs. performance trade-off: increased search performance at higher accuracy from indexes that take up less space on disk and in memory.
We took a look at the BQ algorithm and were unhappy with the accuracy loss it produced, and we immediately saw some low-hanging fruit for improving it. By analyzing the structure of vector data and experimenting with the quantization algorithm, we developed the new SBQ compression algorithm. When evaluating SBQ on real-world data, we found it significantly improves search accuracy.
If you’re interested in the low-level details, we “delve” into details about Statistical Binary Quantization and StreamingDiskANN in this technical blog post.
Pgvectorscale is open source under the PostgreSQL License and is available for you to use in your AI projects today. You can find installation instructions on the pgvectorscale GitHub repository.
You can also access pgvectorscale on any database service on Timescale’s cloud PostgreSQL platform. For production vector workloads, we’re offering private beta access to vector-optimized databases with pgvector and pgvectorscale on Timescale. Sign up here for priority access.
Pgvectorscale is an effort to enrich the PostgreSQL ecosystem for AI. If you’d like to help, get involved through the pgvectorscale GitHub repository.
Let's make Postgres a better database for AI, together!