Making PostgreSQL a Better AI Database
Introducing pgai and pgvectorscale, two new open-source extensions that make PostgreSQL an even better database for AI applications. Bringing greater ease of use and unlocking large-scale, high-performance AI use cases previously achievable only with specialized vector databases like Pinecone.
Vector databases are critical to building AI applications. And in the future, all applications will be AI applications. At Timescale, we believe PostgreSQL is the bedrock of the future of data. And that the strength of the PostgreSQL ecosystem is what makes it the most loved database for professional developers.
However, the rise of AI applications that leverage the capabilities unlocked by large language models (LLMs) means that developers now demand more from their databases. In order to become the preferred AI database, PostgreSQL, as we know it, will have to adapt to these new developer needs.
AI databases must store, manage, and process data for AI and machine learning applications. They need to be able to store and query large volumes of vectors efficiently and quickly and scale to meet production demands. An AI database should also make developers’ lives easier when building AI applications and performing AI tasks, helping with embedding creation and updating, and bringing models closer to the database for easy access.
The good news is that evolving to meet changing developer needs is precisely what PostgreSQL, the workhorse birdhorse of databases, has been doing since its inception, thanks to its rich extension ecosystem and community.
Two new open-source extensions to make PostgreSQL a better AI database—pgai and pgvectorscale
Today, we’re proud to add two new open-source extensions, both licensed under the Open Source PostgreSQL License, to further enrich the Postgres ecosystem and make Postgres the de facto database for building AI applications, removing the need for developers to use a standalone vector database in their AI data stack.
These extensions complement pgvector, the popular open-source extension for vector data in PostgreSQL, adding unique capabilities that help developers use PostgreSQL to build AI applications. They improve ease of use and address performance and scalability issues levied against pgvector in its current form.
Pgvectorscale: Cost-Efficient Scaling for High Volume Vector Workloads
Pgvectorscale is a PostgreSQL extension that builds on pgvector for more performance and scale. pgvectorscale enables developers to build more scalable AI applications with higher-performance embedding search and cost-efficient storage.
The initial release includes two key innovations: (1) StreamingDiskANN, a high-performance, cost-efficient vector search index for pgvector data inspired by research at Microsoft, and (2) Statistical Binary Quantization (SBQ), developed by Timescale’s own researchers to improve upon standard binary quantization techniques.
To test the performance impact of pgvectorscale, we compare the performance of PostgreSQL and Pinecone on a benchmark using a dataset of 50 million Cohere embeddings (768 dimensions).
On our benchmark of 50 million Cohere embeddings (768 dimensions each), PostgreSQL with pgvector and pgvectorscale achieves 28x lower p95 latency and 16x higher query throughput compared to Pinecone for approximate nearest neighbor queries at 99 % recall, all at 75 % less cost when self-hosted on AWS EC2.
For more details on pgvectorscale performance and our PostgreSQL vs. Pinecone benchmark, see our companion post on in-depth benchmark methodology and results.
While pgvector is written in C, pgvectorscale is written in Rust, offering the PostgreSQL community a new avenue to contribute to vector capabilities.
To learn more about StreamingDiskANN and Statistical Binary Quantization, see our “how we built it” post for a technical deep dive into the pgvectorscale’s innovations that make PostgreSQL as fast as Pinecone.
Pgai: Giving PostgreSQL Developers AI Engineering Superpowers
Pgai is a PostgreSQL extension that brings more AI workflows to PostgreSQL, like embedding creation and model completion. Pgai helps more PostgreSQL developers gain the skills of AI engineers, making it easier for them to build search and retrieval-augmented generation (RAG) applications.
The initial release supports creating OpenAI embeddings and getting OpenAI chat completions from models like GPT4o directly from your PostgreSQL database. This integration allows for classification, summarization, and data enrichment tasks on existing relational data, streamlining the development process from proof of concept to production.
We are working on supporting more models in pgai and welcome community contributions for models and functionality you want to see—simply file an issue in the pgai GitHub repository or contribute a feature today.
Early Developer Reviews of Pgai and Pgvectorscale
We gave a sneak peek of pgvectorscale and pgai to a select group of developers who are building AI applications with PostgreSQL. Here’s what they had to say:
“Pgvectorscale and pgai are incredibly exciting for building AI applications with PostgreSQL. Having embedding functions directly within the database is a huge bonus,” said Web Begole, CTO of Market Reader, a company using PostgreSQL to build an AI-enabled financial information platform. Begole elaborated, “Previously, updating our saved embeddings was a tedious task, but now, with everything integrated, it promises to be much simpler and more efficient. This will save us a significant amount of time and effort.”
John McBride, Head of Infrastructure at OpenSauced, a company building an AI-enabled analytics platform for open-source projects using PostgreSQL, said, “Pgvectorscale and pgai are great additions to the PostgreSQL AI ecosystem. The introduction of Statistical Binary Quantization promises lightning performance for vector search and will be valuable as we scale our vector workload.” McBride concluded, “Pgai removes the need for developers to re-implement common functionality themselves, and I’m excited for the use cases it enables.”
Why Are We Doing This? To Help PostgreSQL Win!
We believe that PostgreSQL—with its rich ecosystem, multiple data type support, and battle-tested reliability—is the bedrock for the future of data. In short: PostgreSQL for Everything. And in the future, everything will be infused with AI.
We want to lower the barriers for developers adopting and scaling PostgreSQL for their AI applications—whether they be search, RAG, or Agents—by removing the need to adopt a separate vector database and simplifying their data architectures as they scale.
By open-sourcing pgai and pgvectorscale, we hope to accelerate PostgreSQL’s rise as the de facto database for building AI applications. We hope more developers can trade complex, brittle data architectures for the rock-solid foundation, versatile extensions, and straightforward simplicity of PostgreSQL in their AI data stack.
How Can You Get Involved?
If you’d like to join us in making PostgreSQL better for AI, here’s how you can get involved:
- Share the news with your friends and colleagues: Share our posts announcing pgai and pgvectorscale on X/Twitter, LinkedIn, and Threads. We promise to RT back.
- Join the Postgres for AI Discord: Join a community of developers building AI applications with PostgreSQL in the pgai Discord server. Share what you’re working on, help and get helped by a community of peers.
- Submit issues and feature requests: We encourage you to submit issues and feature requests for functionality you’d like to see, bugs you find, and suggestions you think would improve both projects.
- Make a contribution: We welcome community contributions for both pgvectorscale and pgai. Pgvectorscale is written in Rust, while pgai uses Python and PL/Python. For pgai specifically, let us know which models you want to see supported, particularly for open-source embedding and generation models. See the pgai GitHub issues for more.
- Offer pgvectorscale and pgai extensions on your PostgreSQL cloud: Pgvectorscale and pgai are open-source projects under the PostgreSQL license. We encourage you to offer pgvectorscale and pgai on your managed PostgreSQL database-as-a-service platform, and we can even help you spread the word. Get in touch via our Contact Us form and mention pgai and pgvectorscale to discuss further.
- Use pgai and pgvectorscale today: Pgvectorscale and pgai are open source under the PostgreSQL license and are available for you to use in your AI projects today. You can find installation instructions on the pgai GitHub and pgvectorscale GitHub repositories, respectively. You can also access both pgai and pgvectorscale on any database service on Timescale’s cloud PostgreSQL platform. For production vector workloads, we’re offering private beta access to vector-optimized databases with pgvector and pgvectorscale on Timescale. Sign up here for priority access.
Let's make Postgres a better database for AI, together!