Feb 19, 2025
Posted by Jacky Liang
The AI team at Timescale has been busy creating tools to make AI more accessible. As part of that mission, I’m super excited that pgai Vectorizer now supports LiteLLM, enabling you to use embedding models from even more providers like Cohere, Mistral, Azure OpenAI, AWS Bedrock, Hugging Face, and more—all through the same familiar interface.
Pgai Vectorizer’s new LiteLLM integration lets you swap between the latest embedding models without writing new code—saving you time, cost, and development headaches.
We'll also show how to use this LiteLLM integration to benchmark three of the most popular closed-source embedding models (OpenAI's text-embedding-3-small, Mistral Embed, and Cohere Embed English v3.0), and how pgai Vectorizer makes it easy to run your own evaluations.
Pgai Vectorizer has always made it easy to create and manage embeddings in PostgreSQL. Now, with LiteLLM integration, you also get access to embedding models from providers like Cohere, Mistral, Azure OpenAI, AWS Bedrock, and Hugging Face, all configured the same way.
For example, here's how you can use Cohere's latest embedding model:
SELECT ai.create_vectorizer(
    'my_table'::regclass,
    destination => 'cohere_embeddings',
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
The same pattern works for other providers. Here's Mistral's embedding model:
SELECT ai.create_vectorizer(
    'my_table'::regclass,
    destination => 'mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
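Once a vectorizer has run, you query the generated view with ordinary SQL. Here is a rough sketch (not from the original post) of a semantic search over the Cohere embeddings. It assumes the view exposes the generated chunk and embedding columns and that pgai's ai.litellm_embed() helper is available for ad hoc query embedding; check the pgai docs for the exact names and signature in your version.

-- Minimal semantic-search sketch over the generated view
-- (view name comes from the destination parameter above;
--  <=> is pgvector's cosine-distance operator)
WITH query AS (
    SELECT ai.litellm_embed(
        'cohere/embed-english-v3.0',          -- must match the vectorizer's model
        'how do I swap embedding providers?', -- your search text
        api_key_name => 'COHERE_API_KEY'
    ) AS q
)
SELECT chunk, embedding <=> (SELECT q FROM query) AS distance
FROM cohere_embeddings
ORDER BY distance
LIMIT 5;

Computing the query embedding once in a CTE keeps the example to a single round trip to the embedding provider, regardless of table size.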
For complete instructions on how to use each provider, please reference the pgai Vectorizer docs for LiteLLM.
The integration of LiteLLM embeddings in pgai Vectorizer aims to bring streamlined workflows to an embedding model landscape that is evolving at a rapid pace. New, better models come out almost weekly (I mean, check Hugging Face, can you keep up?), and each comes with different trade-offs in accuracy, cost, and speed.
With LiteLLM support in pgai Vectorizer, tasks like switching providers, comparing models, and keeping embeddings up to date become much simpler.
One of the biggest advantages of pgai Vectorizer's LiteLLM integration is how easy it makes testing different embedding models. Let's walk through how we compared three popular models (Cohere, Mistral, and OpenAI) on the same dataset with the same evaluation process, with each vectorizer defined in a single SQL statement.
Using Paul Graham's essays as our test dataset, we evaluated three widely used models: Cohere's embed-english-v3.0, Mistral Embed, and OpenAI's text-embedding-3-small.
Our evaluation tested each model's ability to handle different types of questions: short, long, direct, implied, and deliberately unclear ones.
Here's what we found in our testing:
The key takeaway? Each model has its own strengths, and the "best" choice depends entirely on your specific use case. With pgai Vectorizer, you can easily run similar comparisons on your own data to find the perfect fit.
Here are the raw results:
Breaking this down by question types:
Note: Since we use GPT-4o-mini to generate questions and OpenAI's embedding model in our evaluation, there may be some inherent bias favoring the OpenAI embedding model. This is because the question generation model and embedding model may share a similar understanding of language patterns.
1. First, set up your API keys in your Docker compose.yaml environment:
You can get API keys for Cohere, Mistral, and OpenAI from their respective developer consoles.
name: pgai
services:
  db:
    environment:
      COHERE_API_KEY: "your-key-here"
      MISTRAL_API_KEY: "your-key-here"
      OPENAI_API_KEY: "your-key-here"
2. Create vectorizers for the models you want to test, for example:
-- Cohere
SELECT ai.create_vectorizer(
    'source_table'::regclass,
    destination => 'cohere_embeddings',
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    )
    ...
);

-- Mistral
SELECT ai.create_vectorizer(
    'source_table'::regclass,
    destination => 'mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    )
    ...
);

-- OpenAI
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_openai_small_embeddings',
    embedding => ai.embedding_openai(
        'text-embedding-3-small',
        1024,
        api_key_name => 'OPENAI_API_KEY'
    )
    ...
);
3. Monitor progress:
SELECT * FROM ai.vectorizer_status;
SELECT * FROM ai.vectorizer_errors;
You can also check the vectorizer worker's Docker logs for any errors:
docker compose logs -f vectorizer-worker
Want to reproduce our evaluation or test different models? We have open-sourced the evaluation code—here's the complete setup process:
1. Create a directory and add this compose.yaml:
name: pgai
services:
  db:
    image: timescale/timescaledb-ha:pg17
    environment:
      POSTGRES_PASSWORD: postgres
      OPENAI_API_KEY: "your-key"
      COHERE_API_KEY: "your-key"
      MISTRAL_API_KEY: "your-key"
    ports:
      - "5432:5432"
    volumes:
      - ./data:/var/lib/postgresql/data
  vectorizer-worker:
    image: timescale/pgai-vectorizer-worker:latest
    environment:
      PGAI_VECTORIZER_WORKER_DB_URL: postgres://postgres:postgres@db:5432/postgres
      OPENAI_API_KEY: "your-key"
      COHERE_API_KEY: "your-key"
      MISTRAL_API_KEY: "your-key"
    command: [ "--poll-interval", "5s" ]
2. Start the services:
docker compose up -d
3. Connect to your database:
docker compose exec db psql -U postgres
1. Enable pgai extension:
CREATE EXTENSION IF NOT EXISTS ai CASCADE;
2. Optional: verify your pgai version; it should be 0.8.0 or above:
postgres=# \dx ai
            List of installed extensions
 Name | Version | Schema |            Description
------+---------+--------+-----------------------------------
 ai   | 0.8.0   | ai     | helper functions for ai workflows
3. Load Paul Graham's essays dataset:
Using the load_dataset() function, pgai lets you load any Hugging Face dataset with one line of SQL. How convenient!
SELECT ai.load_dataset('sgoel9/paul_graham_essays');
4. Add a primary key (required for Vectorizer):
ALTER TABLE paul_graham_essays ADD PRIMARY KEY (id);
Note: The dataset we're using (paul_graham_essays) from Hugging Face contains the columns id, title, text, and date.
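Optionally, before creating any vectorizers, you can run a quick sanity check that the dataset loaded as expected. This is plain SQL and not part of the original walkthrough; the column names come from the note above.

-- Confirm the essays are present and peek at the columns
SELECT count(*) AS essay_count FROM paul_graham_essays;

SELECT id, title, date, left(text, 120) AS preview
FROM paul_graham_essays
LIMIT 3;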
Set up vectorizers for the three models we are testing:
-- Cohere embed-english-v3.0
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_cohere_embeddings',
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
-- Mistral Embed
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
-- OpenAI text-embedding-3-small
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_openai_small_embeddings',
    embedding => ai.embedding_openai(
        'text-embedding-3-small',
        1024,
        api_key_name => 'OPENAI_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);
Once you create the vectorizers, the vectorizer worker will check for new jobs every five seconds (the polling interval set in the compose file above). To check the status of the jobs, run the following:
postgres=# SELECT * FROM ai.vectorizer_status;
id | source_table | target_table | view | pending_items | disabled
----+---------------------------+---------------------------------------------+---------------------------------------+---------------+----------
1 | public.paul_graham_essays | public.essays_cohere_embeddings_store | public.essays_cohere_embeddings | 0 | f
2 | public.paul_graham_essays | public.essays_mistral_embeddings_store | public.essays_mistral_embeddings | 0 | f
3 | public.paul_graham_essays | public.essays_openai_small_embeddings_store | public.essays_openai_small_embeddings | 0 | f
(3 rows)
When pending_items hits zero, the vectorizer has completed its embedding job.
Optional: Check for any errors:
SELECT * FROM ai.vectorizer_errors;
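If you want one more sanity check before running the evaluation, you can confirm that all three vectorizers produced the same number of chunks; they should, since they share the same chunking configuration. The view names below come from the destination parameters used above.

-- Compare chunk counts across the three embedding views
SELECT 'cohere' AS model, count(*) AS chunks FROM essays_cohere_embeddings
UNION ALL
SELECT 'mistral', count(*) FROM essays_mistral_embeddings
UNION ALL
SELECT 'openai', count(*) FROM essays_openai_small_embeddings;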
Run the evaluation script:
python3 eval.py
The evaluation process happens in three main steps. Because some steps, like generating the embeddings, can take a while (and hit rate limits), you can enable or disable steps you've already completed in the main() function of eval.py:
import pandas as pd

# Config and StepByStepEvaluator are defined earlier in eval.py
def main():
    Config.validate()
    evaluator = StepByStepEvaluator()

    # Step 1: Get chunks from vectorized tables
    chunks = evaluator.step1_get_chunks()
    pd.DataFrame(chunks).to_csv('chunks.csv')

    # Step 2: Generate questions
    questions = evaluator.step2_generate_questions()
    pd.DataFrame(questions).to_csv('questions.csv')

    # Step 3: Evaluate models
    results = evaluator.step3_evaluate_models()
    pd.DataFrame(results).to_csv('results.csv')
To do this, simply comment out the steps you've already completed. For example, if you already have your questions generated and saved in questions.csv, you can comment out steps one and two and only run the model evaluation.
The evaluation follows a systematic approach to test how well each embedding model understands and retrieves relevant content:
1. Sample text chunks from the vectorized tables.
2. Generate questions of each type for every sampled chunk and save them to questions.csv for reproducibility.
3. For each model, embed every question, retrieve the top-K closest chunks, check whether the question's source chunk is among them, and save the scores to results.csv.
This methodology tests both precision (can the model find the exact chunk from which a question was generated?) and comprehension (can it handle different ways of asking for the same information?).
It’s a simple but effective evaluation that works across all embedding models.
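To make that precision check concrete, here is a rough SQL sketch of what it means for a single hand-written question against one model. The actual evaluation does this in Python across all questions and models; this sketch assumes the generated view exposes id, chunk_seq, and embedding columns and that ai.litellm_embed() is available for query embedding, and the id and chunk_seq values are purely illustrative.

-- Is the chunk this question was written about retrieved within the top 10?
WITH query AS (
    SELECT ai.litellm_embed(
        'cohere/embed-english-v3.0',
        'What did the author work on before college?',
        api_key_name => 'COHERE_API_KEY'
    ) AS q
),
ranked AS (
    SELECT id, chunk_seq,
           row_number() OVER (ORDER BY embedding <=> (SELECT q FROM query)) AS retrieval_rank
    FROM essays_cohere_embeddings
)
SELECT retrieval_rank <= 10 AS found_in_top_10
FROM ranked
WHERE id = 42 AND chunk_seq = 0;  -- the question's source chunk (illustrative)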
You can adjust key parameters in the Config class if you want to expand the scope of the evaluation:
class Config:
    NUM_CHUNKS = 20               # Number of text chunks to evaluate
    NUM_QUESTIONS_PER_CHUNK = 20  # Total questions per chunk
    TOP_K = 10                    # Number of closest chunks to retrieve
    QUESTION_DISTRIBUTION = {     # Questions per type
        'short': 4,
        'long': 4,
        'direct': 4,
        'implied': 4,
        'unclear': 4
    }
And that’s it, you’re now equipped to evaluate different embedding models in your application.
Check out the pgai Vectorizer LiteLLM documentation for detailed setup instructions and provider-specific configurations. Since embedding models are very cost-effective now, it's worth experimenting to find the best fit for your use case.
The best model for your application depends on your specific needs. Consider factors such as retrieval accuracy, cost, and speed when choosing one.
With pgai Vectorizer's LiteLLM support, you can now easily evaluate these trade-offs yourself.
Here are more blog posts about RAG with PostgreSQL and different tools: