AI

Feb 19, 2025

One Line of SQL, All the LiteLLM Embeddings

LiteLLM is now supported by pgai Vectorizer

Posted by

Jacky Liang

The AI team at Timescale has been busy creating tools to make AI more accessible. As part of that mission, I’m super excited that pgai Vectorizer now supports LiteLLM, enabling you to use embedding models from even more providers like Cohere, Mistral, Azure OpenAI, AWS Bedrock, Hugging Face, and more—all through the same familiar interface. 

Pgai Vectorizer’s new LiteLLM integration lets you swap between the latest embedding models without writing new code—saving you time, cost, and development headaches.

In this post, I'll show how to use the LiteLLM integration to benchmark three of the most popular closed-source embedding models (OpenAI's text-embedding-3-small, Mistral Embed, and Cohere English v3.0) and how pgai Vectorizer makes it super easy to run your own evaluations.

What's New? LiteLLM Embeddings in pgai Vectorizer 

Pgai Vectorizer has always made it easy to create and manage embeddings in PostgreSQL. Now with LiteLLM integration, you get access to:

  • Leading embedding models from Cohere, Mistral, and all models hosted through AWS Bedrock, Azure, and Hugging Face, in addition to OpenAI, Voyage, and Ollama
  • The simplicity of a single line of SQL to access all of these models through the same interface
  • Simple configuration through environment variables
  • Peace of mind: You can focus on building your app while we handle all API calls and rate limits for you

For example, here's how you can use Cohere's latest embedding model:

SELECT ai.create_vectorizer(
    'my_table'::regclass,
    destination => 'cohere_embeddings',
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

The same pattern works for other providers. Here's Mistral's embedding model:

SELECT ai.create_vectorizer(
    'my_table'::regclass,
    destination => 'mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

For complete instructions on how to use each provider, please reference the pgai Vectorizer docs for LiteLLM.

The integration of LiteLLM embeddings in pgai Vectorizer aims to bring streamlined workflows to an embedding model landscape that is evolving at a rapid pace. New, better models come out almost weekly (I mean, check Hugging Face, can you keep up?). When models are released this frequently, each comes with its own trade-offs in accuracy, cost, and speed. 

With LiteLLM support in pgai Vectorizer, you can simplify multiple tasks: 

  • Keep your embeddings in PostgreSQL, where your business data already lives 
  • Experiment with different models without changing your application code
  • Switch providers easily with one line of SQL if your needs change (see the sketch after this list)
  • Take advantage of new models as they become available
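
For example, here's roughly what switching a vectorizer from Cohere to Mistral could look like. This is a hedged sketch: it assumes your pgai version provides the ai.drop_vectorizer helper and that the existing vectorizer has id 1 (check ai.vectorizer_status for the actual id). Only the embedding => configuration differs between providers.

-- Drop the old vectorizer (the source table and its data stay untouched)
SELECT ai.drop_vectorizer(1);

-- Recreate it against a different provider; only the embedding => line changes
SELECT ai.create_vectorizer(
    'my_table'::regclass,
    destination => 'mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);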

Finding the Best Model for Your Use Case

One of the biggest advantages of pgai Vectorizer's LiteLLM integration is how easy it makes testing different embedding models. Let's walk through an example of how we compared three popular models (Cohere, Mistral, and OpenAI) using the same dataset and evaluation process, all through the same one-line-of-SQL interface. 

Using Paul Graham's essays as our test dataset, we evaluated three widely used models:

  1. OpenAI's text-embedding-3-small
  2. Mistral Embed
  3. Cohere English v3.0 

Our evaluation tested each model's ability to handle different types of questions:

  • Long, detailed questions (e.g., "What are the specific challenges entrepreneurs face when starting a company according to this passage?")
  • Direct questions (e.g., "What year did Paul Graham start Y Combinator?")
  • Context-based questions (e.g., "How does the author's experience influence his view of startups?")
  • Short questions (e.g., "What is a startup?")
  • Unclear questions (e.g., "Why did they do that?")

Here's what we found in our testing:

  1. All models excelled at handling detailed questions (83-90% accuracy)
  2. They showed strong capabilities with direct questions (75-82% accuracy)
  3. Context-based questions yielded good results (73-83% accuracy)
  4. Performance varied more with short questions (58-80% accuracy)
  5. Unclear questions were challenging across the board (56-65% accuracy)

The key takeaway? Each model has its own strengths, and the "best" choice depends entirely on your specific use case. With pgai Vectorizer, you can easily run similar comparisons on your own data to find the perfect fit. 

Here are the raw results: 

  1. OpenAI's text-embedding-3-small: 79.5% overall accuracy
  2. Mistral Embed: 77.5% overall accuracy
  3. Cohere English v3.0: 69.5% overall accuracy

Breaking this down by question types:

[Image: accuracy by question type for each embedding model]

Note: Since we use GPT-4o-mini to generate questions and OpenAI's embedding model in our evaluation, there may be some inherent bias favoring the OpenAI embedding model. This is because the question generation model and embedding model may share a similar understanding of language patterns.  

Testing Embedding Models With Pgai Vectorizer’s LiteLLM Integration

Getting Started

1. First, set up your API keys in your Docker compose.yaml environment:

You can get your API keys for Cohere, Mistral, and OpenAI from each provider's dashboard. 

name: pgai 
services:
  db:
    environment:
      COHERE_API_KEY: "your-key-here"
      MISTRAL_API_KEY: "your-key-here"
      OPENAI_API_KEY: "your-key-here"

2. Create vectorizers for the models you want to test, for example:

-- Cohere
SELECT ai.create_vectorizer(
    'source_table'::regclass,
    destination => 'cohere_embeddings',
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    )
    ...
);

-- Mistral
SELECT ai.create_vectorizer(
    'source_table'::regclass,
    destination => 'mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    )
    ...
);

-- OpenAI
SELECT ai.create_vectorizer(
    'source_table'::regclass,
    destination => 'openai_embeddings',
    embedding => ai.embedding_openai(
      'text-embedding-3-small', 
      1024, 
      api_key_name => 'OPENAI_API_KEY'
    )
    ...
);

3. Monitor progress:

SELECT * FROM ai.vectorizer_status;

4. Check any errors that may occur:

SELECT * FROM ai.vectorizer_errors;

You can also check the vectorizer worker's Docker logs for any errors:

docker compose logs -f vectorizer-worker

Running Your Own Evaluation

Want to reproduce our evaluation or test different models? We have open-sourced the evaluation code—here's the complete setup process: 

Prerequisites

  1. Docker and Docker Compose installed (downloads are available for all platforms)
  2. API keys for the models you want to test (we used Cohere, Mistral, and OpenAI)

Step 1: Set up the environment

1. Create a directory and add this compose.yaml:

name: pgai 
services:
  db:
    image: timescale/timescaledb-ha:pg17
    environment:
      POSTGRES_PASSWORD: postgres
      OPENAI_API_KEY: "your-key"
      COHERE_API_KEY: "your-key"
      MISTRAL_API_KEY: "your-key"
    ports:
      - "5432:5432"
    volumes:
      - ./data:/var/lib/postgresql/data
  vectorizer-worker:
    image: timescale/pgai-vectorizer-worker:latest
    environment:
      PGAI_VECTORIZER_WORKER_DB_URL: postgres://postgres:postgres@db:5432/postgres
      OPENAI_API_KEY: "your-key"
      COHERE_API_KEY: "your-key"
      MISTRAL_API_KEY: "your-key"
    command: [ "--poll-interval", "5s" ]

2. Start the services:

docker compose up -d

3. Connect to your database:

docker compose exec db psql -U postgres

Step 2: Set up the database

1. Enable pgai extension:

CREATE EXTENSION IF NOT EXISTS ai CASCADE;

2. Optional: verify your pgai version; it should be 0.8.0 or above:

postgres=# \dx ai;
                List of installed extensions
 Name | Version | Schema |            Description            
------+---------+--------+-----------------------------------
 ai   | 0.8.0   | ai     | helper functions for ai workflows


3. Load Paul Graham's essays dataset:

Using the load_dataset() function, pgai lets you load any Hugging Face dataset with one line of SQL. How convenient!

SELECT ai.load_dataset('sgoel9/paul_graham_essays');

4. Add a primary key (required for Vectorizer):

ALTER TABLE paul_graham_essays ADD PRIMARY KEY (id);

Note: The dataset we're using (paul_graham_essays) from Hugging Face contains columns id, title, text, and date.
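
As a quick sanity check, you can preview a few rows before creating the vectorizers (the 80-character preview length below is arbitrary):

SELECT id, title, date, left(text, 80) AS preview
FROM paul_graham_essays
LIMIT 3;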

Step 3: Create vectorizers

Set up vectorizers for the three models we are testing:

-- Cohere embed-english-v3.0
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_cohere_embeddings',
    embedding => ai.embedding_litellm(
        'cohere/embed-english-v3.0',
        1024,
        api_key_name => 'COHERE_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

-- Mistral Embed
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_mistral_embeddings',
    embedding => ai.embedding_litellm(
        'mistral/mistral-embed',
        1024,
        api_key_name => 'MISTRAL_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

-- OpenAI text-embedding-3-small
SELECT ai.create_vectorizer(
    'paul_graham_essays'::regclass,
    destination => 'essays_openai_small_embeddings',
    embedding => ai.embedding_openai(
        'text-embedding-3-small', 
        1024,
        api_key_name => 'OPENAI_API_KEY'
    ),
    chunking => ai.chunking_recursive_character_text_splitter('text', 512, 50)
);

Step 4: Monitor progress

Once you create the vectorizers, the vectorizer worker picks up the embedding jobs, polling for new work every five seconds (the --poll-interval we set in compose.yaml). To check the status of a job, run the following: 

postgres=# SELECT * FROM ai.vectorizer_status;
 id |       source_table        |                target_table                 |                 view                  | pending_items | disabled 
----+---------------------------+---------------------------------------------+---------------------------------------+---------------+----------
  1 | public.paul_graham_essays | public.essays_cohere_embeddings_store       | public.essays_cohere_embeddings       |             0 | f
  2 | public.paul_graham_essays | public.essays_mistral_embeddings_store      | public.essays_mistral_embeddings      |             0 | f
  3 | public.paul_graham_essays | public.essays_openai_small_embeddings_store | public.essays_openai_small_embeddings |             0 | f
(3 rows)

When pending_items hits zero, the vectorizer has completed its embedding job. 
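
You can also verify the output directly. The query below is a sketch that assumes pgai's default naming, where each destination view (for example, essays_cohere_embeddings) exposes one row per chunk alongside its embedding:

-- Row counts should be roughly equal across models, since all three
-- vectorizers chunk the same source table with the same settings
SELECT 'cohere' AS model, count(*) FROM essays_cohere_embeddings
UNION ALL
SELECT 'mistral', count(*) FROM essays_mistral_embeddings
UNION ALL
SELECT 'openai', count(*) FROM essays_openai_small_embeddings;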

Optional: Check for any errors:

SELECT * FROM ai.vectorizer_errors;

Step 5: Run the evaluation

Run the evaluation script:

python3 eval.py

The evaluation process happens in three main steps. Because some steps (like generating the embeddings) can take a while and can hit rate limits, the main() function of eval.py lets you enable or disable steps you've already completed:

def main():
    Config.validate()
    evaluator = StepByStepEvaluator()
    
    # Step 1: Get chunks from vectorized tables
    chunks = evaluator.step1_get_chunks()
    pd.DataFrame(chunks).to_csv('chunks.csv')

    # Step 2: Generate questions
    questions = evaluator.step2_generate_questions()
    pd.DataFrame(questions).to_csv('questions.csv')

    # Step 3: Evaluate models
    results = evaluator.step3_evaluate_models()
    pd.DataFrame(results).to_csv('results.csv')

To do this, simply comment out the steps you've already completed. For example, if you already have your questions generated and saved in questions.csv, you can comment out steps one and two to only run the model evaluation. 

How We Evaluate Embedding Models

The evaluation follows a systematic approach to test how well each embedding model understands and retrieves relevant content: 

  1. Select the data
    1. Randomly selects 20 text chunks from Paul Graham's essays
    2. Each chunk is a coherent piece of text around 512 tokens long
    3. These chunks become our "ground truth" for testing
  2. Generate the questions
    1. For each chunk, generate 20 different questions (four of each type):
      1. Short questions: under 10 words, testing basic comprehension
      2. Long questions: detailed queries requiring specific information
      3. Direct questions: asking about explicit information in the text
      4. Implied questions: requiring an understanding of the context
      5. Unclear questions: testing handling of ambiguous queries
    2. Uses GPT-4o-mini to generate diverse, realistic questions
    3. Questions are saved in questions.csv for reproducibility
  3. Evaluate the model
    1. For each model (Cohere, Mistral, OpenAI), the evaluation covers these steps:
      1. Takes each question
      2. Performs a vector similarity search to find the top 10 most relevant chunks
      3. Checks if the original chunk (ground truth) appears in these results
      4. Scores one (1) for success, zero (0) for failure
    2. Calculates accuracy metrics: 
      1. Overall accuracy across all question types
      2. Accuracy breakdown by question type
    3. Results are saved in results.csv

This methodology tests both precision (can the model find the exact chunk from which a question was generated?) and comprehension (can it handle different ways of asking for the same information?). 

It’s a simple but effective evaluation that works across all embedding models.  
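
The actual scoring loop lives in eval.py, but the core retrieval check can be expressed as a single query. Here's a hedged SQL sketch against the OpenAI vectorizer's view; it assumes pgai's ai.openai_embed() helper (with dimensions matching the vectorizer's 1024) and the chunk/embedding columns the vectorizer view exposes, and the question text is just a placeholder:

-- Embed the question with the same model, then fetch the 10 nearest chunks
-- by cosine distance. The question scores 1 if the chunk it was generated
-- from appears among these rows, 0 otherwise.
WITH question AS (
    SELECT ai.openai_embed(
        'text-embedding-3-small',
        'Why did Paul Graham start Y Combinator?',
        dimensions => 1024
    ) AS q
)
SELECT chunk
FROM essays_openai_small_embeddings
CROSS JOIN question
ORDER BY embedding <=> q
LIMIT 10;

eval.py runs this top-10 check for every generated question against each model's view and averages the 1/0 scores into the accuracy numbers above.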

Key settings

You can adjust key parameters in the Config class if you want to expand the surface area of the evaluation: 

NUM_CHUNKS = 20               # Number of text chunks to evaluate
NUM_QUESTIONS_PER_CHUNK = 20  # Total questions per chunk
TOP_K = 10                    # Number of closest chunks to retrieve
QUESTION_DISTRIBUTION = {     # Questions per type
    'short': 4,    
    'long': 4,     
    'direct': 4,   
    'implied': 4,  
    'unclear': 4   
}

Next Steps

And that's it: you're now equipped to evaluate different embedding models in your application.

Check out the pgai Vectorizer LiteLLM documentation for detailed setup instructions and provider-specific configurations. Since embedding models are very cost-effective now, it's worth experimenting to find the best fit for your use case.

The best model for your application depends on your specific needs. Consider the following factors when choosing one:

  • Query patterns in your application
  • Performance requirements
  • Cost constraints
  • Data characteristics

With pgai Vectorizer's LiteLLM support, you can now easily evaluate these trade-offs yourself. 

Further reading

Here are more resources on pgai Vectorizer and RAG with PostgreSQL:

  1. Pgai Vectorizer LiteLLM documentation
  2. Simple Embedding Model Evaluation GitHub Code
  3. Simple Embedding Model Evaluation Video
  4. Pgai Vectorizer Quick Start for Ollama on Self-Hosted PostgreSQL
  5. Introduction to Pgai Vectorizer
