Oct 17, 2024
Posted by Haziqa Sajid
With over 3.2 billion images shared online daily, the demand for efficient image search capabilities has never been higher. The opportunity to create a powerful image search engine spans many fields, from e-commerce to social media, driven by this high-velocity data.
Historically, image search solutions primarily relied on keyword-based methods, where images were matched based on captions or tags. However, these methods often fell short because computers couldn't interpret the content of images beyond their associated text.
Thanks to OpenAI's CLIP model, systems can now understand both visual and textual information. In this article, we will build an image search application using the OpenAI CLIP model and a managed PostgreSQL database with Timescale in JavaScript.
OpenAI CLIP (Contrastive Language–Image Pre-training) is a neural network that learns visual concepts from natural language supervision.
It can classify images by leveraging text descriptions and achieving zero-shot capabilities, meaning it can recognize visual categories without direct training on specific datasets.
1. Efficiency: CLIP is trained on diverse Internet-sourced text-image pairs, reducing the need for costly, labor-intensive labeled datasets.
2. Flexibility: the model can perform various visual classification tasks by simply providing textual descriptions of the categories.
3. Robust performance: it also excels in recognizing objects in varied settings and abstract depictions, outperforming traditional models on many benchmarks.
1. Image search: it enables more accurate and flexible image retrieval by understanding natural language queries.
2. Content moderation: OpenAI CLIP helps identify inappropriate content by recognizing complex visual patterns.
3. Automated tagging: CLIP enhances the tagging accuracy for extensive image collections, which is helpful in social media and digital asset management.
4. Object recognition: the OpenAI model can be applied in diverse fields, from medical imaging to autonomous driving, improving recognition tasks without extensive retraining.
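To make the zero-shot idea concrete, here is a minimal sketch of how a label can be chosen by comparing an image embedding with the embeddings of candidate text descriptions. The helper names (cosineSimilarity, zeroShotClassify) and the three-dimensional vectors are made up for illustration; real CLIP embeddings are much larger, and this is not how CLIP itself is implemented internally.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pick the label whose text embedding is most similar to the image embedding.
function zeroShotClassify(imageEmbedding, labelEmbeddings) {
  let best = null;
  for (const [label, embedding] of Object.entries(labelEmbeddings)) {
    const score = cosineSimilarity(imageEmbedding, embedding);
    if (best === null || score > best.score) {
      best = { label, score };
    }
  }
  return best;
}

// Toy vectors standing in for real CLIP embeddings (hypothetical values).
const imageEmbedding = [0.9, 0.1, 0.2];
const labelEmbeddings = {
  'a photo of a dog': [0.8, 0.2, 0.1],
  'a photo of a cat': [0.1, 0.9, 0.3],
};

console.log(zeroShotClassify(imageEmbedding, labelEmbeddings).label);
// prints "a photo of a dog"
```

No category-specific training is involved here: adding a new category only means adding one more text description and its embedding.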
PostgreSQL is an open-source relational database system known for its robust performance, extensibility, and support for advanced data types. PostgreSQL can handle high-dimensional vector data efficiently with extensions like pgvector, pgai, and pgvectorscale. You can install these open-source extensions on your machine or easily access them on any database service in the Timescale Cloud PostgreSQL platform.
Benefits:
Let’s start building our image search application using JavaScript. The client-side will be developed with React JS, while Node JS and Express will handle the server-side logic. But first, the setup.
We will break down the setup for client-side and server-side separately:
Before starting with the frontend setup for the image search engine, ensure you have the following prerequisites:
1. Node.js and npm: You’ll need to have Node.js and npm installed on your machine. You can download and install them from the Node.js website.
2. React: You’ll need basic knowledge of React and familiarity with creating and managing React components.
3. Axios: We will use Axios to make HTTP requests. Ensure you have Axios installed in your project.
Let’s install the required libraries:
1. Set up your React project: if you haven't already set up a React project, you can do so using Create React App.
npx create-react-app image-search
cd image-search
2. Install Axios: install Axios to make HTTP requests.
npm install axios
3. Validation: start your React application to see the template by using the following command.
npm start
Once started, the application will open in the browser on localhost, displaying as follows:
We will return to this later to create the client side of the image search app. For now, let's set up the server side.
Before starting, ensure you have the following prerequisites in place for the server-side setup:
1. Node.js and npm: These prerequisites are required for the backend setup, but if you already installed them for the front end, you can skip this step.
2. PostgreSQL: Setting it up for the first time can be tedious and time-consuming, so we will use Timescale Cloud, which provides managed PostgreSQL database services.
3. Pgvector: This extension is installed in your PostgreSQL database to handle vector operations. Once connected to Timescale Cloud, you will need to enable the pgvector extension. We will cover this later. The npm package for pgvector will also be installed to cater to the types required to store embeddings in PostgreSQL.
4. Pg library: A PostgreSQL client for Node.js. Ensure it is installed along with the necessary PostgreSQL driver (pg).
5. @xenova/transformers library: This converts text and image queries into embeddings using the CLIP model.
6. Cors: Cross-Origin Resource Sharing (CORS) is an HTTP header-based mechanism that lets a server indicate which origins other than its own a browser should permit to load its resources. We install the cors package so that the React client, which is served from a different origin, can send requests to the server.
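To show what the CORS mechanism amounts to on the server, here is a minimal hand-written sketch of an Express-style middleware that sets the relevant response headers. The function name allowOrigin and the origin http://localhost:3001 are assumptions for illustration. In the actual application you would more likely rely on the installed cors package, for example with app.use(cors()), rather than maintain this by hand.

```javascript
// Hypothetical stand-in for the cors package: returns an Express-style
// middleware that allows a single origin to call the server.
function allowOrigin(origin) {
  return function corsMiddleware(req, res, next) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, OPTIONS');
    res.setHeader('Access-Control-Allow-Headers', 'Content-Type');
    if (req.method === 'OPTIONS') {
      // Preflight request: answer immediately without running the route.
      res.statusCode = 204;
      res.end();
      return;
    }
    next();
  };
}

// Usage in the server (assumes the React dev server runs on port 3001):
// app.use(allowOrigin('http://localhost:3001'));
```

The browser sends the OPTIONS preflight on its own before a cross-origin JSON POST, which is why the sketch answers it separately.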
Let’s install the required libraries:
1. Initialize your Node.js project: to set up the server, create a folder and run npm init to initialize the project.
mkdir image-search-server
cd image-search-server
npm init -y
2. Install required packages: install the following packages required for the application.
npm install express pg pg-hstore @xenova/transformers pgvector cors
3. Create index.js: create a file named index.js and add a basic Express server.
import express from 'express';
const app = express()
const port = 3000;
app.use(express.json());
app.get('/', (req, res) => res.send('Hello World!'))
app.listen(port, () => console.log(`Example app listening at http://localhost:${port}`))
The code sets up a basic Express.js server. It listens on port 3000 and responds with Hello World! when the root URL is accessed. The server uses the express.json() middleware to parse JSON request bodies. Ensure that the type field is set to "module" in the package.json file:
{
……
"version": "1.0.0",
"main": "index.js",
"type": "module"
……
}
4. Testing your API: start the server to test the API.
node index.js
We have our basic client and server side ready for the image application. Here is the project structure and file organization:
image-search-server/
│
├── index.js
├── database.js #Creation and Insertion of data logic
├── model.js #Generation of embeddings logic
├── node_modules/
│ └── (installed dependencies)
├── package.json
├── package-lock.json
image-search/
├── public/
│ ├── index.html
│ └── ...
├── src/
│ ├── assets/
│ │ └── 1.jpeg
│ ├── App.js
│ ├── index.js
│ └── ...
├── package.json
├── package-lock.json
└── README.md
It’s time to discuss the most important ingredient of the recipe: OpenAI’s CLIP model.
OpenAI’s CLIP model is open-source, so no sign-up or API key is needed to use it. However, most resources are in Python rather than JavaScript. The Xenova npm library, Transformers.js, addresses this gap. Transformers.js is designed to mirror Hugging Face's Transformers Python library, so you can run the same pre-trained models with a very similar API. Supported tasks include the following:
Let's use the previously installed CLIP model. In this section, we will generate embeddings for a list of images using the CLIP model from @xenova/transformers and return them to the calling module. This process will assist in inserting the embeddings into our database. Later, we will also generate embeddings for text to aid in the application's search query.
First, let’s create a model.js file.
The classes imported below handle image preprocessing, tokenization, and model inference.
// model.js
import { AutoProcessor, AutoTokenizer, CLIPVisionModelWithProjection, CLIPTextModelWithProjection, RawImage } from '@xenova/transformers';
The models are defined outside the functions to load them only once and reuse them, enhancing performance.
const processorPromise = AutoProcessor.from_pretrained('Xenova/clip-vit-base-patch16');
const visionModelPromise = CLIPVisionModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
The code below defines an asynchronous function, visionEmbeddingGenerator, that generates image embeddings from a given image file path. The function first waits for the processor and visionModel to be loaded. A processor prepares the image data by resizing, normalizing, and converting it into a format suitable for the model to generate embeddings.
It then reads the image using RawImage.read(image_path) and processes it using the processor. Next, it computes the embeddings by passing the processed image through the visionModel. If any errors occur during these steps, they are caught and logged. Finally, the function returns the image embeddings as an array of data.
export async function visionEmbeddingGenerator(image_path){
const processor = await processorPromise
const visionModel = await visionModelPromise
try {
// Read image and run processor
const image = await RawImage.read(image_path);
const image_inputs = await processor(image);
// Compute embeddings
const { image_embeds } = await visionModel(image_inputs);
return image_embeds.data
} catch (err) {
console.error(`Error processing ${image_path}:`, err);
}
}
First, we will define the tokenizer and the model in the following way.
const tokenizerPromise = AutoTokenizer.from_pretrained('Xenova/clip-vit-base-patch16');
const textModelPromise = CLIPTextModelWithProjection.from_pretrained('Xenova/clip-vit-base-patch16');
The textEmbeddingGenerator function processes a given text string to generate its embeddings using the CLIP model. First, it initializes the tokenizer and text model using previously defined promises. A tokenizer prepares the text data by breaking it into tokens, adding padding and truncating, if necessary, to ensure it is in the correct format for the model to generate embeddings.
export async function textEmbeddingGenerator(text){
const tokenizer = await tokenizerPromise;
const textModel = await textModelPromise;
const textInputs = tokenizer([text], { padding: true, truncation: true });
const { text_embeds } = await textModel(textInputs);
return text_embeds.data
}
The text is then tokenized with padding and truncation options. The tokenized inputs are passed to the text model to generate the embeddings.
The section above walked you through setting up functions in model.js to generate image and text embeddings using the CLIP model.
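One optional refinement, sketched below, is to scale each embedding to unit length before storing or querying it. This is an assumption on our part rather than something the functions above do: the projected CLIP embeddings may not be unit length, and for unit-length vectors ranking by Euclidean distance gives the same order as ranking by cosine similarity. The helper name l2Normalize is hypothetical.

```javascript
// Scale a vector (plain array or typed array) to unit L2 length.
// Returns a plain array; an all-zero vector is returned unchanged.
function l2Normalize(embedding) {
  const values = Array.from(embedding);
  const norm = Math.sqrt(values.reduce((sum, v) => sum + v * v, 0));
  if (norm === 0) {
    return values;
  }
  return values.map((v) => v / norm);
}

// Toy example with a 2-dimensional vector instead of a 512-dimensional one.
console.log(l2Normalize(new Float32Array([3, 4]))); // [ 0.6, 0.8 ]
```

If you adopt this, apply it consistently to both the image embeddings you insert and the text embeddings you search with.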
In this section, we will set up, create, and insert data in the PostgreSQL database hosted on Timescale.
A critical component for an image search application is a vector database, which enables querying indexed image embeddings to retrieve the most relevant results. This tutorial will use PostgreSQL and its extension pgvector, hosted on Timescale Cloud, for efficient image search.
Here are a few reasons we recommend using Timescale for this:
To start, sign up for Timescale Cloud, create a new database, and follow the provided instructions. For more information, refer to the Get started with Timescale guide. Database creation might take a couple of minutes, so make sure the service appears as available in your Services section before attempting to connect.
After signing up, connect to the Timescale database by providing the service URI, which can be found under the service section on the dashboard. The URI will look something like this:
postgres://tsdbadmin:@.tsdb.cloud.timescale.com:/tsdb?sslmode=require
To obtain the password, go to Project settings and click on Create credentials.
This setup guide will help you configure your environment for handling vector operations. Once connected to the Timescale Cloud, we need to enable the pgvector extension, which we will cover shortly.
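The parts of a service URI map directly onto the user, password, host, port, and database settings that a PostgreSQL client expects. Here is a minimal sketch that splits a URI into those fields using the built-in URL class. The helper name poolConfigFromUri is hypothetical, and the password, host, and port in the example URI are made-up values, not real Timescale credentials.

```javascript
// Split a postgres:// service URI into separate connection settings.
function poolConfigFromUri(serviceUri) {
  const url = new URL(serviceUri);
  return {
    user: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
    host: url.hostname,
    port: Number(url.port),
    // The path is "/<database>", so drop the leading slash.
    database: url.pathname.replace(/^\//, ''),
    // Same SSL setting as used elsewhere in this tutorial.
    ssl: { rejectUnauthorized: false },
  };
}

// Made-up example URI; substitute the one from your own dashboard.
const config = poolConfigFromUri(
  'postgres://tsdbadmin:secret@example.tsdb.cloud.timescale.com:5432/tsdb?sslmode=require'
);
console.log(config.host, config.port, config.database);
// prints "example.tsdb.cloud.timescale.com 5432 tsdb"
```

Keeping the URI in an environment variable and parsing it at startup avoids hard-coding credentials in the source files.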
To talk to the database, we will use pg, a non-blocking PostgreSQL client for Node.js.
In index.js, import the following and initialize the pool. The pool manages the connections to the database that we will use to create a table, insert data, and perform queries later. One benefit of using a connection pool is improved performance: existing connections are reused instead of being opened and closed for each request.
// index.js
import express from 'express';
import pkg from 'pg'
const { Pool } = pkg
Now, to connect to the cloud PostgreSQL, use the credentials obtained from the previous step:
const app = express()
const port = 3000;
app.use(express.json());
app.get('/', (req, res) => res.send('Hello World!'))
app.listen(port, () => console.log(`Example app listening at http://localhost:${port}`))
const client = new Pool({
user: '<Your user>',
host: '<Your host>',
database: '<Your db>',
password: '<Your Password>',
port: <Your Port>,
ssl: {
rejectUnauthorized: false,
},
});
Verify the connection with client.connect() in the following way:
client.connect()
  .then((connection) => {
    console.log('Connection to the database has been established successfully.');
    // Return the test connection to the pool so it is not leaked
    connection.release();
  })
  .catch(err => {
    console.error('Unable to connect to the database:', err);
  });
After the connection, let’s create a table.
We will define the table creation logic in a separate file named database.js. All database-related operations will use this file.
export async function createTableIfNotExists(client) {
  // Check out a single connection from the pool
  const connection = await client.connect();
  let tableCreated = false;
  try {
    // Ensure the pgvector extension is installed
    await connection.query('CREATE EXTENSION IF NOT EXISTS vector;');
    // Check if the table exists. PostgreSQL folds unquoted identifiers to
    // lowercase, so Search_table is stored as search_table.
    const checkTableExistsQuery = `
      SELECT EXISTS (
        SELECT 1
        FROM information_schema.tables
        WHERE table_schema = 'public'
        AND table_name = 'search_table'
      );
    `;
    const result = await connection.query(checkTableExistsQuery);
    const tableExists = result.rows[0].exists;
    if (!tableExists) {
      const documentTable = `
        CREATE TABLE Search_table (
          id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
          path TEXT,
          embedding vector(512)
        );
      `;
      await connection.query(documentTable);
      console.log('Table created successfully');
      tableCreated = true;
    } else {
      console.log('Table already exists');
    }
  } catch (err) {
    console.error('Error creating table:', err);
  } finally {
    // Return the connection to the pool
    connection.release();
  }
  return tableCreated;
}
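Here’s the explanation of the code above:
1. Function definition: the code defines an asynchronous function, createTableIfNotExists, that takes a client as an argument.
2. Connect to the database: it obtains a database connection through the client.
3. Ensure the extension: it makes sure the pgvector extension is installed in the database by executing CREATE EXTENSION IF NOT EXISTS vector.
4. Check for the table: it checks whether Search_table exists in the public schema. If it does not exist, this function creates it.
5. Create the table: it creates Search_table with the following columns: id (an auto-generated BIGINT primary key), path (the image file path as TEXT), and embedding (a 512-dimensional vector).
6. Release connection: the release() method is used to return a database client back to the connection pool after it has been used.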
Since this code will only be called once, we can invoke the function directly in database.js. We can do that when inserting the data.
This section will discuss the dataset used for the image application and how to insert it into our database.
The Flickr30k dataset is a well-known benchmark for sentence-based image descriptions. It contains 31,783 images of people engaging in everyday activities and events. It is widely used for evaluating models that generate sentence-based portrayals of images. The dataset is available on Kaggle and can be easily downloaded. As this is an extensive image dataset, this demo is based on a sample of 100 images.
The following code is a part of database.js:
import pgvector from 'pgvector/pg';
import { visionEmbeddingGenerator } from './model.js';

export async function insertInTable(client, filePaths) {
  // Check out one connection from the pool for all insertions
  const connection = await client.connect();
  try {
    // Register the vector type on this connection
    await pgvector.registerTypes(connection);
    for (const filePath of filePaths) {
      try {
        // Compute embeddings
        const vision_embedding = await visionEmbeddingGenerator(filePath);
        console.log(`Embeddings for ${filePath}:`, [pgvector.toSql(Array.from(vision_embedding))]);
        await connection.query('INSERT INTO Search_table (path, embedding) VALUES ($1, $2)', [
          filePath,
          pgvector.toSql(Array.from(vision_embedding)),
        ]);
      } catch (err) {
        // Log the failure and continue with the next image
        console.error(`Error processing ${filePath}:`, err);
      }
    }
  } finally {
    // Return the connection, then close the pool once all insertions are done
    connection.release();
    await client.end();
  }
}
The insertInTable function connects to a PostgreSQL database and iterates over a list of image file paths. For each path, it computes image embeddings using the visionEmbeddingGenerator function and inserts these embeddings, along with the file path, into the Search_table table. It handles errors that occur while processing each image, so one bad file does not stop the run, and it ensures that the database connection is closed properly once all insertions are complete.
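Because the embedding column is declared as vector(512), an embedding of any other length is rejected by the database. The sketch below checks the length up front and builds the bracketed text form that pgvector uses for vectors, such as [0.5,-1,2]. The helper name toVectorLiteral is hypothetical and is only meant to illustrate the format; it is not the implementation of pgvector.toSql.

```javascript
const EMBEDDING_DIMENSIONS = 512; // matches vector(512) in Search_table

// Validate an embedding and format it as a pgvector text literal.
function toVectorLiteral(embedding, dimensions = EMBEDDING_DIMENSIONS) {
  const values = Array.from(embedding);
  if (values.length !== dimensions) {
    throw new Error(`Expected ${dimensions} dimensions, got ${values.length}`);
  }
  if (!values.every(Number.isFinite)) {
    throw new Error('Embedding contains a non-finite value');
  }
  return `[${values.join(',')}]`;
}

// Toy 3-dimensional example (real CLIP embeddings here have 512 values).
console.log(toVectorLiteral([0.5, -1, 2], 3)); // prints "[0.5,-1,2]"
```

Failing early like this gives a clearer error message than waiting for the INSERT statement to be rejected.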
Let's include a function in utils.js to list the files in our dataset directory. We will use this in database.js to insert the images into the database. Here’s the utility function:
// utils.js
import fs from 'fs';
import path from 'path';
export function getFilePaths(directory) {
try {
const files = fs.readdirSync(directory);
const filePaths = files.map(file => path.join(directory, file));
return filePaths;
} catch (err) {
console.error('Error reading directory:', err);
return [];
}
}
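A dataset folder can also contain files that are not images, such as a captions file or hidden system files, and passing those to the embedding function would only produce errors. As a minimal sketch, the paths could be filtered by extension before insertion. The helper name filterImagePaths and the example file names are hypothetical.

```javascript
// Extensions treated as images in this sketch; adjust to your dataset.
const IMAGE_EXTENSIONS = /\.(jpe?g|png)$/i;

// Keep only the paths that look like image files.
function filterImagePaths(filePaths) {
  return filePaths.filter((filePath) => IMAGE_EXTENSIONS.test(filePath));
}

// Made-up file names for illustration.
console.log(
  filterImagePaths(['dataset/photo1.jpg', 'dataset/captions.csv', 'dataset/.DS_Store'])
);
// prints [ 'dataset/photo1.jpg' ]
```

The filter can wrap the result of getFilePaths, for example filterImagePaths(getFilePaths('dataset')).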
Now we can import it in database.js and execute the insertion:
import { getFilePaths } from './utils.js';
import pkg from 'pg';
const { Pool } = pkg;

// main must be async because it awaits the database calls
async function main() {
  const client = new Pool({
    user: '<Your user>',
    host: '<Your host>',
    database: '<Your db>',
    password: '<Your Password>',
    port: <Your Port>,
    ssl: {
      rejectUnauthorized: false,
    },
  });
  const tableCreated = await createTableIfNotExists(client);
  if (tableCreated) {
    // Insert the images only when the table was freshly created
    await insertInTable(client, getFilePaths('dataset'));
  }
}
main().catch((err) => console.error('Insertion failed:', err));
Note: The preceding code remains unchanged in the file.
Now that this process is complete, we have inserted the images and their embeddings in the table, which will be retrieved depending on the query.
In this section, we will develop a POST route /search using Express.js that accepts a textual query from the user, transforms it into embeddings, and performs a database search. CLIP, a neural network model, maps image and text embeddings into a shared output space, allowing for direct comparisons between the two modalities within a single model.
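// Add these imports at the top of index.js
import pgvector from 'pgvector/pg';
import { textEmbeddingGenerator } from './model.js';

app.post('/search', async (req, res) => {
  let connection;
  try {
    // Read the query from the JSON body (React client) or the query string (curl, Postman)
    const searchText = req.body?.searchText ?? req.query.searchText;
    if (!searchText) {
      return res.status(400).send('searchText is required');
    }
    // Check out a connection from the pool
    connection = await client.connect();
    // Compute text embeddings
    const text_emb = await textEmbeddingGenerator(searchText);
    const queryTextEmbedding = [pgvector.toSql(Array.from(text_emb))];
    console.log(queryTextEmbedding);
    // Perform similarity search
    const result = await connection.query(
      'SELECT path FROM Search_table ORDER BY embedding <-> $1 LIMIT 5',
      queryTextEmbedding
    );
    res.json(result.rows);
    console.log(result.rows);
  } catch (error) {
    console.error('Error performing search', error);
    res.status(500).send('Error performing search');
  } finally {
    // Return the connection to the pool; closing the pool here would break later requests
    if (connection) connection.release();
  }
});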
The app.post('/search') route processes POST requests to perform an image search based on a textual query. When a request is received, the code first obtains a connection to the PostgreSQL database. It then generates embeddings for the search text using the textEmbeddingGenerator function.
These embeddings are converted into a format compatible with PostgreSQL using pgvector.toSql. The route then executes a similarity search against the Search_table table in the database, ordering results by their distance to the query embedding using the <-> operator, which is pgvector's Euclidean (L2) distance operator. It limits the results to the top five matches. The matching image paths are returned as a JSON response. If an error occurs during this process, a 500 status code is sent, and the database connection is cleaned up in the finally block.
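To illustrate what ORDER BY embedding <-> $1 LIMIT 5 computes, here is a minimal in-memory sketch of the same ranking: compute the Euclidean distance from the query to every stored embedding, sort ascending, and keep the closest few. The function names and the two-dimensional toy rows are made up for illustration; in the application, PostgreSQL with pgvector performs this step.

```javascript
// Euclidean (L2) distance between two equal-length vectors.
function l2Distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Return the paths of the rows closest to the query, nearest first.
function nearestPaths(rows, queryEmbedding, limit = 5) {
  return rows
    .map((row) => ({ path: row.path, distance: l2Distance(row.embedding, queryEmbedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, limit)
    .map((row) => ({ path: row.path }));
}

// Toy rows standing in for Search_table (hypothetical paths and vectors).
const rows = [
  { path: 'dataset/a.jpg', embedding: [0, 0] },
  { path: 'dataset/b.jpg', embedding: [1, 1] },
  { path: 'dataset/c.jpg', embedding: [5, 5] },
];

console.log(nearestPaths(rows, [1, 0.9], 2));
// prints [ { path: 'dataset/b.jpg' }, { path: 'dataset/a.jpg' } ]
```

The output has the same shape as result.rows in the route, a list of objects with a path property, which is what the client receives as JSON.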
After running the server using node index.js, we can check our endpoint using Postman, which is a platform that helps developers build and use APIs. If that seems a hassle, we can simply use wget or curl. Here’s how we can make a POST request with curl, passing the search text as a query parameter:
curl -X POST "http://localhost:3000/search?searchText=old%20man"
If you are using Postman, you will need a desktop version. After logging in and creating a workspace, let’s request our API:
1. Add a query parameter with the key searchText and the value old man.
2. Configure the request method as POST.
3. Set the URL to http://localhost:3000/search, where the server is listening.
Here are the paths retrieved from the database after semantic search:
Let's verify one of the images from the paths to ensure that the retrieved images match the query.
Now, our server is ready to search, given the query. Let’s complete it with our client side.
In this section, we will create a React application that a client will use to interact with the Search API. Here’s how you can create the client side:
The first step is to create a component file named SearchBar.js, which will take the user's input. Let’s write some code in it.
import React, { useState } from 'react';
import Timescale from './assets/1.jpeg'; // An icon saved in the assets folder
const SearchBar = () => {
const [searchText, setSearchText] = useState('');
const [clicked, setClicked] = useState(false);
return (
<div className="container">
<div className="titleContainer">
<img src={Timescale} alt="logo" className="logo" />
<h1 className="title">Timescale Image Search Engine</h1>
</div>
<div className="searchContainer">
<input
type="text"
value={searchText}
onChange={(e) => setSearchText(e.target.value)}
placeholder="Search..."
className="input"
/>
<button onClick={() => setClicked(true)} className="button">
Search
</button>
</div>
</div>
);
};
export default SearchBar;
This React component, SearchBar, allows users to input a search query and retrieve image results from a server. It manages the search text, results, loading state, and any errors encountered during the search. Let’s fill it in with the useEffect hook to query the Search API.
const [results, setResults] = useState([]);
const [error, setError] = useState(null);
const [loading, setLoading] = useState(false);
useEffect(() => {
const performSearch = async () => {
try {
const response = await axios.post('http://localhost:3000/search', { searchText });
setResults(response.data);
} catch (err) {
setError(err);
} finally {
setLoading(false);
}
};
if (clicked) {
setClicked(false);
performSearch();
}
}, [clicked]);
This code snippet uses React's useState and useEffect hooks to manage search results and errors. When the clicked state changes, useEffect triggers an asynchronous search function that sends a POST request to http://localhost:3000/search with the search text. Successful responses update the results state, and any errors update the error state. The clicked state is reset to prevent repeated searches.
Now, let’s look at the complete SearchBar component. Please note that additional components and custom hooks have been created to handle dynamic image imports. However, due to the scope of the article, we will skip the explanation. If you want, you can explore this further in our GitHub repository.
Here’s the complete component:
// SearchBar.js
import React, { useEffect, useState } from 'react';
import Timescale from './assets/1.jpeg';
import axios from 'axios';
import Image from './Image';
const SearchBar = () => {
const [searchText, setSearchText] = useState('');
const [results, setResults] = useState([]);
const [error, setError] = useState(null);
const [loading, setLoading] = useState(false);
const [clicked, setClicked] = useState(false);
useEffect(() => {
const performSearch = async () => {
try {
const response = await axios.post('http://localhost:3000/search', { searchText });
setResults(response.data);
} catch (err) {
setError(err);
} finally {
setLoading(false);
}
};
if (clicked) {
setClicked(false);
performSearch();
}
}, [clicked]);
return (
<div style={styles.container}>
<div style={styles.titleContainer}>
<img src={Timescale} alt="logo" style={styles.logo} />
<h1 style={styles.title}>Timescale Image Search Engine</h1>
</div>
<div style={styles.searchContainer}>
<input
type="text"
value={searchText}
onChange={(e) => setSearchText(e.target.value)}
placeholder="Search..."
style={styles.input}
/>
<button onClick={() => setClicked(true)} style={styles.button}>
Search
</button>
</div>
<div style={styles.resultsContainer}>
{results.length > 0 && (
<ul style={styles.resultsList}>
{results.map((item, index) => (
<li key={index} style={styles.resultItem}>
<Image fileName={item.path} alt={searchText} />
</li>
))}
</ul>
)}
</div>
</div>
);
};
const styles = {
container: {
display: 'flex',
flexDirection: 'column',
alignItems: 'center',
justifyContent: 'center',
backgroundColor: 'black',
textAlign: 'center',
padding: '20px',
},
titleContainer: {
display: 'flex',
alignItems: 'center',
marginBottom: '20px',
},
logo: {
width: '80px',
height: '80px',
marginRight: '10px',
},
title: {
fontSize: '48px',
color: '#F5FF80',
},
searchContainer: {
display: 'flex',
alignItems: 'center',
justifyContent: 'center',
width: '100%',
marginBottom: '20px',
},
input: {
padding: '15px',
borderRadius: '5px',
border: '1px solid #F5FF80',
marginRight: '10px',
width: '50%',
},
button: {
padding: '15px 15px',
borderRadius: '5px',
border: 'none',
backgroundColor: '#F5FF80',
color: 'black',
cursor: 'pointer',
},
resultsContainer: {
width: '100%',
textAlign: 'center', // Center align the results container
},
resultsList: {
listStyleType: 'none',
padding: 0,
margin: 0,
display: 'flex',
flexWrap: 'wrap',
justifyContent: 'center',
},
resultItem: {
margin: '10px',
color: '#F5FF80',
textAlign: 'center',
},
};
export default SearchBar;
The SearchBar.js component is also responsible for displaying images on the page. After retrieving the image paths from the database, it selects the corresponding assets and displays them. To dynamically add image imports in React, we have created the useImage hook and the Image component.
// useImage.js
import { useEffect, useState } from 'react'
const useImage = (fileName) => {
const [loading, setLoading] = useState(true)
const [error, setError] = useState(null)
const [image, setImage] = useState(null)
useEffect(() => {
const fetchImage = async () => {
const path = fileName.replace(/\\/g, '/');
try {
const response = await import(`./assets/${path}`) // change relative path to suit your needs
setImage(response.default)
} catch (err) {
setError(err)
} finally {
setLoading(false)
}
}
fetchImage()
}, [fileName])
return {
loading,
error,
image,
}
}
export default useImage
Note: An assets folder is created within the src directory, which contains the image dataset.
Let’s create a component to display the image, as you can see in SearchBar.js:
// Image.js
import useImage from "./useImage"
const Image = ({ fileName, alt }) => {
const { loading, error, image } = useImage(fileName)
console.log(error)
return (
<>
<img
src={image}
alt={alt}
/>
</>
)
}
export default Image
The exported component can be imported into App.js, which is the main file of the React application. Here’s how to import it:
import './App.css';
import SearchBar from './SearchBar';
function App() {
return (
<div className="App">
<SearchBar/>
</div>
);
}
export default App;
With all of the coding complete, here’s what everything should look like. Let’s see if we can find cyclists:
Hooray! 🎉 We have our image search engine ready to be used. At present, it is configured to handle 100 images efficiently. However, there is substantial room for improvement. We can upload the entire dataset by leveraging Timescale-hosted PostgreSQL. It offers robust and scalable solutions, ensuring that our search engine will perform optimally as we scale up.
In this blog post, we explored how to build an image search application using OpenAI CLIP and managed PostgreSQL. We covered setting up the PostgreSQL database with pgvector, creating functions to generate and store embeddings, and building a React front end to query and display search results. PostgreSQL is all you need to build your AI applications.
We encourage you to try building your own image search engine—take a look at our GitHub repository for guidance—using Timescale Cloud’s managed PostgreSQL platform and open-source AI stack (pgvector, pgai, and pgvectorscale). It unlocks quicker and more accurate similarity searches on large-scale vectors and lightning-fast time-based vector searches with much welcome SQL familiarity. Create a Timescale Cloud account and start for free today.
You can access pgai and pgvectorscale on any database service on the Timescale Cloud PostgreSQL platform or install them manually according to the instructions in the pgai and pgvectorscale GitHub repos (⭐s welcome!).