Data Matching

PostgreSQL Extensions: Install pg_trgm for Data Matching

Introduction to the pg_trgm PostgreSQL Extension

pg_trgm is a PostgreSQL extension that provides functions and operators for determining the similarity of alphanumeric text based on trigram matching. Trigrams are a set of three consecutive characters taken from a string.  The pg_trgm extension can be used to measure the number of trigrams shared by two strings, which can be used to calculate their level of similarity. This is particularly useful for developers implementing features such as search functionality or data matching in their applications.

pg_trgm Install

To install the pg_trgm extension, you need to have access to a PostgreSQL database. Follow the steps below:

1. Connect to your PostgreSQL database.

2. Load it into your shared_preloaded_libraries in postgresql.conf, as it will require additional shared memory. If you’re using Timescale, you can simply skip this step, as the extension is already preloaded. You can find available extensions by going to Operations > Extensions from your service overview, which will also give you installation instructions.

3. Run the following SQL command:

CREATE EXTENSION pg_trgm;

This command will install the pg_trgm extension in your current database. If you want to install it in a different database, you need to connect to that database first.

Using the pg_trgm Extension

Once the pg_trgm extension is installed, you can use it to compare strings for similarity. Here are some examples of how to use the pg_trgm functions:

  • To calculate the similarity between two strings, use the similarity(text, text) function. This function returns a number between 0 and 1, where 1 means the strings are identical and 0 means they have no trigrams in common.

  • To find records in a table that are similar to a given string, use the % operator. For example, the following query returns all records in the products table where the name is similar to 'apple':

SELECT * FROM products WHERE name % 'apple';

Time-Series Use Cases for the pg_trgm Extension

The pg_trgm extension can be used in a variety of time-series use cases. For example, it can be used to find similar time-series data based on their labels or metadata. This can be useful in scenarios where you want to find patterns or anomalies in your data.

Using pg_trgm Extension With Timescale and Time-Series Data

If you're using Timescale, a time-series cloud database built on top of PostgreSQL, you can use the pg_trgm extension to enhance your time-series data analysis

For example, you can use the pg_trgm functions to find similar time series based on their metadata or to search for specific time series based on their labels. To do this, you would use the pg_trgm functions in the same way as described in section three of this article, but on your time-series data instead.

Learn how to simplify your time-series analysis with only a few functions.