PostgreSQL

Nov 28, 2024

Boosting Postgres INSERT Performance by 2x With UNNEST

Hi, I'm James. I work in Developer Advocacy at Timescale. Recently, I’ve been diving deep into performance topics and thinking about the kind of content I enjoy reading and writing. This is a new series that will focus on using data visualizations to spotlight single concepts in Timescale or PostgreSQL performance. No marketing fluff, no unrealistic benchmarks, just straightforward explorations of performance topics I find interesting, with some notes on why I think they matter.

Boosting Postgres INSERT Performance by 2x With UNNEST

If you Google Postgres INSERT performance for long enough, you’ll find some hushed mentions of using an arcane UNNEST function (if you squint, it looks like a columnar insert) over a series of arrays to increase performance. Any performance gains sound good to me, but what's actually going on here?

I’ve been aware of this technique for a long time (in fact, several object-relational mappers use it under the hood), but I’ve never fully understood what's happening, and any analysis I’ve seen has always left me wondering if the gains were as much about data wrangling in the programming language used as Postgres speed. This week I decided to change that and do some testing myself.

💡
This used to be called "Boosting Postgres INSERT Performance by 50% with UNNEST". The performance went from 2.19s to 1.03s, which 52.97% less time, but also 113% faster.

I changed the wording in this article to 2x because I think that's always clearer (thanks /u/a3kov and /u/lobster_johnson)

The Introduction: INSERTs in Postgres

At Timescale, I work with time-series data, so I gave my analysis a time-series slant. I want to simulate inserting a stream of records into my database with the INSERT statement (yes, I know COPY is a thing, see the callout below), and in doing so, I want to minimize the load I create as much as possible (saving my precious CPU cycles for my real-time analytics queries).

💡
If you’re aiming to load data into your database as quickly and efficiently as possible, check out the PostgreSQL COPY command—it’s almost always faster than using regular INSERT. We benchmarked Postgres data ingestion methods in an earlier post.

However, even though COPY is faster, many developers still prefer INSERT for its flexibility. INSERT supports useful features like upserts (INSERT ... ON CONFLICT), returning the inserted rows, and has better integration with language libraries. Plus, it can be part of a larger SQL query, giving you more control over the data insertion process.


Let’s take a closer look at the INSERT queries I tested using a batch size of 1,000, 5,000, and 10,000 records.

In one corner, we have the multi-record INSERT variant we all know and love, using a VALUES clause followed by a tuple per row in the batch. These queries look long but also pretty easy to understand.

INSERT INTO sensors (sensorid, ts, value)
VALUES 
  ($1, $2, $3), 
  ($4, $5, $6), 
   ..., 
  ($2998, $2999, $3000);

In the other corner, we have our UNNEST variant, using a SELECT query that takes one array per column and uses the UNNEST function to convert them into rows at execution time.

INSERT INTO sensors (ts, sensorid, value) 
  SELECT * 
  FROM unnest(
    $1::timestamptz[], 
    $2::text[], 
    $3::float8[]
)

The Postgres documentation describes UNNEST as a function that “expands multiple arrays (possibly of different data types) into a set of rows.” This actually makes sense, it’s basically flattening a series of arrays into a row set, much like the one in INSERT .. VALUES query.

One key difference is that where the first variant has batch_size * num_columns values in the query, the UNNEST variant only has num_columns arrays (each of which contains batch_size records when it’s flattened). This will be important later, so take note!

The Setup 

I ran the benchmark on a single Timescale 4 CPU/16 GB memory instance (the spec isn't really important for this benchmark) with a very simple schema (the same table I used on the SkipScan performance post).

CREATE TABLE sensors (
    sensorid TEXT,
    ts TIMESTAMPTZ,
    value FLOAT8
);

I was hoping to use Grafana k6 for all my performance articles, but in this case, it didn’t make sense. I don’t want to measure the time that application code takes to get my data into the format an INSERT .. VALUES or INSERT .. UNNEST statement needs (especially in TypeScript), I just want the time the database spends processing the statements and loading my data.

I fell back to using good old pgbench for these tests with a static file for each INSERT variant and batch combination. As usual, you can find the files in the timescale/performance GitHub repo.

I ran each of the following queries to insert one million records using a single thread:

  • INSERT .. VALUES with a batch size of 1,000
  • INSERT .. VALUES with a batch size of 5,000
  • INSERT .. VALUES with a batch size of 1,0000
  • INSERT .. UNNEST with a batch size of 1,000
  • INSERT .. UNNEST with a batch size of 5,000
  • INSERT .. UNNEST with a batch size of 10,000

I used the pg_stat_statments (if you don’t know about this amazing extension, then do yourself a favor and look it up!) statistics in the database to extract the total _planning_time and total_exec_time for each run.

The Results: INSERT VALUES vs. INSERT UNNEST

The results were very clear: at the database layer, INSERT .. UNNEST is 2.13x faster than INSERT .. VALUES at at batch size of 1000! This ratio held steady regardless of batch size (and even with multiple parallel jobs).

  • The primary savings come at query planning time. With the INSERT .. VALUES approach, Postgres must parse and plan each value individually (remember how many there were?). In contrast, INSERT .. UNNEST processes one array per column, which reduces the planning workload by not working with individual elements at plan time.
  • Execution time is similar between both methods. The actual query execution was time slightly slower for UNNEST, which reflects the extra work that the UNNEST function needs to do. This was more than made up for by the planning gain.

As you might expect adding columns makes things even better for UNNEST, with 10 float columns (rather than one) we get a massive 5.02x faster So if you've got a wide schema, you're in for even more performance gains (but I wanted to leave this article at what most people could reasonably expect).

If you’d like to see the graphs for the 5,000 and 10,000 batch sizes, then check out the PopSQL dashboard.

A reasonable response to this might be, "What if we prepared the INSERT .. VALUES query, would that reduce planning time and make it the winner?". Some quick tests (unfortunately, pg_stat_statements can't track statistics for EXECTUTE queries on prepared statements) show that this is not the case; UNNEST is still king.

Should I use UNNEST?

There’s no question that in terms of database performance, INSERT .. UNNEST beats INSERT .. VALUES for batch inserts. By minimizing planning overhead, UNNEST unlocks an almost magical speed boost, making it a fantastic option for scenarios where ingestion speed is critical. One thing to keep in mind is that the overhead of your language and network latency often contribute just as much to the total time you see in your application, but still, your database will be working less, which is always a good thing.

As with any optimization, there’s a trade-off. The key consideration isn’t always just speed; it’s also usability. The INSERT .. VALUES syntax is intuitive and widely understood, making it easier to adopt and maintain, especially in teams or projects where SQL expertise varies. Pivoting to use UNNEST introduces complexity. You’ll need to wrangle your data into arrays, and if you’re using an ORM, you might discover it doesn’t support this pattern at all. If you're writing raw SQL, UNNEST might be less familiar to future developers inheriting your codebase.

And while UNNEST is fast, let’s not forget about COPY, which remains the undisputed gold standard for ingestion. If you don’t need features like upserts (ON CONFLICT clauses), COPY will get your data in faster, and with less overhead.

Final Thoughts for Developers

Think of INSERT .. UNNEST as a magic performance hack sitting squarely between traditional INSERT .. VALUES and COPY. It delivers significant speed improvements for batch ingestion while retaining the flexibility and composability of SQL INSERT statements.

At Timescale, we love exploring the edges of what Postgres can do and techniques like INSERT .. UNNEST remind us why. It’s elegant, fast, and underutilized, but hopefully no longer misunderstood. If you’re aiming to push your database to its limits, we highly recommend adding this pattern to your SQL toolkit. It’s another example of how understanding Postgres deeply can help you get the most out of your system. And if you want to optimize your PostgreSQL database for time series, events, real-time analytics, or vector data, take TimescaleDB out for a spin.

Originally posted

Nov 15, 2024

Last updated

Nov 28, 2024

Share

Subscribe to the Timescale Newsletter

By submitting you acknowledge Timescale's Privacy Policy.