Database size growing unexpectedly after adding an application-layer aggregation

I have a system that reliably writes time series data to tables, and I recently added an ‘aggregation’ that is computed in the application layer (because the computation requires logic and data that only exist there). I essentially add a column to the table with:

ALTER TABLE <tableName>
ADD COLUMN IF NOT EXISTS <newcolumnname> DOUBLE PRECISION NULL;

I later update that column in distinct logical subsets of the table data with:

UPDATE <tablename>
SET <newcolumnname> = CASE time
WHEN <timevalue1> THEN <value1>
WHEN <timevalue2> THEN <value2>
...
ELSE <newcolumnname>
END;

The tables I have are not large (30k-100k rows), but after adding this new column and its data, the on-disk database size has become much larger than I would expect for the column I’m adding. (Note there are multiple tables of this type, so the effect is compounded as well.)

I’ve attempted to run VACUUM/VACUUM FULL with pgAdmin but it does not seem to reclaim any space.

Is there another process I can run to determine where the bloat is occurring, or perhaps better SQL I could use to add values to the new column that doesn’t have these ill effects (if indeed this is what introduced the bloat)?

TIA

Hi @leisurehound, how often are you updating this column?

Every UPDATE in PostgreSQL writes a new version of the row and marks the old version dead; the dead version stays on its page until it is vacuumed away.

So each run will basically duplicate the rows in the table, and you’ll need to deal with the dead tuples.
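
Note, too, that the UPDATE above has no WHERE clause, so each run rewrites every row in the table, even the ones the CASE leaves unchanged (the ELSE branch still writes a new row version with the same value). A sketch of a form that only touches the rows being backfilled, using a VALUES list instead of a CASE:

-- Only the rows listed in VALUES are rewritten; rows not listed
-- are left alone and generate no dead tuples.
UPDATE <tablename> AS t
SET <newcolumnname> = v.val
FROM (VALUES
    (<timevalue1>, <value1>),
    (<timevalue2>, <value2>)
) AS v(time, val)
WHERE t.time = v.time;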

Check the dead tuples:

SELECT schemaname,
       relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
WHERE n_live_tup > 0
ORDER BY dead_ratio DESC;
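
And to see where the space itself is going for each table (heap plus TOAST vs. indexes), something like this should help:

SELECT relname,
       pg_size_pretty(pg_table_size(relid))          AS heap_and_toast,
       pg_size_pretty(pg_indexes_size(relid))        AS indexes,
       pg_size_pretty(pg_total_relation_size(relid)) AS total
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;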

Also, depending on the case, just creating a new table with the new structure will be faster and cause less work:

CREATE TABLE <new_tablename> AS SELECT * FROM <tablename>;
DROP TABLE <tablename>;
ALTER TABLE <new_tablename> RENAME TO <tablename>;
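
One caveat: CREATE TABLE ... AS copies only the column definitions and data, not indexes, constraints, or defaults, so those need to be recreated on the new table before the rename. For example (the index name and column here are just illustrative):

CREATE INDEX <tablename>_time_idx ON <new_tablename> (time);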

It depends: the system receives 10 Hz data in bursty time frames, from as little as 10 minutes to as much as 3 hrs, and is essentially idle between these events for large portions of the day. The event that fires off the aggregation can occur anywhere from every 25s to every 2m, depending on the nature of the event during the bursts.

So, are you suggesting that after I add the values to the column I then rename the table? I’m not sure how the SQL above will both remove the dead tuples and add the values I’m trying to add to the individual rows.

Thanks for your reply!

My understanding is that a custom script that backfills the values into a fresh table will be faster, and it will produce a clean table via plain inserts with all the columns in place. The rename was just a suggestion to keep the original table name.
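
Something along these lines, as a sketch; it assumes the new column has not already been added to the original table, and the VALUES list stands in for whatever time/value pairs your application computes:

-- Build the replacement table with the aggregation filled in during
-- the copy, then swap it in under the original name.
CREATE TABLE <new_tablename> AS
SELECT t.*, v.agg AS <newcolumnname>
FROM <tablename> AS t
LEFT JOIN (VALUES
    (<timevalue1>, <value1>),
    (<timevalue2>, <value2>)
) AS v(time, agg) ON v.time = t.time;

DROP TABLE <tablename>;
ALTER TABLE <new_tablename> RENAME TO <tablename>;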