Please forgive my very naive question. I do not have much Postgres know-how.
My setup: postgresql-14.3-1-windows-x64, latest TimescaleDB (both default installations)
VM with Windows Server 2019, 8 cores, 16 GB RAM, 3.5 TB SSD
I have 5 years of data with 8,000 variables, stored in a single table: timestamp (PK), signalid (PK), value.
This results in approx. 20,000,000,000 rows and 1.5 TB on disk.
A VACUUM FULL can take up to 18 hours.
I have the same data on a similar machine in a proprietary Historian, which responds 80 times faster than queries against Postgres.
My feeling is that I'm doing something terribly wrong. Any hints on what should be done to get better performance?
Then I'm trying to create a hypertable like this:
LOCK TABLE signal IN SHARE ROW EXCLUSIVE MODE;
SELECT create_hypertable('rn', 'timestamp', chunk_time_interval => interval '1 week', if_not_exists => true, migrate_data => true);
This freezes after a while; it doesn't work.
Hello Martin, welcome to the forum and thank you for the question. Others in the team and wider community may have some better suggestions than me, but they'll almost certainly ask you to share the EXPLAIN ANALYZE for the queries at least. We often recommend using Depesz's Explain for this because it formats the output in a way that's much easier to understand. If you could maybe get the output from a sample "bad" query we'll almost certainly be able to help with some pointers.
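For example (I'm guessing at your table and column names from the schema you described), something like this run against one of the slow queries is the kind of output we'd be looking at:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "timestamp", value
FROM signal                          -- placeholder: your raw data table
WHERE signalid = 42                  -- placeholder signal id
  AND "timestamp" >= '2022-01-01'
  AND "timestamp" <  '2022-02-01';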
It may be Monday before one of our developer advocates gets to this question but bear with us. Let's see if we can get you on the right track. Meanwhile, here are a couple of links in case they inspire you:
- Foundations videos (especially lesson 7)
- SQL query filtering best practices
I'm a newbie, but I think that if you haven't been able to create the hypertables, you aren't taking advantage of partitioning, columnar storage and the other improvements TimescaleDB provides, so I believe it's normal that your database performs this badly at present.
Did you try to create the hypertables before populating them? I had a similar issue while trying to use migrate_data => true with huge tables.
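In my case, roughly this worked (the column names and types are only my guess at your schema, and in practice I did the INSERT in batches by time range):

CREATE TABLE signal_new (
    "timestamp" timestamptz NOT NULL,
    signalid    integer     NOT NULL,
    value       double precision
);
-- Make it a hypertable while it is still empty, then copy the data across in time order.
SELECT create_hypertable('signal_new', 'timestamp', chunk_time_interval => interval '1 week');
INSERT INTO signal_new ("timestamp", signalid, value)
SELECT "timestamp", signalid, value
FROM signal
ORDER BY "timestamp";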
Best regards, Fernando.
Hi Fernando,
thanks for your answer. My problem with filling an empty hypertable is that I'm not allowed to add old data (older than 1 week). Compressed data seems to be locked for writing.
Kind regards
Martin
Once a chunk is compressed, you can insert into it, but not update or delete. Still, for your type of conversion, you do need to take a different approach, which isn't currently accommodated by the migrate_data option.
Can you ensure that data could be inserted into an empty hypertable from the Historian in time order (preferably oldest to newest)? If so, I can provide some guidance/scripts on how to set up the hypertable without setting up a compression policy (initially). Instead, you can set up a User-Defined Action that compresses chunks over time, once you know that historical data is finished being added to a chunk.
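As a very rough sketch of what I mean (the table name, the 4-week cutoff and the daily schedule are placeholders, and it assumes compression settings have already been enabled on the hypertable with ALTER TABLE ... SET (timescaledb.compress, ...)):

CREATE OR REPLACE PROCEDURE compress_old_chunks(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
DECLARE
    chunk regclass;
BEGIN
    -- Compress every chunk older than 4 weeks that is not yet compressed,
    -- i.e. chunks the historical backfill should be finished with.
    FOR chunk IN
        SELECT show_chunks('signal', older_than => interval '4 weeks')
    LOOP
        PERFORM compress_chunk(chunk, if_not_compressed => true);
    END LOOP;
END
$$;

-- Schedule the action instead of adding a regular compression policy.
SELECT add_job('compress_old_chunks', '1 day');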
This is similar to how I approached adding lots of historical data in this video (it's old, so keep that in mind).
Thank you very much, Ryan; your approach worked!
Query performance is now comparable to the proprietary Historian.
When I now try to add more data, I get this response:
ERROR: insert into a compressed chunk that has primary or unique constraint is not supported
I'd need to know a bit more about your schema and compression settings, but generally, yes, this is not allowed.
When you compress chunks in TimescaleDB, the indexes you have on uncompressed chunks (your normal indexes) don't exist on the compressed chunks, because the data has been transformed into columns (arrays of up to 1,000 values at a time). There is typically no longer any way to verify a unique value on those chunks (PostgreSQL uses the indexes to verify the constraints).
This also makes sense when you consider how TimescaleDB does compression. If you use a unique key as one of the segmentby columns, then you essentially get no compression, because each row would be unique within its segment. Compression works best when we can group many rows (up to 1,000) into one compressed row.
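Purely as an illustration (using the signalid column from the schema you described earlier), segmenting by a non-unique column looks something like this:

-- Many rows per signalid can be packed into one compressed row;
-- segmenting by a unique key would leave one row per segment and compress almost nothing.
ALTER TABLE signal SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'signalid',
    timescaledb.compress_orderby   = '"timestamp" DESC'
);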
Is the unique constraint actually necessary, or is it something like an ID column added by convention/ORM?