Please forgive my very naive question. I do not have much Postgres know-how.
My setup: postgresql-14.3-1-windows-x64, latest TimescaleDB (both default installations)
VM with Windows Server 2019, 8 cores, 16 GB RAM, 3.5 TB SSD
I have 5 years of data with 8,000 variables, stored in a single table: timestamp (PK), signalid (PK), value.
This results in approx. 20,000,000,000 rows and 1.5 TB on disk.
A VACUUM FULL can take up to 18 hours.
I have the same data on a similar machine in a proprietary Historian, which responds 80 times faster than queries against Postgres.
My feeling is that I'm doing something terribly wrong. Any hints on what should be done to get better performance?
Then I'm trying to create a hypertable like this:
LOCK TABLE signal IN SHARE ROW EXCLUSIVE MODE;
SELECT create_hypertable('rn', 'timestamp', chunk_time_interval => interval '1 week', if_not_exists => true, migrate_data => true);
This freezes after a while; it doesn't work.
Hello Martin, welcome to the forum and thank you for the question. Others in the team and wider community may have some better suggestions than me, but they'll almost certainly ask you to share the EXPLAIN ANALYZE for the queries at least. We often recommend using Depesz's Explain for this because it formats the output in a way that's much easier to understand. If you could maybe get the output from a sample "bad" query we'll almost certainly be able to help with some pointers.
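For example (I'm guessing at your table and column names from the schema you described), something like this run against one of the slow queries is the kind of output we'd be looking at:

EXPLAIN (ANALYZE, BUFFERS)
SELECT "timestamp", value
FROM signal                          -- placeholder: your raw data table
WHERE signalid = 42                  -- placeholder signal id
  AND "timestamp" >= '2022-01-01'
  AND "timestamp" <  '2022-02-01';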
It may be Monday before one of our developer advocates gets to this question but bear with us. Let's see if we can get you on the right track. Meanwhile, here are a couple of links in case they inspire you:
- Foundations videos (especially lesson 7)
- SQL query filtering best practices
I'm a newbie, but I think that if you haven't been able to create the hypertables, you aren't taking advantage of partitioning, columnar storage and the other improvements TimescaleDB provides, so I believe it's normal that your database performs this badly at present.
Did you try to create the hypertables before populating them? I had a similar issue while trying to use migrate_data => true with huge tables.
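In my case, roughly this worked (the column names and types are only my guess at your schema, and in practice I did the INSERT in batches by time range):

CREATE TABLE signal_new (
    "timestamp" timestamptz NOT NULL,
    signalid    integer     NOT NULL,
    value       double precision
);
-- Make it a hypertable while it is still empty, then copy the data across in time order.
SELECT create_hypertable('signal_new', 'timestamp', chunk_time_interval => interval '1 week');
INSERT INTO signal_new ("timestamp", signalid, value)
SELECT "timestamp", signalid, value
FROM signal
ORDER BY "timestamp";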
Best regards, Fernando.
Hi Fernando,
thanks for your answer. My problem with filling an empty hypertable is that I'm not allowed to add old data (older than 1 week). Compressed data seems to be locked for writing.
Kind regards
Martin
Once a chunk is compressed, you can insert into it, but not update or delete. Still, for your type of conversion, you do need to take a different approach, which isn't currently accommodated by the migrate_data option.
Can you ensure that data could be inserted into an empty hypertable from the Historian in time order (preferably oldest to newest)? If so, I can provide some guidance/scripts on how to set up the hypertable without setting up a compression policy (initially). Instead, you can set up a User-Defined Action that compresses chunks over time, once you know that historical data is finished being added to a chunk.
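As a very rough sketch of what I mean (the table name, the 4-week cutoff and the daily schedule are placeholders, and it assumes compression settings have already been enabled on the hypertable with ALTER TABLE ... SET (timescaledb.compress, ...)):

CREATE OR REPLACE PROCEDURE compress_old_chunks(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
DECLARE
    chunk regclass;
BEGIN
    -- Compress every chunk older than 4 weeks that is not yet compressed,
    -- i.e. chunks the historical backfill should be finished with.
    FOR chunk IN
        SELECT show_chunks('signal', older_than => interval '4 weeks')
    LOOP
        PERFORM compress_chunk(chunk, if_not_compressed => true);
    END LOOP;
END
$$;

-- Schedule the action instead of adding a regular compression policy.
SELECT add_job('compress_old_chunks', '1 day');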
This is similar to how I approached adding lots of historical data in this video (it's old, so keep that in mind).
Thank you very much, Ryan; your approach worked!
Query performance is now comparable to the proprietary Historian.
When I now try to add more data, I get this response:
ERROR: insert into a compressed chunk that has primary or unique constraint is not supported
I'd need to know a bit more about your schema and compression settings, but generally, yes, this is not allowed.
When you compress chunks in TimescaleDB, the indexes you have on uncompressed chunks (your normal indexes) don't exist on the compressed chunks, because the data has been transformed into columns (arrays of up to 1,000 values at a time). There is typically no longer any way to verify a unique value on those chunks (PostgreSQL uses the indexes to verify the constraints).
This also makes sense when you consider how TimescaleDB does compression. If you use a unique key as one of the segmentby columns, then you essentially get no compression, because each row would be unique within its segment. Compression works best when we can group many rows (up to 1,000) into one compressed row.
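Purely as an illustration (using the signalid column from the schema you described earlier), segmenting by a non-unique column looks something like this:

-- Many rows per signalid can be packed into one compressed row;
-- segmenting by a unique key would leave one row per segment and compress almost nothing.
ALTER TABLE signal SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'signalid',
    timescaledb.compress_orderby   = '"timestamp" DESC'
);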
Is the unique constraint actually necessary, or is it something like an ID column added by convention/ORM?