Partitioning by an additional dimension makes the query 10 times slower, instead of much faster as expected

On PostgreSQL 15.6 and TimescaleDB 2.14.2, self-hosted on-premises.

TEST 1: Hypertable partitioned by time

  1. Create the PostgreSQL table.
  2. Convert the table to a hypertable with: SELECT CREATE_HYPERTABLE('mytable', BY_RANGE('time', INTERVAL '3 day'));
  3. Insert the data into the table and create an index: CREATE UNIQUE INDEX myindex ON mytable (deviceid, companyid, time);
  4. Execute the query: it takes 300 ms (when no data is in the PostgreSQL cache).
  5. Execute the query again: it takes 51 ms (data is in the PostgreSQL cache).
  6. Ran EXPLAIN on the SQL; at the top of the plan it says: “Chunks excluded during startup: 190”. With 360 chunks in total, more than half of the chunks were excluded. This is just as expected. (A sketch of this setup follows the list below.)
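
For reference, here is a minimal sketch of the TEST 1 setup. The table layout (everything except time, deviceid and companyid) and the predicate values in the example query are assumptions, not taken from the original post:

-- Hypothetical table layout; only time, deviceid and companyid appear in the post.
CREATE TABLE mytable (
    time      TIMESTAMPTZ NOT NULL,
    deviceid  INTEGER     NOT NULL,
    companyid SMALLINT    NOT NULL,
    value     DOUBLE PRECISION
);

-- Partition by time only, 3-day chunks (TimescaleDB 2.13+ dimension-builder API).
SELECT create_hypertable('mytable', by_range('time', INTERVAL '3 days'));

-- Unique index from step 3.
CREATE UNIQUE INDEX myindex ON mytable (deviceid, companyid, time);

-- Assumed query shape for steps 4-6; a now()-relative time range is what makes
-- the exclusion show up as "Chunks excluded during startup" in EXPLAIN.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   mytable
WHERE  companyid = 1
  AND  time > now() - INTERVAL '7 days';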

TEST 2: Hypertable partitioned by time and company
Now I thought adding an additional dimension would speed up the queries.
1 and 2: The same as above.
2a: SELECT ADD_DIMENSION('mytable', BY_RANGE('company', 1)); → each company's data ends up in its own chunk.
3, 4, 5, 6: The same steps as above. (A sketch of this setup follows the list below.)
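
A sketch of step 2a under the same assumptions. Note that add_dimension has to run while the hypertable is still empty, and the post writes company in the add_dimension call but companyid in the index; the sketch assumes they are the same column and uses companyid:

-- Steps 1 and 2 as in TEST 1, then, while the hypertable is still empty,
-- add the secondary range dimension. A range width of 1 gives every distinct
-- company value its own slice, so each (company, 3-day window) combination
-- becomes a separate chunk.
SELECT add_dimension('mytable', by_range('companyid', 1));

-- Then insert the data, create the index and run the query as in steps 3-6.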

At 4) To my surprise, in step 4 the query takes 4 seconds, which is about 10 times slower than in the previous test.
At 5) When the SQL is run a second time (data is in the PostgreSQL cache), it still takes 150 ms, which is 3 times slower than in the test above.
At 6) I ran EXPLAIN and at the top of the plan it says: “Chunks excluded during startup: 1”. I expected at least 70% of the chunks to be excluded during startup.

As I see it, the culprit is that the database does not exclude 70% of the chunks during startup.

QUESTION: Is there something additional I need to set so that the database eliminates 70% of the chunks during startup?

EDIT: I checked the catalog and here is more info:

select id, hypertable_id, column_name, column_type, interval_length from _timescaledb_catalog.dimension where hypertable_id = 37;

 id | hypertable_id |  column_name   |       column_type        | interval_length
----+---------------+----------------+--------------------------+-----------------
 32 |            37 | time           | timestamp with time zone |    259200000000
 33 |            37 | company        | smallint                 |               1
(2 rows)

select * from _timescaledb_catalog.dimension_slice where dimension_id = 33;

  id   | dimension_id | range_start | range_end
-------+--------------+-------------+-----------
 34539 |           33 |           2 |         3
 34541 |           33 |           3 |         4
 34543 |           33 |           4 |         5
 34544 |           33 |           6 |         7
 34542 |           33 |           7 |         8
 34741 |           33 |          11 |        12
(6 rows)

There's a misuse of by_range's last argument. It seems your intent is that this value makes the extra dimension generate one partition per company (1:n, depending on how many companies you have), but the argument actually controls how the partitions are sized: a value greater than one, like 100, represents buckets of 100 companies each, and each bucket is placed into one chunk. (See the sketch below.)
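
A quick sketch of the two interpretations of that last argument (only one of them would actually be run against a given hypertable; column name as in the thread):

-- Range width 1: every distinct company value gets its own slice,
-- i.e. one extra chunk per company for each time interval.
SELECT add_dimension('mytable', by_range('company', 1));

-- Range width 100: companies 0-99 share one slice, 100-199 the next, and so
-- on, so many companies end up together in the same chunk.
SELECT add_dimension('mytable', by_range('company', 100));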

In the end you just have more metadata to query, and depending on how simple the business is, the secondary dimension may not help you at all. As you can see, we don't recommend it by default for all users. If your database will always be queried by user, you can also use a schema to separate user data and have a hypertable per schema.
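
A minimal sketch of that schema-per-tenant alternative, reusing the hypothetical table layout from the earlier sketch (the schema and table names are made up):

-- One schema per company, each with its own time-partitioned hypertable.
CREATE SCHEMA company_1;

CREATE TABLE company_1.mytable (LIKE public.mytable INCLUDING ALL);

SELECT create_hypertable('company_1.mytable', by_range('time', INTERVAL '3 days'));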

What is the point of partitioning? With one single huge physical table, a query has to go through the whole table (or one whole huge index) to retrieve data. Partitioning divides one huge logical table into many hundreds of chunks, and when the SQL is written appropriately, referring to a specific time (if partitioned by a time column), the database engine knows which chunks it can eliminate completely and executes the query only on the chunks that satisfy the time condition. This chunk elimination can be observed with EXPLAIN, and it works as designed (see the sketch below).
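
To illustrate the two kinds of chunk exclusion visible in EXPLAIN, a sketch against the hypothetical mytable from above (the date literals are arbitrary):

-- Constant time range: non-matching chunks are already excluded when the
-- query is planned, so they simply do not appear in the plan.
EXPLAIN
SELECT * FROM mytable
WHERE time >= '2024-01-01' AND time < '2024-01-08';

-- Time range computed at run time (now()): the planner cannot exclude chunks
-- yet, so exclusion happens at executor startup and is reported as
-- "Chunks excluded during startup: N".
EXPLAIN
SELECT * FROM mytable
WHERE time > now() - INTERVAL '7 days';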

We have five companies, and every query has a strict reference to a company, e.g. “where companyid = 1 and timeid between…”. I used the by_range function with the parameter “1”, and TimescaleDB correctly created a new chunk for every company (actually for every company and time combination). Now I expect TimescaleDB to be aware of this additional partitioning and to eliminate the chunks that do not satisfy the company condition. But it does NOT…

OK, let's take a step back. Let's suppose, for the sake of argument, that instead of only 5 companies we have 1000 companies. Now we add an additional dimension with by_range and, e.g., the parameter 100, so that 100 companies are stored in a single chunk (for the same time interval). My question is: will TimescaleDB take chunk elimination into consideration, given that it knows exactly which 100 companies are stored in which chunk?
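
For what it's worth, the company range covered by each chunk can be read out of the catalog, for example with something along these lines (dimension_id = 33 as in the EDIT above; catalog layout as in current TimescaleDB 2.x, so treat this as a sketch):

-- Which company range does each chunk cover?
SELECT c.table_name, ds.range_start, ds.range_end
FROM   _timescaledb_catalog.chunk            AS c
JOIN   _timescaledb_catalog.chunk_constraint AS cc ON cc.chunk_id = c.id
JOIN   _timescaledb_catalog.dimension_slice  AS ds ON ds.id = cc.dimension_slice_id
WHERE  ds.dimension_id = 33
ORDER  BY ds.range_start, c.table_name;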

Maybe I misunderstand additional dimensions. Is there some, let's say, perfect scenario for when to use them? From such a perfect scenario I might comprehend the intention behind additional dimensions.

P.S. I know we could create a new hypertable per company, but at this stage of development that is simply not possible because of business-plan objectives.

That's all great feedback @ts101; let's see if we can bring @matski into this conversation, as he also worked on the latest dimension features and can speak to it more closely.

For companyid = 1 as your extra dimension, please bring a POC to the team as a GitHub issue; that would be of great value to the team.
