Hi everybody!
Recently I compressed some hypertables, and that was the end of the lightning-fast selects for two or more id values matching the PK. The end of the story )
Question: how should I write queries with a condition like id IN (123, 456[, ...])
so that they run quickly?
Example:
-- DROP TABLE hypertable;
CREATE TABLE hypertable(id int, type_id int, ts timestamp, PRIMARY KEY(id, ts));
SELECT create_hypertable(relation => 'hypertable', dimension => by_range('ts', INTERVAL '1 day'));
ALTER TABLE hypertable SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'id',
    timescaledb.compress_segmentby = 'type_id'
);
DO $$
BEGIN
    FOR i IN 0..4 LOOP
        INSERT INTO hypertable(id, type_id, ts)
        SELECT gs.id, floor(random() * 3 + 1), TIMESTAMP '2023-12-31' + make_interval(days => (i + 1))
        FROM generate_series(i * 110000 + 1, 110000 * (i + 1), 1) AS gs(id);
    END LOOP;
END;
$$;
SELECT compress_chunk(c.chunk_schema || '.' || c.chunk_name)
FROM timescaledb_information.chunks AS c
WHERE hypertable_name = 'hypertable';
EXPLAIN ANALYZE VERBOSE
SELECT * FROM hypertable WHERE id IN (3, 330050);
EXPLAIN ANALYZE VERBOSE
SELECT * FROM hypertable WHERE id = 330050;
The corresponding query plans:
1.
QUERY PLAN |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Append (cost=0.02..1790.96 rows=3465000 width=16) (actual time=0.026..3.636 rows=2 loops=1) |
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40701_chunk (cost=0.02..11.69 rows=693000 width=16) (actual time=0.026..0.716 rows=1 loops=1) |
Output: _hyper_535_40701_chunk.id, _hyper_535_40701_chunk.type_id, _hyper_535_40701_chunk.ts |
Vectorized Filter: (_hyper_535_40701_chunk.id = ANY ('{3,330050}'::integer[])) |
Rows Removed by Filter: 109999 |
Batches Removed by Filter: 110 |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40711_chunk (cost=0.00..11.69 rows=693 width=96) (actual time=0.006..0.048 rows=111 loops=1) |
Output: compress_hyper_536_40711_chunk.id, compress_hyper_536_40711_chunk.type_id, compress_hyper_536_40711_chunk.ts, compress_hyper_536_40711_chunk._ts_meta_count, compress_hyper_536_40711_chunk._ts_meta_sequence_num, compress_hyper_536_407|
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40702_chunk (cost=0.02..11.69 rows=693000 width=16) (actual time=0.761..0.761 rows=0 loops=1) |
Output: _hyper_535_40702_chunk.id, _hyper_535_40702_chunk.type_id, _hyper_535_40702_chunk.ts |
Vectorized Filter: (_hyper_535_40702_chunk.id = ANY ('{3,330050}'::integer[])) |
Rows Removed by Filter: 110000 |
Batches Removed by Filter: 111 |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40712_chunk (cost=0.00..11.69 rows=693 width=96) (actual time=0.003..0.026 rows=111 loops=1) |
Output: compress_hyper_536_40712_chunk.id, compress_hyper_536_40712_chunk.type_id, compress_hyper_536_40712_chunk.ts, compress_hyper_536_40712_chunk._ts_meta_count, compress_hyper_536_40712_chunk._ts_meta_sequence_num, compress_hyper_536_407|
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40703_chunk (cost=0.02..11.69 rows=693000 width=16) (actual time=0.761..0.761 rows=0 loops=1) |
Output: _hyper_535_40703_chunk.id, _hyper_535_40703_chunk.type_id, _hyper_535_40703_chunk.ts |
Vectorized Filter: (_hyper_535_40703_chunk.id = ANY ('{3,330050}'::integer[])) |
Rows Removed by Filter: 110000 |
Batches Removed by Filter: 111 |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40713_chunk (cost=0.00..11.69 rows=693 width=96) (actual time=0.003..0.027 rows=111 loops=1) |
Output: compress_hyper_536_40713_chunk.id, compress_hyper_536_40713_chunk.type_id, compress_hyper_536_40713_chunk.ts, compress_hyper_536_40713_chunk._ts_meta_count, compress_hyper_536_40713_chunk._ts_meta_sequence_num, compress_hyper_536_407|
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40704_chunk (cost=0.02..11.69 rows=693000 width=16) (actual time=0.016..0.744 rows=1 loops=1) |
Output: _hyper_535_40704_chunk.id, _hyper_535_40704_chunk.type_id, _hyper_535_40704_chunk.ts |
Vectorized Filter: (_hyper_535_40704_chunk.id = ANY ('{3,330050}'::integer[])) |
Rows Removed by Filter: 109999 |
Batches Removed by Filter: 110 |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40714_chunk (cost=0.00..11.69 rows=693 width=96) (actual time=0.004..0.031 rows=111 loops=1) |
Output: compress_hyper_536_40714_chunk.id, compress_hyper_536_40714_chunk.type_id, compress_hyper_536_40714_chunk.ts, compress_hyper_536_40714_chunk._ts_meta_count, compress_hyper_536_40714_chunk._ts_meta_sequence_num, compress_hyper_536_407|
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40705_chunk (cost=0.02..11.69 rows=693000 width=16) (actual time=0.650..0.650 rows=0 loops=1) |
Output: _hyper_535_40705_chunk.id, _hyper_535_40705_chunk.type_id, _hyper_535_40705_chunk.ts |
Vectorized Filter: (_hyper_535_40705_chunk.id = ANY ('{3,330050}'::integer[])) |
Rows Removed by Filter: 110000 |
Batches Removed by Filter: 111 |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40718_chunk (cost=0.00..11.69 rows=693 width=96) (actual time=0.007..0.028 rows=111 loops=1) |
Output: compress_hyper_536_40718_chunk.id, compress_hyper_536_40718_chunk.type_id, compress_hyper_536_40718_chunk.ts, compress_hyper_536_40718_chunk._ts_meta_count, compress_hyper_536_40718_chunk._ts_meta_sequence_num, compress_hyper_536_407|
Query Identifier: -7444174897097278007 |
Planning Time: 0.874 ms |
Execution Time: 3.682 ms |
2.
QUERY PLAN                                                                                                                                                                                                                                                     |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Append (cost=0.16..254.43 rows=385000 width=16) (actual time=0.174..0.274 rows=1 loops=1) |
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40701_chunk (cost=0.16..12.39 rows=77000 width=16) (actual time=0.057..0.057 rows=0 loops=1) |
Output: _hyper_535_40701_chunk.id, _hyper_535_40701_chunk.type_id, _hyper_535_40701_chunk.ts |
Vectorized Filter: (_hyper_535_40701_chunk.id = 330050) |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40711_chunk (cost=0.00..12.39 rows=77 width=96) (actual time=0.056..0.056 rows=0 loops=1) |
Output: compress_hyper_536_40711_chunk.id, compress_hyper_536_40711_chunk.type_id, compress_hyper_536_40711_chunk.ts, compress_hyper_536_40711_chunk._ts_meta_count, compress_hyper_536_40711_chunk._ts_meta_sequence_num, compress_hyper_536_407|
Filter: ((compress_hyper_536_40711_chunk._ts_meta_min_1 <= 330050) AND (compress_hyper_536_40711_chunk._ts_meta_max_1 >= 330050)) |
Rows Removed by Filter: 111 |
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40702_chunk (cost=0.16..12.39 rows=77000 width=16) (actual time=0.048..0.048 rows=0 loops=1) |
Output: _hyper_535_40702_chunk.id, _hyper_535_40702_chunk.type_id, _hyper_535_40702_chunk.ts |
Vectorized Filter: (_hyper_535_40702_chunk.id = 330050) |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40712_chunk (cost=0.00..12.39 rows=77 width=96) (actual time=0.048..0.048 rows=0 loops=1) |
Output: compress_hyper_536_40712_chunk.id, compress_hyper_536_40712_chunk.type_id, compress_hyper_536_40712_chunk.ts, compress_hyper_536_40712_chunk._ts_meta_count, compress_hyper_536_40712_chunk._ts_meta_sequence_num, compress_hyper_536_407|
Filter: ((compress_hyper_536_40712_chunk._ts_meta_min_1 <= 330050) AND (compress_hyper_536_40712_chunk._ts_meta_max_1 >= 330050)) |
Rows Removed by Filter: 111 |
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40703_chunk (cost=0.16..12.39 rows=77000 width=16) (actual time=0.049..0.049 rows=0 loops=1) |
Output: _hyper_535_40703_chunk.id, _hyper_535_40703_chunk.type_id, _hyper_535_40703_chunk.ts |
Vectorized Filter: (_hyper_535_40703_chunk.id = 330050) |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40713_chunk (cost=0.00..12.39 rows=77 width=96) (actual time=0.049..0.049 rows=0 loops=1) |
Output: compress_hyper_536_40713_chunk.id, compress_hyper_536_40713_chunk.type_id, compress_hyper_536_40713_chunk.ts, compress_hyper_536_40713_chunk._ts_meta_count, compress_hyper_536_40713_chunk._ts_meta_sequence_num, compress_hyper_536_407|
Filter: ((compress_hyper_536_40713_chunk._ts_meta_min_1 <= 330050) AND (compress_hyper_536_40713_chunk._ts_meta_max_1 >= 330050)) |
Rows Removed by Filter: 111 |
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40704_chunk (cost=0.16..12.39 rows=77000 width=16) (actual time=0.020..0.071 rows=1 loops=1) |
Output: _hyper_535_40704_chunk.id, _hyper_535_40704_chunk.type_id, _hyper_535_40704_chunk.ts |
Vectorized Filter: (_hyper_535_40704_chunk.id = 330050) |
Rows Removed by Filter: 2999 |
Batches Removed by Filter: 2 |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40714_chunk (cost=0.00..12.39 rows=77 width=96) (actual time=0.004..0.043 rows=3 loops=1) |
Output: compress_hyper_536_40714_chunk.id, compress_hyper_536_40714_chunk.type_id, compress_hyper_536_40714_chunk.ts, compress_hyper_536_40714_chunk._ts_meta_count, compress_hyper_536_40714_chunk._ts_meta_sequence_num, compress_hyper_536_407|
Filter: ((compress_hyper_536_40714_chunk._ts_meta_min_1 <= 330050) AND (compress_hyper_536_40714_chunk._ts_meta_max_1 >= 330050)) |
Rows Removed by Filter: 108 |
-> Custom Scan (DecompressChunk) on _timescaledb_internal._hyper_535_40705_chunk (cost=0.16..12.39 rows=77000 width=16) (actual time=0.047..0.047 rows=0 loops=1) |
Output: _hyper_535_40705_chunk.id, _hyper_535_40705_chunk.type_id, _hyper_535_40705_chunk.ts |
Vectorized Filter: (_hyper_535_40705_chunk.id = 330050) |
Bulk Decompression: true |
-> Seq Scan on _timescaledb_internal.compress_hyper_536_40718_chunk (cost=0.00..12.39 rows=77 width=96) (actual time=0.047..0.047 rows=0 loops=1) |
Output: compress_hyper_536_40718_chunk.id, compress_hyper_536_40718_chunk.type_id, compress_hyper_536_40718_chunk.ts, compress_hyper_536_40718_chunk._ts_meta_count, compress_hyper_536_40718_chunk._ts_meta_sequence_num, compress_hyper_536_407|
Filter: ((compress_hyper_536_40718_chunk._ts_meta_min_1 <= 330050) AND (compress_hyper_536_40718_chunk._ts_meta_max_1 >= 330050)) |
Rows Removed by Filter: 111 |
Query Identifier: 6492221695508221030 |
Planning Time: 0.672 ms |
Execution Time: 0.328 ms |
Both plans do a sequential scan over the compressed chunks. Is there a way to optimize such plans?
This problem keeps me from compressing a few hypertables.
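One rewrite I am considering, based on the fact that the single-value plan above does get a `_ts_meta_min_1`/`_ts_meta_max_1` filter on the compressed chunks while the `IN` plan does not, is to split the `IN` list into separate equality lookups. This is just a sketch under that assumption; I have not verified it helps on large tables:

```sql
-- Sketch: replace the IN list with a union of single-value lookups,
-- hoping each branch gets the min/max metadata filter that the
-- "id = 330050" plan shows (assumption, not verified at scale).
SELECT * FROM hypertable WHERE id = 3
UNION ALL
SELECT * FROM hypertable WHERE id = 330050;
```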
The example above is simple, but imagine some other query that joins our compressed hypertable by id:
-- This query will run for a long time...
SELECT some_id, created_at,
(SELECT some_col FROM hypertable AS h WHERE h.id = t1.id) AS some_col
FROM some_table_or_hypertable AS t1
WHERE <condition for t1>;
-- or
SELECT some_id, created_at, t1.some_col
FROM some_table_or_hypertable AS t1
LEFT/FULL/... JOIN hypertable AS h ON h.id = t1.id
WHERE <condition for t1>;
On big hypertables such queries take a very long time, and after I decompressed the chunks the queries ran fast again.
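For the join case, the same idea might be expressed with a LATERAL subquery, so the compressed hypertable is probed one id at a time instead of through a join condition. Again only a sketch: it assumes single-value predicates are pushed down to the compressed chunk metadata, reuses `some_table_or_hypertable` and its columns from the example above, and the `WHERE` clause is a hypothetical stand-in for the real condition:

```sql
-- Hypothetical rewrite of the join example: probe the compressed
-- hypertable once per outer row, so each probe is a single-value
-- id lookup (which the plans above show gets batch pruning).
SELECT t1.some_id, t1.created_at, h.*
FROM some_table_or_hypertable AS t1
LEFT JOIN LATERAL (
    SELECT * FROM hypertable AS ht WHERE ht.id = t1.id
) AS h ON true
WHERE t1.some_id > 0;  -- stand-in for <condition for t1>
```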