Query performance issues

Hakam · May 31, 2022, 12:23pm

Hi,
My TSDB running on Kubernetes and it becomes very slow. The DB has only one table and the table constructed from 4 columns (timestamp,event_type,value,patient_id)
The data inside the DB for now is about 132M rows. Query take about 40 seconds to finish. The query is to get events for one user and process it with built-in Math functions.

Every user will have about 6M rows per month. the expected amount in the very soon future is (6M rows * 1000 users ) per month.
Our queries using Math functions like percentile and histogram.

how can I improve the performance? am I using the right architecture?

one example of EXPLAIN a query

QUERY PLAN
Result  (cost=2853935.96..2853935.97 rows=1 width=64) (actual time=33836.488..33836.581 rows=1 loops=1)
  Buffers: shared hit=223711 read=656781, temp read=21855 written=14154
  CTE raw_month
    ->  Gather Merge  (cost=2839070.99..2846044.82 rows=60642 width=20) (actual time=31116.684..31420.984 rows=1077758 loops=1)
          Workers Planned: 1
          Workers Launched: 1
          Buffers: shared hit=223701 read=656781, temp read=4092 written=4103
          ->  Sort  (cost=2838070.98..2838222.59 rows=60642 width=20) (actual time=31088.224..31197.247 rows=538879 loops=2)
                Sort Key: events."timestamp"
                Sort Method: external merge  Disk: 17544kB
                Buffers: shared hit=223701 read=656781, temp read=4092 written=4103
                Worker 0:  Sort Method: external merge  Disk: 15192kB
                ->  Parallel Seq Scan on events  (cost=0.00..2833253.57 rows=60642 width=20) (actual time=19051.904..30885.116 rows=538879 loops=2)
                      Filter: (("timestamp" > '2022-03-26 21:00:00'::timestamp without time zone) AND ("timestamp" < '2022-04-26 20:59:59'::timestamp without time zone) AND (value > '30'::numeric) AND (value < '240'::numeric) AND (patient_id = 251) AND (event_type = 2))
                      Rows Removed by Filter: 65932942
                      Buffers: shared hit=223664 read=656781
  CTE percentiles
    ->  Result  (cost=3032.12..3032.14 rows=1 width=8) (actual time=1267.717..1267.719 rows=1 loops=1)
          Buffers: shared hit=7, temp read=7898 written=7249
          InitPlan 2 (returns $2)
            ->  Aggregate  (cost=1516.05..1516.06 rows=1 width=4) (actual time=852.074..852.075 rows=1 loops=1)
                  Buffers: shared hit=7, temp read=2236 written=5663
                  ->  CTE Scan on raw_month  (cost=0.00..1212.84 rows=60642 width=14) (actual time=0.000..552.526 rows=1077758 loops=1)
                        Buffers: temp read=1087 written=4078
          InitPlan 3 (returns $3)
            ->  Aggregate  (cost=1516.05..1516.06 rows=1 width=4) (actual time=415.631..415.632 rows=1 loops=1)
                  Buffers: temp read=5662 written=1586
                  ->  CTE Scan on raw_month raw_month_1  (cost=0.00..1212.84 rows=60642 width=14) (actual time=0.025..116.101 rows=1077758 loops=1)
                        Buffers: temp read=4079 written=1
  InitPlan 12 (returns $12)
    ->  Subquery Scan on histogram_during_af  (cost=2425.91..2425.99 rows=1 width=32) (actual time=33137.313..33137.320 rows=1 loops=1)
          Buffers: shared hit=109488 read=328470, temp read=15877 written=12250
          ->  Aggregate  (cost=2425.91..2425.92 rows=1 width=136) (actual time=33137.240..33137.243 rows=1 loops=1)
                Buffers: shared hit=109485 read=328470, temp read=15877 written=12250
                ->  GroupAggregate  (cost=2425.75..2425.83 rows=3 width=40) (actual time=32888.984..33127.556 rows=35929 loops=1)
                      Group Key: (time_bucket('00:01:00'::interval, raw_month_2."timestamp"))
                      Buffers: shared hit=109485 read=328470, temp read=15877 written=12250
                      InitPlan 10 (returns $10)
                        ->  CTE Scan on percentiles  (cost=0.00..0.02 rows=1 width=4) (actual time=1267.722..1267.723 rows=1 loops=1)
                              Buffers: shared hit=7, temp read=7898 written=7249
                      InitPlan 11 (returns $11)
                        ->  CTE Scan on percentiles percentiles_1  (cost=0.00..0.02 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=1)
                      ->  Sort  (cost=2425.71..2425.72 rows=3 width=22) (actual time=32888.949..32988.555 rows=991805 loops=1)
                            Sort Key: (time_bucket('00:01:00'::interval, raw_month_2."timestamp"))
                            Sort Method: external merge  Disk: 22352kB
                            Buffers: shared hit=109485 read=328470, temp read=15877 written=12250
                            ->  CTE Scan on raw_month raw_month_2  (cost=0.00..2425.69 rows=3 width=22) (actual time=32384.436..32733.694 rows=991805 loops=1)
                                  Filter: ((value > ($10)::numeric) AND (value < ($11)::numeric) AND ((("timestamp" > '2022-03-26 21:00:00'::timestamp without time zone) AND ("timestamp" < '2022-04-24 12:19:53'::timestamp without time zone)) OR (("timestamp" > '2022-04-24 12:20:53'::timestamp without time zone) AND ("timestamp" < '2022-04-26 20:59:59'::timestamp without time zone))))
                                  Rows Removed by Filter: 85953
                                  Buffers: shared hit=109485 read=328470, temp read=13083 written=9448
          SubPlan 5
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.002 rows=1 loops=1)
          SubPlan 6
            ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
          SubPlan 7
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
          SubPlan 8
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
          SubPlan 9
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
  InitPlan 20 (returns $20)
    ->  Subquery Scan on histogram_not_during_af  (cost=2432.93..2433.01 rows=1 width=32) (actual time=241.883..241.886 rows=1 loops=1)
          Buffers: temp read=4079
          ->  Aggregate  (cost=2432.93..2432.95 rows=1 width=136) (actual time=241.841..241.843 rows=1 loops=1)
                Buffers: temp read=4079
                ->  GroupAggregate  (cost=2429.04..2431.05 rows=75 width=40) (actual time=241.825..241.829 rows=2 loops=1)
                      Group Key: (time_bucket('00:01:00'::interval, raw_month_3."timestamp"))
                      Buffers: temp read=4079
                      InitPlan 18 (returns $18)
                        ->  CTE Scan on percentiles percentiles_2  (cost=0.00..0.02 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
                      InitPlan 19 (returns $19)
                        ->  CTE Scan on percentiles percentiles_3  (cost=0.00..0.02 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                      ->  Sort  (cost=2429.00..2429.23 rows=94 width=22) (actual time=241.800..241.801 rows=7 loops=1)
                            Sort Key: (time_bucket('00:01:00'::interval, raw_month_3."timestamp"))
                            Sort Method: quicksort  Memory: 25kB
                            Buffers: temp read=4079
                            ->  CTE Scan on raw_month raw_month_3  (cost=0.00..2425.91 rows=94 width=22) (actual time=218.744..241.786 rows=7 loops=1)
                                  Filter: ((value > ($18)::numeric) AND (value < ($19)::numeric) AND (("timestamp" < '2022-03-26 21:00:00'::timestamp without time zone) OR ("timestamp" > '2022-04-24 12:19:53'::timestamp without time zone)) AND (("timestamp" < '2022-04-24 12:20:53'::timestamp without time zone) OR ("timestamp" > '2022-04-26 20:59:59'::timestamp without time zone)))
                                  Rows Removed by Filter: 1077751
                                  Buffers: temp read=4079
          SubPlan 13
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.002 rows=1 loops=1)
          SubPlan 14
            ->  Result  (cost=0.00..0.01 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=1)
          SubPlan 15
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
          SubPlan 16
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
          SubPlan 17
            ->  Result  (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
Planning:
  Buffers: shared hit=108 read=1
Planning Time: 1.953 ms
JIT:
  Functions: 66
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 5.386 ms, Inlining 118.725 ms, Optimization 220.767 ms, Emission 187.721 ms, Total 532.598 ms
Execution Time: 33900.238 ms

LorraineP · June 1, 2022, 5:00pm

Hello @hakam good to see you here in the community forum.

I wonder if you might be able to run the EXPLAIN through a formatted such as this one from the awesome postgres community member Depesz and share the improved formatted post using a code block? It might be easier for folks to get to grips with what’s happening here. If you do that (and share the query) then I can try to get you some help. Here’s the formatter link: https://explain.depesz.com/

LorraineP · June 1, 2022, 5:02pm

Also, do take a look at this post from @sven in case any of the advice applies to your query! SQL query filtering best practices

noctarius · June 2, 2022, 5:03am

As @LorraineP suggested, I made the query plan a bit easier to read

But from what it looks like, I’m missing any Custom Scans. How did you set up the hypertable? It almost looks like it may not be a hypertable at all.

Can you also share the original query itself?

Last but not least, you have extensive JIT activity, can you try the query without the JIT enabled?

Hakam · June 2, 2022, 5:02pm

Hi,
Here is the link for the formatted query:
https://explain.depesz.com/s/7SGQ#html

Hakam · June 2, 2022, 5:04pm

I looked at it and it seemed to me that I was doing as explained

Hakam · June 2, 2022, 5:07pm

Hi,
Here is the link:
https://explain.depesz.com/s/7SGQ#html

Hakam · June 7, 2022, 12:55pm

I think I found the problem. as @noctarius said, the table wasn’t hypertable. Also there was no indexing on the table. fixing this caused the query time to be reduced signifacntly. from 40 sec to 120ms.

Thanks