I’ve got this schema and query (TimescaleDB hypertable):
CREATE TABLE event (
    attributes JSONB NOT NULL,
    time TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_gin_event_prices ON event USING GIN ((attributes -> 'prices') jsonb_path_ops);
CREATE INDEX idx_event_price_id ON event((attributes -> 'prices' ->> 'price_id'));
SELECT create_hypertable('event', by_range('time', INTERVAL '20 day'));
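For context, attributes -> 'prices' is an array of objects, each with a price_id and a nested price_state; a row looks roughly like this (values illustrative):

{
  "prices": [
    { "price_id": "0x2b89...", "price_state": { "price": "1.23" } },
    { "price_id": "0x7fa0...", "price_state": { "price": "4.56" } }
  ]
}

Each prices array only holds a handful of entries. The query I’m running: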
SELECT
    time_bucket($2::interval, time) AS bucket,
    first(price_item -> 'price_state' ->> 'price', time) AS price
FROM
    event,
    jsonb_array_elements(attributes -> 'prices') AS price_item
WHERE price_item ->> 'price_id' = $1
    AND time >= $3::timestamptz
GROUP BY bucket;
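(For the run below, $2 was '1 day', the time filter covered the last 20 days — the plan shows now() - '20 days' — and $1 is the long price_id hex string visible in the Function Scan filter.)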
It’s very slow. EXPLAIN ANALYZE shows seq scans that I think shouldn’t be necessary. Here’s the output:
Finalize GroupAggregate  (cost=1009879.96..1009932.63 rows=200 width=40) (actual time=5850.566..5868.421 rows=21 loops=1)
  Group Key: (time_bucket('1 day'::interval, event."time"))
  ->  Gather Merge  (cost=1009879.96..1009926.63 rows=400 width=40) (actual time=5850.536..5868.331 rows=57 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=1008879.93..1008880.43 rows=200 width=40) (actual time=5812.667..5812.672 rows=19 loops=3)
              Sort Key: (time_bucket('1 day'::interval, event."time"))
              Sort Method: quicksort  Memory: 27kB
              Worker 0:  Sort Method: quicksort  Memory: 26kB
              Worker 1:  Sort Method: quicksort  Memory: 27kB
              ->  Partial HashAggregate  (cost=1008869.29..1008872.29 rows=200 width=40) (actual time=5812.583..5812.599 rows=19 loops=3)
                    Group Key: time_bucket('1 day'::interval, event."time")
                    Batches: 1  Memory Usage: 40kB
                    Worker 0:  Batches: 1  Memory Usage: 40kB
                    Worker 1:  Batches: 1  Memory Usage: 40kB
                    ->  Nested Loop  (cost=0.44..1003009.65 rows=585964 width=48) (actual time=291.787..5746.640 rows=105373 loops=3)
                          ->  Parallel Custom Scan (ChunkAppend) on event  (cost=0.43..116739.09 rows=585964 width=220) (actual time=291.552..689.795 rows=482732 loops=3)
                                Chunks excluded during startup: 0
                                ->  Parallel Index Scan Backward using _hyper_42_86_chunk_idx_event_time on _hyper_42_86_chunk  (cost=0.43..62166.60 rows=96339 width=207) (actual time=256.847..375.716 rows=136177 loops=2)
                                      Index Cond: ("time" >= (now() - '20 days'::interval))
                                ->  Parallel Seq Scan on _hyper_42_109_chunk  (cost=0.00..54572.49 rows=489625 width=222) (actual time=120.358..382.573 rows=391948 loops=3)
                                      Filter: ("time" >= (now() - '20 days'::interval))
                          ->  Function Scan on jsonb_array_elements price_item  (cost=0.01..1.50 rows=1 width=32) (actual time=0.009..0.010 rows=0 loops=1448197)
                                Filter: ((value ->> 'price_id'::text) = '0x2b89b9dc8fdf9f34709a5b106b472f0f39bb6ca9ce04b0fd7f2e971688e2e53b'::text)
                                Rows Removed by Filter: 6
Planning Time: 0.673 ms
JIT:
  Functions: 60
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 3.451 ms, Inlining 291.487 ms, Optimization 347.390 ms, Emission 235.672 ms, Total 878.001 ms
Execution Time: 5869.740 ms
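What I expected was that the jsonb_path_ops GIN index could prefilter rows before the unnest, via a containment predicate on the indexed expression. Roughly this sketch (the extra @> clause is my assumption of what the index can serve; the rest is the same query):

SELECT
    time_bucket($2::interval, time) AS bucket,
    first(price_item -> 'price_state' ->> 'price', time) AS price
FROM
    event,
    jsonb_array_elements(attributes -> 'prices') AS price_item
WHERE attributes -> 'prices' @> jsonb_build_array(jsonb_build_object('price_id', $1))
    AND price_item ->> 'price_id' = $1
    AND time >= $3::timestamptz
GROUP BY bucket;

(jsonb_path_ops GIN indexes support @> lookups, which is why I’d expect that form to be indexable.)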
Looking for some advice on what can be done here. Thanks!