Too slow query when filtering for date range + LIMIT 1 on large table

gitmalong · June 30, 2024, 9:11am

Hi guys!

I have a hypertable consisting of around 20million entries. When filtering for date entries (< mydate) LIMIT 1 I still have query times of around 250ms-1sec which is too slow for my use case.

Have I overseen something in my hypertable setup?

SELECT
	"ledger"."id",
	"ledger"."year",
	"ledger"."volume",
	"ledger"."asset_group",
	"ledger"."asset_id",
	"ledger"."source_type",
	"ledger"."source_id",
	"ledger"."date",
	"ledger"."volume_base_asset",
	"ledger"."volume_base_asset_discovery",
	"ledger"."consumed"
FROM
	"ledger"

WHERE
	"ledger"."year" = 2021
	AND "ledger"."asset_group" = 'abc'
	AND "ledger"."asset_id" = 'def'
	AND "ledger"."date" <= '2021-10-20 16:56:28 +00:00'
	AND "ledger"."volume" > 0
	AND "ledger"."consumed" = FALSE
GROUP BY
	LEDGER.ID,
	LEDGER.DATE
ORDER BY
	"ledger"."date" ASC
	
LIMIT
1

Explain: eHfb | explain.depesz.com

Create table

            CREATE TABLE IF NOT EXISTS "ledger" (
                "id" serial NOT NULL,
                "year" integer NOT NULL,
                "volume" decimal NOT NULL,
                "asset_group" text NOT NULL,
                "asset_id" text NOT NULL,
                "source_type" integer NOT NULL,
                "source_id" integer NOT NULL,
                "date" timestamp with time zone NOT NULL,
                "volume_base_asset" decimal,
                "volume_base_asset_discovery" integer NOT NULL DEFAULT 0,
                "consumed" bool NOT NULL,
                PRIMARY KEY ("id", "date")
            );
SELECT create_hypertable('ledger', by_range('date', INTERVAL '12 hours'));

CREATE INDEX IF NOT EXISTS "idx-ledger-source"
    ON public.ledger USING btree
    (source_id ASC NULLS LAST, source_type ASC NULLS LAST)
    TABLESPACE pg_default;
CREATE INDEX IF NOT EXISTS "idx-ledger"
    ON public.ledger USING btree
    (year ASC NULLS LAST, asset_group COLLATE pg_catalog."default" ASC NULLS LAST, asset_id COLLATE pg_catalog."default" ASC NULLS LAST, consumed ASC NULLS LAST, volume ASC NULLS LAST, date ASC NULLS LAST)
    TABLESPACE pg_default;

jonatasdp · July 1, 2024, 7:10pm

Is there any chance that you add an extra clause here:

	AND "ledger"."date" <= '2021-10-20 16:56:28 +00:00'

and make date between … and …

Then it will be able to exclude chunks and go straight to the right chunks of data. As you see you have a IndexCond of 1.5 seconds because it needs to go to all chunks. If you put date between it will be able to exclude chunks in advance and the IndexCond will be much more efficient as it will not go over all the chunks.

gitmalong · July 1, 2024, 7:35pm

Unfortunately I cannot narrow it down to a certain date interval because I don’t know when the “last unconsumed ledger entry” was created as I am actually searching for it.

I am wondering whether my partition fits my use case when the query from above is my most time critical use case. Could I also partition my hypertable by “asset_group”, “asset_id”, “consumed”?

jonatasdp · July 23, 2024, 3:02pm

Check how to add extra dimensions:

Keep in mind this feature is very advanced as the warning message says