On PostgreSQL v15.4 and TimescaleDB v2.12.1 when “analyze verbose” command is executed to just provide table statistics I see from output that chunks are analyzed twice.
Looks like “analyze” command returns all of the hypertables and all of the chunk tables.
First is automatically executed “analyze hypertable” which performs analyze on chunks and then additionally “analyze” on chunks are performed.
It looks like “analyze” commands just outputs all of the tables (ordinary tables, hypertables, and chunks), so if “analyze” command is executed without other parameters, then chunks gets analyzed twice.
How to avoid this?
Manually write select on information_schema.tables and remove hypertables from the list.
Manually write select on information_schema.tables and remove tables with schema timescaledb_internal and name starting with "compress_hyper" or “hyper”.
Which of above methods is recommended? Using first method I can parallelize “analyze” on multiple chunks of the same table, each chunk executing in separate parallel command. But I wonder does hypertable needs statistics info or only chunks need statistic info?
Lets simplify further with just one table, but issue is the same with just “analyze” command that performs analyze on all tables not just specific one.
Definition of simple hypertable.
# \d+ public.tab
Table "public.tab"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
time | timestamp with time zone | | not null | | plain | | |
value | integer | | | | plain | | |
Indexes:
"tab_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON tab FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_1_chunk
Access method: heap
Above we can see, for public.tab hypertable there is single chunk table: _timescaledb_internal._hyper_2_1_chunk.
Now turn on timing and collect table statistics just for chunk:
tsdb=# \timing
Timing is on.
tsdb=# analyze verbose _timescaledb_internal._hyper_2_1_chunk;
INFO: analyzing “_timescaledb_internal._hyper_2_1_chunk”
INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
ANALYZE
Time: 3449.122 ms (00:03.449)
Now do the same and collect statistics on hypertable:
tsdb=# analyze verbose public.tab;
INFO: analyzing “_timescaledb_internal._hyper_2_1_chunk” INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
INFO: analyzing “public.tab”
INFO: “tab”: scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing “public.tab” inheritance tree INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
ANALYZE
Time: 7506.129 ms (00:07.506)
In second case time has doubled and it seems the same chunk was “analyzed” twice.