Analyze command analyzes chunks two times

ts101 · October 17, 2023, 1:03pm

On PostgreSQL v15.4 and TimescaleDB v2.12.1 when “analyze verbose” command is executed to just provide table statistics I see from output that chunks are analyzed twice.

Looks like “analyze” command returns all of the hypertables and all of the chunk tables.
First is automatically executed “analyze hypertable” which performs analyze on chunks and then additionally “analyze” on chunks are performed.

It looks like “analyze” commands just outputs all of the tables (ordinary tables, hypertables, and chunks), so if “analyze” command is executed without other parameters, then chunks gets analyzed twice.

How to avoid this?

Manually write select on information_schema.tables and remove hypertables from the list.
Manually write select on information_schema.tables and remove tables with schema timescaledb_internal and name starting with "compress_hyper" or “hyper”.

Which of above methods is recommended? Using first method I can parallelize “analyze” on multiple chunks of the same table, each chunk executing in separate parallel command. But I wonder does hypertable needs statistics info or only chunks need statistic info?

jonatasdp · October 17, 2023, 2:08pm

Hi @ts101 , can you share the current output? just a sample.

As far as I know, when you query everything, it will navigate through all the chunks, so they’ll be appearing in the query plan.

I’d love to understand what are you trying to analyze and what exactly Timescale is doing to not allow it.

Also, if you can bold the duplicates and ~~strike~~ the parts that you feel could be removed or improved.

ts101 · October 18, 2023, 5:34am

Lets simplify further with just one table, but issue is the same with just “analyze” command that performs analyze on all tables not just specific one.

Definition of simple hypertable.

# \d+ public.tab
                                                   Table "public.tab"
 Column |           Type           | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 time   | timestamp with time zone |           | not null |         | plain   |             |              |
 value  | integer                  |           |          |         | plain   |             |              |
Indexes:
    "tab_time_idx" btree ("time" DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON tab FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_1_chunk
Access method: heap

Above we can see, for public.tab hypertable there is single chunk table: _timescaledb_internal._hyper_2_1_chunk.

Now turn on timing and collect table statistics just for chunk:
tsdb=# \timing
Timing is on.
tsdb=# analyze verbose _timescaledb_internal._hyper_2_1_chunk;
INFO: analyzing “_timescaledb_internal._hyper_2_1_chunk”
INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
ANALYZE
Time: 3449.122 ms (00:03.449)
Now do the same and collect statistics on hypertable:
tsdb=# analyze verbose public.tab;
INFO: analyzing “_timescaledb_internal._hyper_2_1_chunk”
INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
INFO: analyzing “public.tab”
INFO: “tab”: scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
INFO: analyzing “public.tab” inheritance tree
~~INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows~~
ANALYZE
Time: 7506.129 ms (00:07.506)

In second case time has doubled and it seems the same chunk was “analyzed” twice.

jonatasdp · October 18, 2023, 5:27pm

Thanks for sharing this detailed instructions. It seems a bug to me.

Please, share this example as a new issue to let the core team track and fix it.