Analyze command analyzes chunks two times

On PostgreSQL v15.4 and TimescaleDB v2.12.1 when “analyze verbose” command is executed to just provide table statistics I see from output that chunks are analyzed twice.

Looks like “analyze” command returns all of the hypertables and all of the chunk tables.
First is automatically executed “analyze hypertable” which performs analyze on chunks and then additionally “analyze” on chunks are performed.

It looks like “analyze” commands just outputs all of the tables (ordinary tables, hypertables, and chunks), so if “analyze” command is executed without other parameters, then chunks gets analyzed twice.

How to avoid this?

  1. Manually write select on information_schema.tables and remove hypertables from the list.
  2. Manually write select on information_schema.tables and remove tables with schema timescaledb_internal and name starting with "compress_hyper" or “hyper”.

Which of above methods is recommended? Using first method I can parallelize “analyze” on multiple chunks of the same table, each chunk executing in separate parallel command. But I wonder does hypertable needs statistics info or only chunks need statistic info?

Hi @ts101 , can you share the current output? just a sample.

As far as I know, when you query everything, it will navigate through all the chunks, so they’ll be appearing in the query plan.

I’d love to understand what are you trying to analyze and what exactly Timescale is doing to not allow it.

Also, if you can bold the duplicates and strike the parts that you feel could be removed or improved.

Lets simplify further with just one table, but issue is the same with just “analyze” command that performs analyze on all tables not just specific one.

Definition of simple hypertable.

# \d+ public.tab
                                                   Table "public.tab"
 Column |           Type           | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 time   | timestamp with time zone |           | not null |         | plain   |             |              |
 value  | integer                  |           |          |         | plain   |             |              |
Indexes:
    "tab_time_idx" btree ("time" DESC)
Triggers:
    ts_insert_blocker BEFORE INSERT ON tab FOR EACH ROW EXECUTE FUNCTION _timescaledb_functions.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_1_chunk
Access method: heap

Above we can see, for public.tab hypertable there is single chunk table: _timescaledb_internal._hyper_2_1_chunk.

  1. Now turn on timing and collect table statistics just for chunk:
    tsdb=# \timing
    Timing is on.
    tsdb=# analyze verbose _timescaledb_internal._hyper_2_1_chunk;
    INFO: analyzing “_timescaledb_internal._hyper_2_1_chunk”
    INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
    ANALYZE
    Time: 3449.122 ms (00:03.449)

  2. Now do the same and collect statistics on hypertable:
    tsdb=# analyze verbose public.tab;
    INFO: analyzing “_timescaledb_internal._hyper_2_1_chunk”
    INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
    INFO: analyzing “public.tab”
    INFO: “tab”: scanned 0 of 0 pages, containing 0 live rows and 0 dead rows; 0 rows in sample, 0 estimated total rows
    INFO: analyzing “public.tab” inheritance tree
    INFO: “_hyper_2_1_chunk”: scanned 54109 of 54109 pages, containing 10010000 live rows and 0 dead rows; 150000 rows in sample, 10010000 estimated total rows
    ANALYZE
    Time: 7506.129 ms (00:07.506)

In second case time has doubled and it seems the same chunk was “analyzed” twice.

Thanks for sharing this detailed instructions. It seems a bug to me.

Please, share this example as a new issue to let the core team track and fix it.