Issues with TimescaleDB Integration on My Website

Hi Timescale Community,

I’m currently integrating TimescaleDB with my website to manage time-series data (user activity logs and sensor data), but I’m encountering a couple of issues during the setup and data queries.

Here’s What I’ve Done:

  1. I’ve installed TimescaleDB on my PostgreSQL instance and created hypertables for tracking time-series data.
  2. I’ve set up my website to log data points into the TimescaleDB instance, using Python and psycopg2 for database interaction.

The Problems:

  1. Slow Query Performance: Some queries to aggregate data over a specific time period are much slower than expected, even though I’ve indexed the appropriate columns.
  2. Data Insertion Delays: Occasionally, when inserting data, there seems to be a delay before the data becomes available for querying, even though the connection is established and the queries run correctly.

Questions:

  • Are there any specific optimizations or settings I should apply to improve query performance on TimescaleDB?
  • How can I reduce the delay in data availability after inserting into the hypertables?

Any help or best practices would be appreciated. Thanks!

Hi @joeroot , thanks for bringing this up.

Can you share more details about your data (throughput, frequency, cardinality, table schema, indices, etc.)?

How are you batching your data for insertion?

Also, if you can share the EXPLAIN ANALYZE output of your query, that will help us understand its cost and whether it's using the proper indices.

One case that comes up frequently is a query that doesn't filter by the time column; without a time predicate the planner can't exclude chunks, so the query ends up scanning every partition.
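For instance, on a hypothetical metrics hypertable (the table and column names here are just for illustration, since I haven't seen your schema yet):

```sql
-- Scans every chunk: there is no predicate on the time column.
SELECT device_id, AVG(value)
FROM metrics
GROUP BY device_id;

-- Prunes chunks: the planner can skip every chunk outside the last day.
SELECT device_id, AVG(value)
FROM metrics
WHERE time >= NOW() - INTERVAL '1 day'
GROUP BY device_id;
```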

Thank you for the prompt reply! Here’s the additional information you requested:

Data Details:

  • Throughput: Approximately 500 data points per second during peak usage.
  • Frequency: Data points are logged every 5 seconds for user activity and sensor data.
  • Cardinality: For user activity logs, around 1,000 unique users concurrently; for sensor data, 200 unique sensors.

Table Schema:

CREATE TABLE activity_logs (
    user_id INT,
    activity_time TIMESTAMPTZ NOT NULL,
    activity_details JSONB,
    PRIMARY KEY (user_id, activity_time)
);

CREATE TABLE sensor_data (
    sensor_id INT,
    recorded_time TIMESTAMPTZ NOT NULL,
    sensor_readings JSONB,
    PRIMARY KEY (sensor_id, recorded_time)
);

I’ve created hypertables for both tables:

SELECT create_hypertable('activity_logs', 'activity_time');  
SELECT create_hypertable('sensor_data', 'recorded_time');

Indices:

  • Activity Logs:
    • (activity_time) for time-based filtering
    • (user_id, activity_time) for specific user activity queries
  • Sensor Data:
    • (recorded_time) for time-based filtering
    • (sensor_id, recorded_time) for sensor-specific queries

Data Insertion:

  • I’m batching data points using Python (psycopg2). Each batch contains 100–200 rows.
  • Using executemany() for batched inserts.
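In case it helps, here's a simplified sketch of my insert path (connection handling is omitted and the helper names are just for illustration):

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str, str]  # (user_id, activity_time, activity_details)


def batched(rows: Iterable[Row], size: int = 200) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows from an iterable of rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def insert_activity(conn, rows: Iterable[Row]) -> None:
    """Insert rows into activity_logs in batches of up to 200."""
    sql = (
        "INSERT INTO activity_logs (user_id, activity_time, activity_details) "
        "VALUES (%s, %s, %s)"
    )
    with conn.cursor() as cur:
        for batch in batched(rows, size=200):
            # Currently executemany(); psycopg2.extras.execute_values would
            # send a single multi-row INSERT per batch instead of one
            # statement per row, which may be relevant to the slowness.
            cur.executemany(sql, batch)
    conn.commit()
```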

Query Example and EXPLAIN ANALYZE:

Here’s an example of a query that’s slower than expected:

SELECT COUNT(*)
FROM activity_logs
WHERE activity_time >= NOW() - INTERVAL '1 day'
  AND activity_time < NOW();

Observations:

  1. Query Performance:
    Queries with time filtering sometimes scan more data than expected. Perhaps I’m missing some optimizations for partition pruning?
  2. Data Insertion Delays:
    Could the batching size or lack of async inserts be a factor?

Any advice on how to address these would be greatly appreciated! Please let me know if you need further details.

Thanks again for your help!

Is your chunk interval optimized? If not, you might want to set it explicitly:

SELECT set_chunk_time_interval('activity_logs', INTERVAL '1 day');

Additionally, for better performance on recurring aggregate queries, consider creating a continuous aggregate (TimescaleDB's materialized view for time-series data). Also check that an index exists on the activity_time column. Then execute the query with EXPLAIN and share the result:

EXPLAIN
SELECT COUNT(*)
FROM activity_logs
WHERE activity_time >= NOW() - INTERVAL '1 day'
  AND activity_time < NOW();
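To make the continuous-aggregate suggestion concrete, here is a sketch based on your activity_logs schema; the view name hourly_activity, the bucket width, and the policy offsets are illustrative, so adjust them to your query patterns:

```sql
-- Pre-compute hourly counts so the daily COUNT(*) query can read the
-- aggregate instead of scanning raw rows.
CREATE MATERIALIZED VIEW hourly_activity
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', activity_time) AS bucket,
       COUNT(*) AS events
FROM activity_logs
GROUP BY bucket;

-- Keep the aggregate refreshed automatically, lagging real time by one hour.
SELECT add_continuous_aggregate_policy('hourly_activity',
    start_offset      => INTERVAL '1 day',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');
```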