Nov 22, 2023
Posted by
Lacey Butler
Check out this month's roundup of community questions, from ways to scale your reads and writes to handling large-scale geospatial datasets.
We’ve previously published “Overheard at Office Hours” posts to recap the questions community members ask at our open forum with product team members, but we get questions on Slack, via social, and at events.
Enter “Your Questions Answered,” where we’ll share - and answer - some of the most interesting, generally applicable, or frequently asked questions from the latest Office Hours sessions, our Slack community, and more.
Have a question you’d like answered?
Without further ado, let’s dive in.
We have a full blog post to help you out with this :). Check out “Using AWS Lambda with Timescale Cloud” - it uses IoT data as an example, but the same steps and best practices apply to monitoring, geospatial, or any other type of data you may be storing in TimescaleDB.
If you run into issues or have questions, reach out on Slack at any time.
Using continuous aggregates is a great way to ensure fast and efficient query results; they remove the need to perform complex operations at query time and pre-calculate the needed result set in the background (if you’re new to continuous aggregates, see our tutorial).
If you’re going to build and update large numbers of continuous aggregates, some things that you want to monitor and keep in mind:
Two questions to ask:
1. Are my jobs sufficiently spread out, so I’m not fighting for background workers at the same time? To check if your jobs are too closely spaced: select * from timescaledb_information.continuous_aggregate_stats, which will return something like the following:
SELECT job_id, total_runs, total_failures, total_successes
FROM timescaledb_information.job_stats
WHERE hypertable_name = 'test_table';
job_id | total_runs | total_failures | total_successes
--------+------------+----------------+-----------------
1001 | 1 | 0 | 1
1004 | 1 | 0 | 1
(2 rows)
Pay particular attention to next_scheduled_run and ensure that the jobs will attempt to run in the same timeframe. If you need to alter your job schedule, please see this section of our documentation.
2. Do I have enough background workers to support the number of continuous aggregate updates I’m running? To check, run SHOW ALL
and look at how many background workers (parallel) and TimescaleDB background workers you’ve set.
chunk_size
best practices?Chunk_size
is based on the time intervals that make sense for your data and workloads. The key factor in choosing the time intervals you use to chunk your data is that the chunk (including indexes) in the most recent interval fits into memory. While it’s always safer to make your chunks smaller rather than too large, making them too small can lead to too many chunks - which may cause latency in some queries.
As such, we typically recommend setting your time intervals so that chunk(s) are no more than 25% of main memory.
Important tip: Make sure that you’re planning for single chunks from all active hypertables fit into 25% of main memory, rather than 25% per hypertable.
See our Hypertable Best Practices documentation for more recommendations, and, of course, reach out on Slack if you have any questions.
Absolutely!
We have quite a few customers using TimescaleDB and PostGIS, including Sentinel Marine Solutions (a maritime fleet monitoring provider). The Sentinel team collects real-time location, remaining range and state of charge, travel distance, fuel levels, and much more for all of their customers’ boats (for reference, they expect to surpass 5TB of data over time).
The success story dives into their architecture, and we’ve created multiple tutorials to demonstrate how to combine TimescaleDB and PostGIS, including:
time_bucket
and gap_fill
to monitor all buses running in NYC and display real-time results in Grafana.In vanilla Postgres, writes are typically scaled vertically, so you want to make sure you have plenty of memory, CPU, fast disk, and fast networking. Postgres typically handles read scaling via read-replicas for horizontal scaling.
On top of basic Postgres best practices, with TimescaleDB, we leverage in-memory and size chunks to ingest data through memory. From a user perspective, TimescaleDB exposes what look like singular tables, called hypertables, that are actually an abstraction or a virtual view of many individual tables holding the data, called chunks.
timescaledb_tune
to optimize your memory, CPU, and other settings for better performance.Schema and table design also play a role in scale and performance, either leading to significant performance improvements (correct schema design) or significant performance degradation (wrong schema design). “Correct schema design” varies by use case and workload, so we recommend reading our Schema Management documentation for more detailed information and best practices.
Finally, a tuned database means faster write and read performance. Tuning your TimescaleDB database is essential to leveraging your existing hardware and we built a tool to help you do just that.
If you’d like to learn more about how to increase performance of your reads or writes, reach out to the team on Slack at any time (we have a wealth of knowledge on staff and are happy to help).
That concludes this month’s installment of Your Questions Answered, but we’re always here to help—whether you’re just getting started with Timescale or a seasoned user with a very specific use case.
Once again, you can find us at an upcoming Office Hours session, on Slack, or on Twitter.
We’re listening!