Optimizing multi-tenancy in Timescale

We currently have analytics implemented using Power BI and Postgres’ native table partitioning. Because we’re only ever querying data for a single client at a time, we’re partitioning by client & month.

We’re now re-building our analytics and implementing a solution based on Timescale. We’re using the time column for time partitioning (with a 1-week interval), and I initially thought about using client_id for space partitioning, but then I read that space partitioning is intended mainly for query parallelization. Since none of our SQL queries will ever touch data for multiple clients, it seemed like we wouldn’t see the performance benefits that space partitioning provides.

We went with chunk reordering on a (client_id, time) index, in the hope of increasing performance. However, the docs say that only chunks from the 3rd most recent onward get reordered, which for us translates to data from the 3rd week or older. We’re allowing users to view analytics data for the last 1/2/4 weeks, so with this reordering policy, queries on the last 1/2 weeks wouldn’t be optimized, correct? We’re also considering switching to a daily time interval.
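
For illustration, this is roughly the setup I’m describing (table and index names simplified):

-- roughly our current setup (table/index names simplified)
SELECT create_hypertable('analytics', 'time', chunk_time_interval => INTERVAL '1 week');
CREATE INDEX analytics_client_id_time_idx ON analytics (client_id, time DESC);
SELECT add_reorder_policy('analytics', 'analytics_client_id_time_idx');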

We currently have around 20 clients, but we expect more this year. Does this reordering strategy make sense for our use case? We would ideally like to somehow separate the data for different clients, so that queries for client A don’t need to search through data for clients B/C/D/E/etc., which is what we currently have with Postgres partitioning. Would we benefit from space partitioning here? Or should we maybe have a hypertable per client (using Postgres inheritance)?

:wave: @janko - Welcome to the community!

There are a few problem sets to tackle with this question, so it might take some back-and-forth.

Query patterns aside (and the benefit that partitioning on client_id might have there), how do you plan/intend to use the actual raw data? Will it expire over time (data retention)? Do you want aggregated data that lives on longer than the raw data?

Not ignoring the other questions around chunk reordering, etc. - but a little more context on the data is helpful.

Thank you @ryanbooz!

how do you plan/intend to use the actual raw data?

This is an e-commerce app, and it’s essentially data about revenue/profit. We want to display charts showing total revenue/profit/costs over time. But we also want to break it down into categories and show it in a table, which involves joins across 3 tables (all of which are hypertables) to determine how much revenue/profit we should assign to which category. We also plan to allow users to compare data between two time periods.

Will it expire over time (data retention)?

We don’t plan to expire the data at this time, as we would like our clients to be able to compare stats from at least last year as well (though we might choose to expire data older than 2 years, depending on server costs).

Do you want aggregated data that lives on longer than the raw data?

That’s an interesting question. I suppose we don’t need raw data forever, so if there is a way for us to keep only aggregated data for older time periods, that would probably make sense.

The nature of the data in our main table is that we’re storing datapoints 12 times per day, and they’re expected to stay identical for multiple days/weeks in a row, only occasionally changing. What we need to do with aggregations is determine for which time periods the values were the same (these will be our categories).

Hope that helps a bit :slightly_smiling_face:

OK. This is helpful. Let me answer this two ways and see where that takes the conversation.

IF specific client segregation is a necessity at the partition level, then TimescaleDB isn’t going to make that easy for you. The other features (discussed next) will probably help with a lot of what you’re doing, but partitioning is based on time first, and even space partitioning is done with a hash function (although you can change this), which means clients would still get intermingled in chunks (albeit spread across more chunks per time range). I just want to be up-front about that and make sure it’s not a requirement.
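
For example, adding client_id as a space dimension is a single extra call (the table name below is a placeholder), but each hash partition can still contain multiple clients:

-- illustrative only: hash-based space partitioning on client_id (placeholder names)
SELECT create_hypertable('readings', 'time', chunk_time_interval => INTERVAL '1 week');
SELECT add_dimension('readings', 'client_id', number_partitions => 4);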

Assuming it’s not a requirement (because you’re already testing TimescaleDB), then here’s my other input.

I worked with another user some time ago in a similar situation, although with very different data. They queried 2+ billion rows, but always by a specific ID and always over long periods of time. In many ways, partitioning by ID made a lot of sense for them, but they couldn’t take advantage of other features we offer.

What we found was that the indexes they had on their hypertables, based on their query patterns, needed to be optimized. Between that, and using compression on older data, the query latency they were experiencing previously with TimescaleDB (when compared to their manual ID partitioning) became a non-issue in most cases - and saved them a significant amount of disk space.

While chunk reordering could help (and you can manually do it on more recent chunks if the data will not change), I’d be surprised if the proper indexes don’t help just as much. That, coupled with Continuous Aggregates for the aggregate queries you run over and over, is a really beneficial combination, even with data like this.
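
For reference, manually reordering a recent chunk looks something like this (the chunk and index names below are placeholders you’d look up with show_chunks and in your own schema):

-- find recent chunks, then reorder one by hand (placeholder names)
SELECT show_chunks('readings', newer_than => INTERVAL '2 weeks');
SELECT reorder_chunk('_timescaledb_internal._hyper_1_42_chunk', 'readings_client_id_time_idx');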

Compression: over time, you can compress older chunks to save space AND increase query performance. As long as you don’t need to modify (update/delete) older data regularly, compressed chunks store data in columnar form and let you order the data appropriately for your queries, similar to what reorder_chunk does. Because it’s columnar, your aggregate queries on specific columns read a lot less data, in the order you probably need, which often speeds up historical aggregate queries a lot.
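
A minimal sketch of that setup (the segment/order columns and the interval are just examples):

-- illustrative compression setup (column names and interval are examples)
ALTER TABLE readings SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'client_id',
  timescaledb.compress_orderby   = 'time DESC'
);
SELECT add_compression_policy('readings', INTERVAL '4 weeks');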

Continuous Aggregates: in theory, your application/reporting layer queries this data at higher aggregation levels (1 hour, 1 day, something like that). With continuous aggregates, you can pre-aggregate the data after it comes in so that future queries don’t have to hit the raw data each time, which is a significant savings. You could achieve something like this on your own by running aggregations into a different table on a schedule, but there’s a lot that goes into it, and TimescaleDB takes care of the refresh, etc. for you.
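
As a sketch (view, table and column names are placeholders):

-- example continuous aggregate: daily totals per client (placeholder names)
CREATE MATERIALIZED VIEW daily_totals
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day,
       client_id,
       sum(revenue) AS revenue
FROM readings
GROUP BY 1, 2;

-- keep it refreshed in the background
SELECT add_continuous_aggregate_policy('daily_totals',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');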

The additional benefit of Continuous Aggregates is that you can retain that data for a different interval than the raw data. So yes, you could drop raw data after 2 years, but retain the daily aggregate (as an example) for 5 years or longer. :slight_smile:
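
For example (intervals are illustrative; retention policies can be added to both hypertables and continuous aggregates):

-- drop raw data after 2 years, keep the daily aggregate for 5 years (illustrative)
SELECT add_retention_policy('readings', INTERVAL '2 years');
SELECT add_retention_policy('daily_totals', INTERVAL '5 years');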

Is that helpful input, or does it spark additional questions?

Thank you for the detailed explanation, that’s very helpful.

You’re right that client segregation is not a requirement for us; I guess calling it “multi-tenancy” was a bit misleading, since that’s what the term implies. We only want optimal performance, so that data from other clients doesn’t increase query durations for a single client.

While chunk reordering could help (and you can manually do it on more recent chunks if the data will not change), I’d be surprised if the proper indexes don’t help just as much.

My understanding was that chunk reordering is an extra optimization on top of an index. Yes, we definitely want an index on (client_id, time) (or maybe just on client_id), and in that case I assumed reordering would help. We’ll explore the possibility of other indices, but for the types of queries we’ll be making in the initial version, I couldn’t find any other index combination that would speed them up within a single client.

Because it’s columnar, your aggregate queries on specific columns query a lot less data, in the order that you probably need, and often improves the speed of historical aggregate queries a lot.

This is really good to know. I remember seeing in the docs that compression improved performance, but your explanation really helped me grasp it. I think our data would benefit a lot from compression, because the data points for our largest table change rarely. I will definitely play with that.

We’ll probably have daily time chunks, so I’m guessing we can safely compress chunks older than 3 days, assuming we’re not writing to them anymore.

With continuous aggregations, you can pre-aggregate the data after it’s come in so that future queries don’t have to access the raw data each time which is significant savings.

Yes, we’ll be exploring that as well. At this moment I cannot imagine how Timescale could incrementally refresh our aggregates, because we’re using something more complex than classic AVG/MIN/MAX, but I will definitely try it out at some point.

Hi @janko !

About that last part, can I ask what those more complex aggregate functions are that you’re using? In some cases there’s a workaround to use them with continuous aggregates.

Hi @Attila, we’re using the following:

  • main table
    1. window aggregate with row_number(), for solving a gaps and islands problem
    2. grouping by that window aggregate, to get the “islands” of unchanged datapoints
    3. determining start/end time ranges of each “island” (we’re using the lead() window function)
  • join on 2nd table, where the time column falls into those time ranges
    1. just some simple column math in SELECT
  • join on 3rd table, where the time column falls into those time ranges
    1. same window aggregations as on the 1st table, to get islands of unchanged values

Hope that helps :slightly_smiling_face:. I read that window functions and joins aren’t supported by continuous aggregations, so I kind of discarded that option.

We’re now optimizing this aggregation over a time window of 28 consecutive days. The whole thing currently takes about 50 seconds on my M1 MacBook, which isn’t fast enough given that it only returns a handful of rows. Having spent significant time looking at EXPLAIN ANALYZE, it seems that data is first retrieved from each chunk (we have daily chunks), and then the whole sequence above is performed on that pile of data.

However, this algorithm could use divide & conquer, where we first solve gaps and islands inside each individual chunk, then solve it again across chunks (maybe time buckets could be used for that). It seems to me that this way indices (and Timescale reordering) would be used much more efficiently. Currently, when I add indices on the columns used by the row_number() window aggregates, I don’t get any performance improvement (I even get slower performance), though with TS reordering I do get a memory-efficient quicksort instead of an external merge sort. From my profiling, the gaps & islands step is currently the bottleneck.

Anyway, that’s a separate problem, I just wanted to put it out there in case someone wanted to weigh in :slightly_smiling_face:

One idea that would improve performance: if you could find a way to pre-aggregate your data within a continuous aggregate - without JOINs and window functions - so that the number of rows decreases, you could then run all the window functions and JOINs you need on top of that continuous aggregate, since those are allowed when querying an existing continuous aggregate.
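
Something along these lines, completely schematic (table/column names are made up):

-- 1) the continuous aggregate only reduces the row count: no JOINs, no window functions
CREATE MATERIALIZED VIEW daily_state
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', time) AS day,
       client_id,
       product_id,
       state,                 -- whatever value you track islands of
       min(time) AS first_seen
FROM raw_events
GROUP BY 1, 2, 3, 4;

-- 2) window functions and JOINs are fine when querying the aggregate afterwards
SELECT *,
       lead(first_seen) OVER (PARTITION BY product_id ORDER BY first_seen) AS next_change
FROM daily_state
WHERE client_id = 29;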

@janko - thanks for the additional insights. There might be some other options with Postgres that don’t incur the overhead of using row_number(), which has to scan the entire set of data to assign the appropriate row numbers based on your ordering and grouping.
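
For example, one common alternative (purely a sketch with made-up names, since I haven’t seen your schema yet) flags a change with lag() and numbers the resulting islands with a running sum, so both window functions can reuse the same sort order:

-- sketch only (placeholder names): islands of an unchanged "state" per entity
SELECT entity_id, state, min(time) AS entered_at
FROM (
  SELECT *,
         sum(is_new_island) OVER (PARTITION BY entity_id ORDER BY time) AS island
  FROM (
    SELECT *,
           CASE WHEN state IS DISTINCT FROM
                     lag(state) OVER (PARTITION BY entity_id ORDER BY time)
                THEN 1 ELSE 0 END AS is_new_island
    FROM readings
  ) AS flagged
) AS numbered
GROUP BY entity_id, state, island;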

Any chance you could provide a sample schema and query? Not sure we’ll have an immediate answer, but this does pique my interest as we develop new hyperfunctions, and the gap/island problem is an interesting use case. :slight_smile:

Sure thing :slightly_smiling_face:

Schema

Below is our (simplified) main hypertable, which stores when each product from an e-shop was dynamically repriced and by which repricing strategy in our application (“groups”). Compression is segmented by client_id and product_id, because a given product is expected to remain in the same group for a while, so group_id should compress well.

CREATE TABLE "repricings" (
  "client_id" integer NOT NULL,
  "product_id" text NOT NULL,
  "group_id" bigint DEFAULT NULL,
  "time" timestamp NOT NULL,
  UNIQUE ("instance_id", "product_id", "time")
);

SELECT create_hypertable('repricings', 'time', chunk_time_interval => INTERVAL '1 week');
ALTER TABLE "repricings" SET (timescaledb.compress, timescaledb.compress_orderby = 'time ASC', timescaledb.compress_segmentby = 'instance_id, product_id');
SELECT add_compression_policy('repricings', INTERVAL '3d');

We’re building analytics for our clients (e-shops), and our first goal is to be able to tell them how much revenue/cost/profit each “group” generated in a given time period. In addition to the repricings hypertable, we also have a purchases hypertable, which tells us when a product was purchased, with which quantity and total price (“revenue”):

CREATE TABLE "purchases" (
  "client_id" integer NOT NULL,
  "product_id" text NOT NULL,
  "quantity" integer,
  "revenue" double precision,
  "time" timestamp NOT NULL,
  UNIQUE ("client_id", "product_id", "time")
);

SELECT create_hypertable('purchases', 'time', chunk_time_interval => INTERVAL '1 week');
CREATE INDEX "purchases_instance_id_index" ON "purchases" ("instance_id", "time");
SELECT add_reorder_policy('purchases', 'purchases_instance_id_index');

Finally, in order to calculate cost & profit, we need to know the purchase prices of products at any given moment. For this we have the 3rd hypertable, products, with compression segmented by client_id and product_id because the purchase price shouldn’t change too often:

CREATE TABLE "products" (
  "client_id" integer NOT NULL,
  "product_id" text NOT NULL,
  "purchase_price" double precision,
  "price" double precision,
  "time" timestamp NOT NULL,
  UNIQUE ("instance_id", "product_id", "time")
);

SELECT create_hypertable('products', 'time', chunk_time_interval => INTERVAL '1 week');
ALTER TABLE "products" SET (timescaledb.compress, timescaledb.compress_orderby = 'time ASC', timescaledb.compress_segmentby = 'instance_id, product_id');
SELECT add_compression_policy('products', INTERVAL '3d')

We’re storing this data in 3 tables because, in our application, it is imported via independent processes. To give an idea of the data volume: we currently have about 40 active clients, our biggest client has about 67k active products, and we’re repricing all of their products 12 times a day. So we ingest about 800k new repricings & products records each day.

Query

In order to determine which group to assign revenue/profit/costs from which product at which time, we start by aggregating the time intervals for each group & product, so that we know exactly what time each product spent in each group. We do so with the following query:

SELECT "product_id", range_agg(tsrange("entered_at", "exited_at")) AS "period", "group_id"
FROM (
  SELECT *, lead("entered_at", 1) OVER (PARTITION BY "product_id" ORDER BY "entered_at" ASC) AS "exited_at"
  FROM (
    SELECT "product_id", "group_id", min("time") AS "entered_at"
    FROM (
      SELECT *, (row_number() OVER (PARTITION BY "product_id" ORDER BY "time" ASC) - row_number() OVER (PARTITION BY "product_id", "group_id" ORDER BY "time" ASC)) AS "partition" -- gaps and islands
      FROM "repricings"
      WHERE (("instance_id" = 29) AND ("time" >= '2022-03-03 11:14:13.763475+0000') AND ("time" < '2022-03-02 23:00:00.000000+0000'))
    ) AS "t1"
    GROUP BY "product_id", "group_id", "partition"
  ) AS "t1")
) AS "t1"
GROUP BY "product_id", "group_id"

We used subselects each time we needed to work with the result of window aggregates (gaps & islands, min, lead), though I’m guessing CTEs would be more readable.
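
For reference, here is the same query restructured with CTEs (purely a readability rewrite of the query above):

WITH "islands" AS (
  SELECT *, (row_number() OVER (PARTITION BY "product_id" ORDER BY "time" ASC) - row_number() OVER (PARTITION BY "product_id", "group_id" ORDER BY "time" ASC)) AS "partition"
  FROM "repricings"
  WHERE (("client_id" = 29) AND ("time" >= '2022-02-02 23:00:00.000000+0000') AND ("time" < '2022-03-02 23:00:00.000000+0000'))
), "entries" AS (
  SELECT "product_id", "group_id", min("time") AS "entered_at"
  FROM "islands"
  GROUP BY "product_id", "group_id", "partition"
), "ranges" AS (
  SELECT *, lead("entered_at", 1) OVER (PARTITION BY "product_id" ORDER BY "entered_at" ASC) AS "exited_at"
  FROM "entries"
)
SELECT "product_id", range_agg(tsrange("entered_at", "exited_at")) AS "period", "group_id"
FROM "ranges"
GROUP BY "product_id", "group_id";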

We use the same approach to get time ranges of unchanged purchase prices from the products table. We then join these aggregated products into purchases, in order to determine profit & costs from purchase prices:

SELECT "purchases"."time", "purchases"."product_id", "revenue", ("quantity" * "purchase_price") AS "costs"
FROM "purchases"
LEFT JOIN (
  -- products aggregation similar to repricings
) AS "products" ON (("products"."product_id" = "purchases"."product_id") AND ("products"."period" @> "purchases"."time"))
WHERE (("instance_id" = 29) AND ("time" >= '2022-03-08 23:00:00.000000+0000') AND ("time" < '2022-03-02 23:00:00.000000+0000'))

Finally, we join purchases with revenue/profit/costs/margin into the repricings aggregate, in order to assign them to the appropriate “group”:

SELECT *, coalesce(("profit" / nullif("revenue", 0)), 0) AS "margin"
FROM (
  SELECT "group_id", coalesce(sum("revenue"), 0) AS "revenue", coalesce(sum(("revenue" - "costs")), 0) AS "profit", count(DISTINCT "locations"."product_id") AS "product_count"
  FROM (
    SELECT "product_id", range_agg(tsrange("entered_at", "exited_at")) AS "period", "group_id"
    FROM ( /* the repricings aggregate */) AS "locations"
    LEFT JOIN ( /* the purchases aggregate */ ) AS "purchases" ON (("purchases"."product_id" = "locations"."product_id") AND ("locations"."period" @> "purchases"."time"))
    GROUP BY "group_id"
  ) AS "stats"

This is pretty much what we have for now. We haven’t been able to figure out how to use continuous aggregates, due to the usage of window functions & JOINs, but I would like to find a divide & conquer approach (possibly with materialized views).

For a time period of 28 days, the query for a smaller customer (5k products, 1 repricing per day) executes in about 1 second, while for a larger customer (67k products, 12 repricings per day) it executes in about 50 seconds. I tried various combinations of indices (without compression), but I couldn’t find a way for the gaps & islands step to use indices for both row_number() window functions, and from my profiling that’s one of the bottlenecks.

I hope this gives you enough information, without overwhelming you :slightly_smiling_face:.