TSBS returning vastly different results

Hello guys.

I have been tasked with preparing a TimescaleDB vs. InfluxDB assessment for my company, using this tool: GitHub - timescale/tsbs: Time Series Benchmark Suite, a tool for comparing and evaluating databases for time series data

I have prepared IoT data sets for both DBs which consist of:
1 month of 1s data
2 months of 10s data
6 months of 10min data
6 months of 1h data

This should represent the data composition we will have on production machines.

To import, I merged the generated data files into one file and then imported it as a whole (because when importing the files separately, the tool was overwriting the table in the DB).

However, once I ran the queries, the results seemed very strange. Extremely different.

Just one example for all:

1000 queries in influxDB:
Query type → wall clock time
last-loc → 0.071340sec
low-fuel → 0.078845sec

1000 queries in TimescaleDB:
Query type → wall clock time
last-loc → 29.721491sec
low-fuel → 28.246950sec

I mean, I would believe it if the results were a little different, but a difference of this scale is something I simply cannot accept.

So, my question is:

How do I verify the results? How do I find the issue? I am not even sure whether the Influx or the Timescale numbers are wrong.

Is it possible that I have some wrong configuration on my Timescale? I have a totally stock installation; the only thing I did was create an empty database. The tool did the rest when importing the data. So is there anything I should change in the postgres/timescale configuration that would improve the performance?

Hello @rRr

I think that @ryanbooz will be happy to help you later, but in the meanwhile it would be worth checking out this post, as it's quite specific about how he went about this benchmark and it might give you the information you need: What is ClickHouse, how does it compare to PostgreSQL and TimescaleDB, and how does it perform for time-series data? You might also get a lot from this YouTube stream, Introduction to the Time Series Benchmark Suite (TSBS) - YouTube, which looks specifically at TSBS.

For sure, many projects have made the move from InfluxDB to TimescaleDB. They wouldn’t have done that if there was a huge loss of performance – this is a good example Drogue Cloud: Release 0.7.0 — Drogue IoT – so I’m sure we’ll be able to help you get to the bottom of this.

You might find this tool useful too: https://docs.timescale.com/timescaledb/latest/how-to-guides/configuration/timescaledb-tune/ And if you haven't found them already, our tutorials (such as this one: https://docs.timescale.com/timescaledb/latest/tutorials/analyze-intraday-stocks/ ) could also unlock your issue!

Hope this helps? If you're still struggling, let me know and I'll call in Ryan :slight_smile:


:wave: @rRr,

I have a few thoughts on the IoT dataset specifically, but first could you please clarify a few things?

  1. Are you implying that you generated four datasets separately and then combined them all into one? What was the reason/purpose for that? To get more long-term data or something else?

  2. Could you provide the overall settings you used to generate the data and the queries:

  • scale setting (how many trucks/tags are being tracked)
  • what chunk_time_interval you used for the TimescaleDB setup
  • timestamp-start/timestamp-end values
  • did you compress the data?
  • are you using the same machine for both Influx and TimescaleDB?
  • What is the configuration of the machine for each?

Thanks!


Hello, thank you for your answer. I really appreciate the opportunity to talk to the creator himself. :slight_smile:

First, I would like to answer your questions:

  1. Yes, exactly. I have generated 4 files with data:
./tsbs_generate_data --use-case="iot" --seed=345 --scale=50 --timestamp-start="2022-03-02T00:00:00Z" --timestamp-end="2022-04-01T00:00:00Z" --log-interval="1s" --format="timescaledb" | gzip > timescaledb-data_1s.gz
./tsbs_generate_data --use-case="iot" --seed=345 --scale=50 --timestamp-start="2022-01-01T00:00:00Z" --timestamp-end="2022-03-02T00:00:00Z" --log-interval="10s" --format="timescaledb" | gzip > timescaledb-data_10s.gz
./tsbs_generate_data --use-case="iot" --seed=345 --scale=50 --timestamp-start="2021-07-01T00:00:00Z" --timestamp-end="2022-01-01T00:00:00Z" --log-interval="600s" --format="timescaledb" | gzip > timescaledb-data_10min.gz
./tsbs_generate_data --use-case="iot" --seed=345 --scale=50 --timestamp-start="2021-01-01T00:00:00Z" --timestamp-end="2021-07-01T00:00:00Z" --log-interval="1h" --format="timescaledb" | gzip > timescaledb-data_1h.gz

Then I unzipped the 1-second file and made it the “base” for the merged file.
The next step was to append all lines except the first 5 (where the headers are) from the 10-second file into that “base” file.
Then I did the same with the 10-minute file, and also with the 1-hour file.

The reason for that was to simulate real-world data, where we are going to have (or want to have) a reduced size of old data by decreasing the log interval. Not sure if that's clear enough. Basically, we don't need too much detail for old data, so we reduce it and get a longer history in the same disk space.

  2. The data-generation commands are above. For the queries, I was using this setup:
./tsbs_generate_queries --use-case="iot" --seed=345 --scale=50 --timestamp-start="2022-03-02T00:00:00Z" --timestamp-end="2022-04-01T00:00:01Z" --queries=100 --query-type="last-loc" --format="timescaledb" | gzip > tsdb-q_1h_last-loc.gz

./tsbs_generate_queries --use-case="iot" --seed=345 --scale=50 --timestamp-start="2022-03-02T00:00:00Z" --timestamp-end="2022-04-01T00:00:01Z" --queries=1000 --query-type="last-loc" --format="timescaledb" | gzip > tsdb-q_1k_last-loc.gz

I generated and ran queries for each of the available query types separately. So I ran it 12 times for 100 queries and 12 times for 1000 queries with different query-type values.

Second, there is one more thing I forgot to mention: we use InfluxDB v2. And since tsbs does not work with v2, I took a short Go course on Codecademy and tried to fix your code for Influx v2. It seemed to be working, but in fact all the queries returned null, and that's the reason why Influx appears to be so extremely fast :smiley:

I realized it only yesterday after doing some debugging on it (because the Influx times were extremely fast, it did not make sense). So now I am back to coding and trying to make InfluxDB v2 work properly with it.

Maybe I could be brave enough to ask for some hints on where to look and what to modify to fix the Influx part? :slight_smile:
I have dived quite deep into the code and Go, but as a novice I have trouble understanding some of the stuff there.

Rasto,

Wow, you’ve really been getting deep quickly. Really impressed with your efforts! And thank you for your answers. That provides a little better picture of what you’re doing. Unfortunately, it also opens up a few other issues and discussion points. Let me see if I can help a little more.

Data Approach
What you’re trying to do with the data goes a little against the grain with how TimescaleDB is intended to work. That doesn’t mean you can’t do it, but the functionality we’ve built handles much of what you’re doing manually in a test. So, the results may appear or function one way, but you’d actually do it differently in your real application.

For downsampling data, TimescaleDB provides Continuous Aggregates (CAGGs) and (in your example) data retention. In “real life”, you would set up multiple CAGGs for your downsampled data (10s, 10m, 1h), set up a policy to keep them updated as new raw data comes in, and then drop the raw data from the hypertable after it reaches some defined age (maybe 1 week or 1 month). The CAGG data wouldn't be dropped, only the raw data.

But, these features are currently difficult to test with a benchmark like TSBS because data is pushed in as quickly as possible to test ingest (usually with a static timeframe), and the queries aren’t designed to return different intervals based on the age of the data. Again, that’s just not how it was designed (at this point).

If you want to test a setup like this, then the main question I have is: What are you trying to test for? That might facilitate some more questions to see if we can help you get further along.

InfluxDB 2
Your timing couldn’t be any more perfect… almost down to the day! We just started discussing this week putting effort into updating the tool to support InfluxDB 2. Honestly, we haven’t started yet, so if you have something to share, maybe we can try to collaborate some. If you push your branch to the repo (or share it with us to clone), that would be the best approach to get started. I don’t have a timeline and resources are currently limited here, but it’s on our radar and your work might give us the jumpstart we need to help you get it over the finish line.

Thanks!

Well, what do we want to test for? Actually, TSBS does not contain the type of queries we use in our use case, so we are doing just a general assessment. We want to have the same dataset on the same hardware and see which one (Influx or Timescale) performs better. So the type of queries does not matter much (for us, right now). But we wanted a dataset similar to our use case. That means different log intervals for different time ranges.


As for Influx v2, I pushed my branch just a few hours ago: GitHub - rusnackor/tsbs at influx_v2_extension
(there is a lot of hardcoded and debug stuff; definitely not the final version)

So far, I have added authentication with Token and Organization ID. That is working.

But I wasn't able to move the query from the path into the body of the request, as the API v2.2 suggests.

That’s why I was trying to use InfluxQL for query sending, instead of Flux.

Yesterday I spent all day debugging it, and I got from:

~/go/bin # ./tsbs_run_queries_influx --workers=8 --db-name="sandboxDB" --print-responses --file=/share/q_ex/influx-q_1_last-loc
{
ID 0:   "influxql": "SELECT \"latitude\", \"longitude\" \n\t\tFROM \"readings\" \n\t\tWHERE \"fleet\"='North' \n\t\tGROUP BY \"name\",\"driver\" \n\t\tORDER BY \"time\" \n\t\tLIMIT 1",
ID 0:   "response": {
ID 0:     "results": [
ID 0:       {
ID 0:         "error": "database not found: sandboxDB",
ID 0:         "statement_id": 0
ID 0:       }
ID 0:     ]
ID 0:   }
ID 0: } 

through:

Body: {"code":"invalid","message":"failed to decode request body: failed parsing request body as JSON; if sending a raw Flux script, set 'Content-Type: application/vnd.flux' in your request headers: EOF"}
Error: <nil>
panic: http request did not return status 200 OK - R

Body: {"code":"invalid","message":"failed to decode request body: request body requires either query or AST"}
Error: <nil>
panic: http request did not return status 200 OK - R 

to where I am actually:

~/go/bin # ./tsbs_run_queries_influx --workers=8 --token="MyTokenMyToken==" --organization="Idk" --db-name="benchmarkdb" --print-responses --file=/share/q_ex/influx-q_1_last-loc
2022/04/22 10:25:54 daemon URLs: [http://localhost:8086]
----- ====== HQ:
HumanLabel: "Influx last location per truck", HumanDescription: "Influx last location per truck", Method: "POST", Path: "/query?q=SELECT+%22latitude%22%2C+%22longitude%22+%0A%09%09FROM+%22readings%22+%0A%09%09WHERE+%22fleet%22%3D%27North%27+%0A%09%09GROUP+BY+%22name%22%2C%22driver%22+%0A%09%09ORDER+BY+%22time%22+%0A%09%09LIMIT+1", Body: ""
----- ===== URI:
http://localhost:8086/query?q=SELECT+%22latitude%22%2C+%22longitude%22+%0A%09%09FROM+%22readings%22+%0A%09%09WHERE+%22fleet%22%3D%27North%27+%0A%09%09GROUP+BY+%22name%22%2C%22driver%22+%0A%09%09ORDER+BY+%22time%22+%0A%09%09LIMIT+1&database=benchmarkdb&orgID=Idk
{
ID 0:   "influxql": "SELECT \"latitude\", \"longitude\" \n\t\tFROM \"readings\" \n\t\tWHERE \"fleet\"='North' \n\t\tGROUP BY \"name\",\"driver\" \n\t\tORDER BY \"time\" \n\t\tLIMIT 1", 
ID 0:   "response": null
ID 0: }

Run complete after 1 queries with 8 workers (Overall query rate 189.51 queries/sec):
Influx last location per truck:
min:     1.97ms, med:     1.97ms, mean:     1.97ms, max:    1.97ms, stddev:     0.00ms, sum:   0.0sec, count: 1
all queries                   : 
min:     1.97ms, med:     1.97ms, mean:     1.97ms, max:    1.97ms, stddev:     0.00ms, sum:   0.0sec, count: 1
wall clock time: 0.008019sec
~/go/bin #

And now, I am deciding between 2 options:
1.) Debug why InfluxQL is returning no response (even when I run the URL with curl). But I am not sure if that is even possible, if Influx v2.2 still supports it.
2.) Move to the Flux syntax, but that requires rewriting the query generation and rewriting all the queries from SQL syntax to the Flux language. That is a deep well of query generation that I am not sure I won't drown in :slight_smile: and rewriting all the queries from SQL to Flux is not a small task either.

However, I still think I am really close to making it work, and hopefully, with a little bit of support, I can finish it. Maybe not as user-friendly as the original script, but working at least with some hardcoded stuff :slight_smile: and then making it user-friendly shouldn't be too hard for you guys.


Our plan (likely) is to rewrite the queries in Flux because it's not a totally reasonable test unless we do that.

For you to use InfluxQL, you have to map buckets and such as outlined in this InfluxDB document. If you haven’t done that, I wouldn’t expect anything to work. That said, we’re just starting down this same path and I can’t verify anything for you yet. (sorry)

Did you follow that document after importing the data?

Yes, I do have all that set up:

~/go/bin # influx bucket list
ID                      Name            Retention       Shard group duration    Organization ID         Schema Type                                                                                                                          
215eb0dc63d818ae        _monitoring     168h0m0s        24h0m0s                 d5cf1d8e5ed67e25        implicit
7f01c3c732d7fbf2        _tasks          72h0m0s         24h0m0s                 d5cf1d8e5ed67e25        implicit
a8e6cc4dc72fd483        benchmarkdb     infinite        168h0m0s                d5cf1d8e5ed67e25        implicit

~/go/bin # influx auth list                                                                                                                                                                                                                  
ID                      Description             Token                   User Name       User ID                 Permissions                                                                                                                  
09044c587e7d6000        rusnackor's Token       MyTokenMyToken==        rusnackor       09044c585b3d6000        [read:/authorizations write:/authorizations read:/buckets write:/buckets read:/dashboards write:/dashboards read:/orgs write:/orgs read:/sources write:/sources read:/tasks write:/tasks read:/telegrafs write:/telegrafs read:/users write:/users read:/variables write:/variables read:/scrapers write:/scrapers read:/secrets write:/secrets read:/labels write:/labels read:/views write:/views read:/documents write:/documents read:/notificationRules write:/notificationRules read:/notificationEndpoints write:/notificationEndpoints read:/checks write:/checks read:/dbrp write:/dbrp read:/notebooks write:/notebooks read:/annotations write:/annotations] 

~/go/bin # influx v1 dbrp ls
ID                      Database        Bucket ID               Retention Policy        Default Organization ID
0933e588afc05000        sandboxDB       a8e6cc4dc72fd483        52w                     true    d5cf1d8e5ed67e25

~/go/bin #

Anyway, I am going to work on it; it's basically my task to make it work. And I think, if there is any way you can provide me support, I could do that for you guys :slight_smile:

For now, I have given up on v1 compatibility and am proceeding with the v2 API. As you can see in the debug log below, when I print the result of hq := q.(*query.HTTP) (line 126 in tsbs\cmd\tsbs_run_queries_influx\main.go),
I get an error that the body of the request is not a query:

 ----- ====== HQ:
HumanLabel: "Influx last location per truck", HumanDescription: "Influx last location per truck", Method: "POST", Path: "/api/v2/query?orgID=d5cf1d8e5ed67e25", Body: ""
----- ===== URI:
http://localhost:8086/api/v2/query?orgID=d5cf1d8e5ed67e25
Body: {"code":"invalid","message":"failed to decode request body: request body requires either query or AST"}
Error: <nil>
panic: http request did not return status 200 OK - R

 ----- ====== HQ:
HumanLabel: "Influx last location per truck", HumanDescription: "Influx last location per truck", Method: "POST", Path: "/api/v2/query?orgID=d5cf1d8e5ed67e25", Body: ""
----- ===== URI:
http://localhost:8086/api/v2/query?orgID=d5cf1d8e5ed67e25/api/v2/query?orgID=d5cf1d8e5ed67e25&database=benchmarkdb&orgID=Idk
Body: {"code":"invalid","message":"failed to decode request body: request body requires either query or AST"}
Error: <nil>
panic: http request did not return status 200 OK - R

For now, what I think I need to solve (and what I hope to get hints about from you) is:

1.) Where in \tsbs\ (which file) the query creation is, i.e. where I should modify it (from SQL to Flux).
2.) A little help in \tsbs\cmd\tsbs_run_queries_influx\http_client.go (my branch). I am not entirely sure how to move the query loaded from the query file into q.Body on line 91: req, err := http.NewRequest(string(q.Method), string(w.uri), bytes.NewBuffer(q.Body)). (Remember, I started to learn Go 2 weeks ago, so I am not super familiar with the libraries you use :slight_smile: )

I think these are the 2 main steps I need to take to make Influx v2 work.
Maybe some minor stuff after, but looking at the v2.2 documentation I have everything else done. I do have the Token in the request header, I do have the OrgID in the request URL (previously retrieved from the DB), and I have Content-Type and Accept defined in the header too. The only thing I am missing, according to the documentation, is the body, or payload, of the request.


I have fixed it. I am not using Flux (because that requires writing new queries).

I did it using InfluxQL, which uses SQL-like queries. So it's working with the queries that TSBS is already generating.

The changes are pushed to my branch (I posted the link above). The code is not cleaned up yet; I want to test some different options before I call it finished and clean up the code.

BUT it is calling the queries and getting results:

~/go/bin # ./tsbs_run_queries_influx --db-name="benchmarkdb" --organization="Whatever" --print-responses --token="MyTokenMyToken==" --workers=8 --file=/tsdb_testbench/query_examples/influx-q_1_last-loc
2022/04/28 12:18:37 daemon URLs: [http://localhost:8086]
----- ====== HQ:
HumanLabel: "Influx last location per truck", HumanDescription: "Influx last location per truck", Method: "POST", Path: "/query?q=SELECT+%22latitude%22%2C+%22longitude%22+%0A%09%09FROM+%22readings%22+%0A%09%09WHERE+%22fleet%22%3D%27North%27+%0A%09%09GROUP+BY+%22name%22%2C%22driver%22+%0A%09%09ORDER+BY+%22time%22+%0A%09%09LIMIT+1", Body: ""
----- ===== URI:
http://localhost:8086/query?q=SELECT+%22latitude%22%2C+%22longitude%22+%0A%09%09FROM+%22readings%22+%0A%09%09WHERE+%22fleet%22%3D%27North%27+%0A%09%09GROUP+BY+%22name%22%2C%22driver%22+%0A%09%09ORDER+BY+%22time%22+%0A%09%09LIMIT+1&db=benchmarkdb&orgID=Whatever
{
ID 0:   "influxql": "SELECT \"latitude\", \"longitude\" \n\t\tFROM \"readings\" \n\t\tWHERE \"fleet\"='North' \n\t\tGROUP BY \"name\",\"driver\" \n\t\tORDER BY \"time\" \n\t\tLIMIT 1",
ID 0:   "response": {
ID 0:     "results": [
ID 0:       {
ID 0:         "series": [
ID 0:           {
ID 0:             "columns": [
ID 0:               "time",
ID 0:               "latitude",
ID 0:               "longitude"
ID 0:             ],
ID 0:             "name": "readings",
ID 0:             "tags": {
ID 0:               "driver": "",
ID 0:               "name": "truck_10"
ID 0:             },
ID 0:             "values": [
ID 0:               [
ID 0:                 "2021-03-10T00:00:00Z",
ID 0:                 38.81037,
ID 0:                 163.42446
ID 0:               ]
ID 0:             ]
ID 0:           },
ID 0:           {
ID 0:             "columns": [
ID 0:               "time",
ID 0:               "latitude",
...
ID 0:           }
ID 0:         ],
ID 0:         "statement_id": 0
ID 0:       }
ID 0:     ]
ID 0:   }
ID 0: }

Run complete after 1 queries with 8 workers (Overall query rate 28.94 queries/sec):
Influx last location per truck:
min:    26.37ms, med:    26.37ms, mean:    26.37ms, max:   26.37ms, stddev:     0.00ms, sum:   0.0sec, count: 1
all queries                   :
min:    26.37ms, med:    26.37ms, mean:    26.37ms, max:   26.37ms, stddev:     0.00ms, sum:   0.0sec, count: 1
wall clock time: 0.044338sec
~/go/bin #

The trick was that I took a new way of communicating with Influx from the influxdb-comparisons project. And there were some options in the API request: --header 'Content-type: application/json' and --header 'Accept: application/csv'. Once I removed those from the code, it worked :slight_smile:
