Is TimescaleDB the right choice for my situation?

MrKingD · August 16, 2023, 3:28am

Hi there everyone, a background story:
I’m working for a company who makes rockets and my job is about gorund support softwares that process rocket telemetry data, these data fall into various groups, sources, and frequencies. to simplify, I’d say around 100 groups, each with 50 parameters(metrics) at the rate of 10ms, mostly float value. Currently we uses a bunch of softwares to process them and eventually written to MySQL in a 100_tables-each_50_columns fashion. Every time we run a launch simulation it creates 100 tables… And the system runs on a rack server with 10k SAS HDD, 256GB ram, holding a linux VM and a few win VMs:

It is fast enough to present real time data since RTMonitor reads incoming data and shows them instantly. Then, we also have a post-time judging software that reads data from MySQL after the simulation ends, using predefined rules to decide whether the values are correct or not.
Now, we want to do real-time judging, the problem is that it’s not only just simple value upper/lower limit, it requires tracing data backwards in time, for example: ‘the duration of stage seperation command issued from central computer to corresponding device execution detected from sensor has to be within 100ms’. And I could not know if our enginners set a rule that overlaps minutes even hours.
That’s where I started thinking about TSDB, because I need to temporarily store large amount of time series data with an off-the-shelf structure, and it needs to be very fast. The above image in dashed lines are my thoughts of the design, grafana is just an add-on to potentially replace RTMonitor and show judge result and can be ignored, bottom line is: no bottleneck on IO, delay between incoming telemetry and data shows up on the curve/number within seconds, and flexible scripting language to write ‘rules’ that executes fast enough because it need to be evaluated like every second or half to ensure ‘real-time’ judging, and there could be dozens or hundred of rules running simutaneously.
For the past week I’ve been searching for the right TSDB, redis timeseries seems to be the best in-memory solution but it’s paid functionality in enterprise version. InfluxDB hasn’t been very acive after the release of 2.0 and ‘flux’ language isn’t cometitive against SQL in terms of history and scale. KDB outdated, prometheus and graphite may lack enough performance, TDengine claims to surpass others greatly in performance but I read their doc, one crutial functionality lacks: it only support update with timestamp given, but not a range. We have a hardware ‘offset’ timestamp generated when receiving telemetry at ground starting from zero when power up. This value needs to be shifted to make ignition time at value of 0, right after ignition happens. And post-juding rely on this time value to make decisions. It caused a serious problem when we ran the simulation for hours without ignition, mysql took too long to shift all the data for the past hours until ‘datastore’ drained it’s socket receiving cache(later I changed update procedure into async threads away from main insert loop).
Anyway, maybe I talk too much irrelevant information, but to sum up, can timescaleDB handle near instant inserting and fetching of the data amount I wanted? And is there a solution for that time shift problem because I need to start judging like seconds after ignition to catch up. Maybe a ram disk?
Thanks fo reading this long post, and forgive me if I was wrong on something because I’m just a developer with only a few years experience and has no knowledge on TSDB.

clouedoc · August 21, 2023, 12:22am

Your post is really hard to read. If you don’t take 10 minutes to format it and remove noisy info, then, do you really believe an expert will take 10 minutes of their day to answer you? Good formatting makes a huge difference online. Put yourself in your readers’ shoes.

(So here’s what your getting: a non-expert info from a TimescaleDB open-source user)

TimescaleDB might be a great fit. Here’s why:

Powerful scripting is available (“real-time judging”). You can use SQL, or even run JavaScript or Python on it with extensions.
I/O will be an issue if you don’t know what you are doing. (like everyone at first ). If you were fine with MySQL on that point, I see no reason why PostgreSQL will fail you. I don’t think you need to use a RAM disk; you can probably ask PostgreSQL to cache lots of stuff. There is an utility to make TimescaleDB tune itself to your hardware. (in case you’re lost: TimescaleDB is a PostgreSQL extension)
It’s super flexible concerning the data you put in. You seem to have issues to manipulate the data (“offset” thing). You might be able to do it by using continuous aggregates or materialised views. Of course, there are caveats, and tons of ways to shoot yourself in the foot…

TimescaleDB has been working well for me at around 1 insert every 7ms. I didn’t need to go faster yet.
I’m using continuous aggregates to query the data, and it’s fast to query it on Grafana.

Also, take a look at Citus data (https://www.citusdata.com/). It looks pretty solid too

Good luck, and beware of the rockets

jonatasdp · August 21, 2023, 3:26pm

I got around 200k metrics per second on a Raspberry PI, so, the throughput doesn’t seem to be a problem.

We have a hardware ‘offset’ timestamp generated when receiving telemetry at ground starting from zero when power up. This value needs to be shifted to make ignition time at value of 0, right after ignition happens. And post-juding rely on this time value to make decisions. It caused a serious problem when we ran the simulation for hours without ignition, mysql took too long to shift all the data for the past hours until ‘datastore’ drained it’s socket receiving cache(later I changed update procedure into async threads away from main insert loop).

And is there a solution for that time shift problem because I need to start judging like seconds after ignition to catch up. Maybe a ram disk?

You can shift the time using queries for sure. I believe you can write a sql or plpgsql function that results a resultset and then you can customize the tuple values on demand as you need.

As @clouedoc mentioned, it’s being very hard to understand as you have several issues and experiments related to what you’re testing. If you come with more detailed questions about a single objective, more people can help you easily overcome one by one