Aug 01, 2024
This is an installment of our “Community Member Spotlight” series, where we invite our customers to share their work, shining a light on their success and inspiring others with new ways to use technology to solve problems.
In this edition, we speak with Jeremy Theocharis, co-founder and CTO of United Manufacturing Hub, about how they are bringing open source to the world of manufacturing by combining information and operational tools and technologies in an open-source Helm chart for Kubernetes.
The team uses TimescaleDB to store both relational and time-series data coming from MQTT and Kafka and then visualizes it using Grafana and their own REST API. With the data, they can predict and prevent maintenance issues, analyze and reduce production losses such as changeovers or micro-stops, cut resource consumption, and much more.
We believe that open source is the future: we live in a world where 100 percent of all supercomputers use Linux, 95 percent of all public cloud providers use Kubernetes, and the Mars Helicopter, Ingenuity, uses f-prime.
We are confident that open source will also find its place in manufacturing, and we were among the first to use it in this field, making us the experts. Our story goes back to 2016, when we felt the pain of integrating and maintaining various costly Industrial IoT (IIoT) solutions. Existing vendors focused only on the end result, so large-scale IIoT projects failed because they did not address the real challenges.
After suffering for years, in 2021, we were fed up and decided to do something about it. Since then, all our products and services have been focused on efficiently integrating and operating large-scale IIoT infrastructures.
We are a team of 11 people with different backgrounds, from mechanical engineering to business administration to cybersecurity.
Me, personally? I am an IT nerd who learned programming at 14 and then studied Mechanical Engineering and Business Administration at RWTH Aachen in Germany. I did my first internship as a technical project manager in the Digital Capability Center Aachen, where I was responsible for the technical integration of various Industrial IoT solutions. I later did another internship at McKinsey & Company in Singapore and decided I needed a solid technical component in my future profession.
I started my own business as a system integrator in the Industrial IoT sector, met Christian and Alex, and together we founded UMH Systems GmbH in 2021.
The United Manufacturing Hub (UMH) is an open-source Helm chart for Kubernetes. It combines state-of-the-art information and operational tools (IT/OT) and technologies, bringing them into the engineer's hands.
I assume most of the readers here will come from traditional IT, so let me explain the world of manufacturing, especially OT, first, as it will help you understand our usage of TimescaleDB.
OT means Operational Technology. OT is the hardware and software to manage, monitor, and control industrial operations like production machines. Due to different requirements, OT has its own ecosystems.
OT comes originally from the field of electronics, and this background is still evident today. For example, the computer that controls the production machine is called a PLC (Programmable Logic Controller). It runs some peculiar flavors of Windows and Linux, which are completely hidden from the system's programmer. The programmer of the PLC will use programming languages like ladder logic, which is like drawing electrical schematics.
Because OT is an entirely different world, it is pretty hard to integrate it with the traditional IT world. During system integration, we felt all these pains and decided to develop a tool that allows an easy combination between both fields—the United Manufacturing Hub (UMH).
With UMH, one can now easily extract data from the shop floor: from the PLC, from various IO-Link-compatible or analog (4-20 mA / 0-10 V) sensors, from barcode readers, and from different manufacturing execution or enterprise resource planning systems. Using a Unified Namespace based on MQTT and Kafka, the data is aggregated and can then be contextualized through tools like Node-RED.
From there on, the processed data is stored automatically in a TimescaleDB instance running either in the cloud or on-premise. To visualize the data, we use Grafana with our own REST API for manufacturing-specific logic (also called factoryinsight) and our own Grafana data source.
Manufacturing data is mainly relational: orders, products, production plans, and shifts are good examples of this. However, due to the growth of analytics, time-series data gets more and more important, e.g., for preventive or predictive maintenance.
During one of those earlier system integrator projects, I realized that we needed a time-series database and a relational one.
Because of its strong marketing, we chose InfluxDB at first. We did not scan vendors; we just started with whatever we knew from home automation. It sounded perfect: a beautiful user interface, continuous queries to process data, etc.
We wanted to process raw sensor data, e.g., converting the distance of a light barrier into the machine status (running/not running). We also needed to store shifts, orders, and products and model the data. We did that via InfluxDB as well.
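As a rough illustration of this kind of processing, the sketch below turns raw light-barrier distance readings into machine state changes. It is not UMH's implementation: the threshold value, the `Reading` and `StateChange` types, and the rule "a short distance means a product is passing, so the machine is running" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical threshold: a distance below this value (in mm) means a product
# is interrupting the light barrier, which this sketch treats as "running".
RUNNING_DISTANCE_MM = 100.0


@dataclass(frozen=True)
class Reading:
    timestamp: int      # Unix seconds
    distance_mm: float  # raw sensor value


@dataclass(frozen=True)
class StateChange:
    timestamp: int
    state: str          # "running" or "not running"


def to_state(reading: Reading) -> str:
    """Map a single raw reading to a machine state."""
    if reading.distance_mm < RUNNING_DISTANCE_MM:
        return "running"
    return "not running"


def state_changes(readings: Iterable[Reading]) -> List[StateChange]:
    """Collapse raw readings into one entry per state transition.

    Readings are sorted by timestamp first, so samples that arrive out of
    order within the batch are still evaluated in chronological order.
    """
    changes: List[StateChange] = []
    for reading in sorted(readings, key=lambda r: r.timestamp):
        state = to_state(reading)
        if not changes or changes[-1].state != state:
            changes.append(StateChange(reading.timestamp, state))
    return changes
```

Storing only the transitions instead of every sample keeps the state table small, at the cost of losing the raw values, which would need to be kept separately if they are required later.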
The project was a nightmare. To be fair, InfluxDB was not its main driver, but it definitely was in the top five. Modeling relational data in a time-series database is a bad idea. The continuous queries failed too often without even throwing error messages. The system could not handle data that was buffered somewhere along the way and arrived late.
Additionally, Flux as a query language is comparatively new and not as easy to work with as SQL. It quickly reached the point where we had to implement Python scripts to process data because Flux had reached its limits in use cases that would work seamlessly using SQL. So we felt like InfluxDB was putting unnecessary obstacles in our way.
We even wrote a blog article about why we chose TimescaleDB over InfluxDB for the field of Industrial IoT.
One of the main factors for us to use TimescaleDB as our database is the reliability and fault tolerance [the ability of a system to continue operating properly in case of failure] it offers to our stack. Since PostgreSQL has been in development for over 25 years, it is already very robust.
The stability of TimescaleDB allows us to focus on developing our microservices instead of running around fixing breaking API changes, which newer, less stable databases like InfluxDB tend to bring with them.
Being based on SQL was also a factor for us as SQL is the most well-known query language for relational databases—making working with it much easier. Almost any possible problem is already documented and solved somewhere on the Internet.
Now, TimescaleDB is used in our stack as our main database to store the data coming in via MQTT/Kafka. We are storing (among others) machine states, product states, orders, worker shifts, and sensor data. Some are relational; some are time-series.
If TimescaleDB didn’t exist, we probably would have to employ a PostgreSQL-based relational database system in addition to InfluxDB for time-series data. That would mean a lot of additional effort as we would have to manage two separate databases and the creation of datasets that span the two. This would also make the system more prone to errors as we would have to employ multiple querying languages.
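To make the "datasets that span the two" point concrete, here is a minimal sketch of a single SQL query that joins relational data (orders) with time-series data (sensor samples). It uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a server; the table and column names are hypothetical and are not the UMH data model. In TimescaleDB, the time-series table would typically be turned into a hypertable, while the join itself would stay ordinary SQL.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one relational and one time-series table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Relational data: one row per production order.
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            product  TEXT    NOT NULL,
            start_ts INTEGER NOT NULL,  -- Unix seconds
            end_ts   INTEGER NOT NULL
        );
        -- Time-series data: one row per sensor sample.
        CREATE TABLE process_values (
            ts    INTEGER NOT NULL,     -- Unix seconds
            asset TEXT    NOT NULL,
            name  TEXT    NOT NULL,
            value REAL    NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "bottle-A", 0, 100), (2, "bottle-B", 100, 200)],
    )
    conn.executemany(
        "INSERT INTO process_values VALUES (?, ?, ?, ?)",
        [
            (10, "filler", "temperature", 20.0),
            (50, "filler", "temperature", 22.0),
            (150, "filler", "temperature", 30.0),
        ],
    )
    return conn


def avg_temperature_per_order(conn: sqlite3.Connection) -> list:
    """Average temperature per order, computed with one join across both tables."""
    query = """
        SELECT o.order_id, o.product, AVG(p.value)
        FROM orders AS o
        JOIN process_values AS p
          ON p.ts >= o.start_ts AND p.ts < o.end_ts
        WHERE p.name = 'temperature'
        GROUP BY o.order_id, o.product
        ORDER BY o.order_id
    """
    return conn.execute(query).fetchall()
```

With two separate databases, the same result would require querying each system in its own language and merging the results in application code.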
As I mentioned, the United Manufacturing Hub is an open-source Helm chart for Kubernetes, which combines state-of-the-art IT/OT tools and technologies and brings them into the hands of the engineer.
This allows us to standardize the IT/OT infrastructure across customers and makes the entire infrastructure easy to integrate and maintain.
We typically deploy it on the edge and on-premise using k3s as a lightweight Kubernetes distribution. In the cloud, we use managed Kubernetes services like AKS. If the customer is scaling out and okay with using the cloud, we recommend services like Timescale.
We are using TimescaleDB with MQTT, Kafka, and Grafana. We have microservices to subscribe to the messages from the message brokers MQTT and Kafka and insert the data into TimescaleDB, as well as a microservice that reads out data and processes it before sending it to a Grafana plugin, which then allows for visualization.
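The core of such an ingestion microservice is the step that turns a broker message into database rows. The sketch below shows only that step, using the standard library. The topic layout (`<site>/<asset>/processValue`), the `timestamp_ms` payload key, and the `ProcessValueRow` type are assumptions for illustration and do not reproduce the actual UMH message format; the broker subscription and the database insert are left out.

```python
import json
from datetime import datetime, timezone
from typing import List, NamedTuple


class ProcessValueRow(NamedTuple):
    timestamp: datetime
    asset: str
    name: str
    value: float


def parse_message(topic: str, payload: bytes) -> List[ProcessValueRow]:
    """Convert one broker message into rows ready for a parameterized INSERT.

    Assumed (hypothetical) conventions:
      topic:   "<site>/<asset>/processValue"
      payload: JSON object with "timestamp_ms" plus one key per measured value
    """
    parts = topic.split("/")
    if len(parts) != 3 or parts[2] != "processValue":
        raise ValueError(f"unexpected topic: {topic!r}")
    asset = parts[1]

    data = json.loads(payload)
    timestamp = datetime.fromtimestamp(data.pop("timestamp_ms") / 1000, tz=timezone.utc)

    # Every remaining key is treated as one measured value.
    return [
        ProcessValueRow(timestamp, asset, name, float(value))
        for name, value in data.items()
    ]
```

For example, `parse_message("plant1/filler/processValue", b'{"timestamp_ms": 1000, "temperature": 21.5}')` yields a single row for the asset `filler`. Keeping the parsing separate from the broker client and the database driver makes it easy to test without either of them running.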
We are currently positioning the United Manufacturing Hub with TimescaleDB as an open-source Historian. To achieve this, we are developing a user interface on top of the UMH so that OT engineers can use it and IT can still maintain it. We recommend our blog article for a good comparison between Historians and open-source databases.
Furthermore, we are developing a Management Console on top of the Helm chart, which makes a lot of the typical operation tasks (monitoring, logging, changing the configuration, etc.) easily accessible for the OT engineer, reducing the workload of maintaining all the edge devices, servers, and so on for the IT person.
For manufacturing, we recommend the previously mentioned blog articles and the official TimescaleDB documentation. For data models and data ingestion from MQTT and Kafka into TimescaleDB, we can also recommend looking at the United Manufacturing Hub source code (or using it directly).
One last piece of advice: I can strongly recommend the book Designing Data-Intensive Applications by Martin Kleppmann. It really helped me understand the fundamental principles of designing large-scale architectures, so that you can join discussions on a technical level. It explains the fundamental choices behind databases (from log-based approaches and write-ahead logs to B-trees) and the problems and solutions for distributed systems.
We’d like to thank Jeremy Theocharis and the folks at United Manufacturing Hub for sharing their story on how they are using TimescaleDB to store their data, and why they chose us over other databases.
We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@Ana Tavares), and we’ll go from there.