Jul 03, 2024

What InfluxDB Got Wrong

There’s been a lot of talk about InfluxDB recently in the context of InfluxDB 3.0. While some of the commentary has focused on technology (if there is one thing that creates hype, it’s rewriting in Rust), a larger part has focused on issues with the trajectory of the company (InfluxData) over the years.

“We're migrating off of InfluxDB due to that rollercoaster, honestly. It's hard enough to find time to maintain the monitoring stack at work. Casually dropping "Oh, and now you get to rebuild the entire Grafana to change the query language" on that doesn't help. And apparently, version 3 does the same thing, except backwards.” (source)

“Same here. I joined my current company 3 years ago when Influx v2 was coming out. I was supposed to build some analytics on top of it. It was very painful. Flux compiler was often giving internal errors, docs were unclear, and it was hard to write any a bit more complicated code. The dash is subpar to Grafana, but Grafana just had, raw support. There was no query builder for Flux so I tried building dashboards in Influxv2 but the whole experience was excruciating. I still have an issue open where they have an internal function incorrectly written in their own Flux code, and I provided the fix and what was the issue, but it was never addressed. Often times I had a feeling that I found bugs in situations that were so basic that it felt like I was the only person on the planet writing Flux code.” (source)

“We are Influxdb enterprise customers and looking to do the same thing. They've kept their enterprise offering on 1.x, which has kept us mostly happy, but seeing what's going on in their OSS stuff is horrifying, and we're looking to avoid the crash and burn at the end of the tunnel.” (source)

We’re Timescale (the creators of TimescaleDB), and we also compete in the time-series market, so we are undeniably biased when we are talking about InfluxDB as a piece of technology (although we always try to make our benchmarks as balanced as possible). But as developers, it also saddens us to see popular projects lose momentum. InfluxDB achieved something remarkable: building any company around a popular open-source project is not easy.

The unfortunate thing is that, somewhere along the way, InfluxData squandered much of the developer goodwill it had worked so hard to earn. Developers who use InfluxDB for time-series, IoT, and observability workloads grew increasingly frustrated, and the company left opportunities open for other projects to capitalize on its mistakes.

As developers of our own database company, we have to learn from other companies that build on open source. So we asked ourselves: how did this happen? What did InfluxData get wrong?

What InfluxDB Got Wrong

Instead of maturing their database, InfluxDB did not one but two backend rewrites

In the world of databases, performance is essential, but it’s not all that matters. At the end of the day, the reality of running a database in production is that stability is key for developers. You want your database to “just work” and to be as easy as possible to build on. You need it to keep those qualities over a long horizon. If your database changes how things work, that translates to technical debt, which needs to be paid down before you can upgrade.

So far, InfluxDB has been completely built from scratch not once but thrice, from 1.x to 2.x to now 3.x. The latter two versions were not backward compatible. All of them were a “bold new approach” that promised to solve all the problems developers faced.

The recurring rewrites of InfluxDB have put its user base in a precarious position. Each new version not only demanded considerable time and effort to migrate but also challenged the trust developers had placed in the initial promises of InfluxDB as a product. It seemed as if the allure of creating something new and innovative overshadowed the fundamental task of maintaining and refining existing products with an actual user base.

We get it: focusing on keeping things stable while building foundational operational components is not as exciting as talking about full database rewrites. But the picture looks quite different to the developers building their applications on top of a database.

The design instability of InfluxDB has not only damaged its user base (and the company’s credibility) but also has natural consequences for the reliability of the database.

Since InfluxDB has been built from scratch (and more than once), it had to implement its full suite of fault-tolerance mechanisms (e.g., replication, high availability, backup/restore) and on-disk reliability (e.g., to ensure all its data structures are both durable and resist data corruption across failures).

This is a daunting task. Some of these capabilities are, in fact, either still lacking in InfluxDB or confined to the Enterprise version of the product. But even once they’re done building these, these capabilities have to be battle-tested.

Getting all the corner cases right when building a database is extremely hard: every database goes through a period when things get perfected from real-world experience. The big advantage of PostgreSQL is that it went through this period in the 1990s, while InfluxDB is still figuring things out today.

Just to be clear, we don’t think technological innovation is wrong. But you can’t reasonably expect your users to adopt two completely different solutions in a short period of time.

InfluxDB also changed its query API two times

The three major versions of Influx also came with different query languages.

Via InfluxQL, a SQL-ish query language, InfluxDB 1.x was betting on creating a “middle ground,” an environment familiar enough yet tailored for the specialized needs of time-series data. With InfluxDB 2.x, they pivoted to Flux, which was a massive paradigm shift. Users were now tasked with learning a new and proprietary query language and adapting their entire codebase to it if they wanted to migrate to InfluxDB 2.x. Now, InfluxDB 3.x is going back to InfluxQL, frustrating the same developers who believed their promises and made a huge effort to migrate.

And this is not a problem you can fix by throwing money at it—not even cloud customers are safe from the back-and-forth language changes. Adding insult to injury, Influx Cloud 1.x runs InfluxDB 2.x (Flux), and Influx Cloud 2.x runs InfluxDB 3.x (InfluxQL). The promise of leaving infrastructure hassle behind wanes in the face of a brand-new and challenging onboarding process.

To further confuse matters, they also now support the DataFusion SQL variant using FlightSQL as transport, allegedly also supporting connections via any PostgreSQL-compatible client—except we tested that, and it doesn’t work. Influx support replied: “At this point, you can query InfluxDB IOx using the FlightSQL plugin, and supporting the Postgres wire protocol has been stopped.”

Nobody has the time to learn a new query language, build new connectors, put together new dashboards, and rewrite application code every two years. Database maintenance is already hard enough. Your database vendor should be taking work off your hands, not adding to it. Every hour spent working on your database is an hour taken away from the developer’s core objective, which is building, running, and growing an application. To understand this is to respect the developer’s time and effort.

This is not only a theoretical concept but a foundational design principle that InfluxData seemed to miss.

InfluxData’s lack of focus confused their users (and hurt their market share)

InfluxData started with a great project (InfluxDB 1.x, a time-series database). Soon enough, their focus seemed to scatter, first by building the TICK stack (which was close to being an observability platform) and then with InfluxDB 2.x, with which they seemed to double down on prioritizing the platform over the database. Now, with InfluxDB IOx (a.k.a. InfluxDB 3.0), this has flipped once again: they’re doubling down on the core database and abandoning the platform they had built.

This ambiguous market positioning opened up the door for other, more focused solutions to emerge, such as Prometheus and Grafana, which ended up dominating the metrics and monitoring space.

It also didn’t help that the TICK stack, as it was originally conceived, was more or less abandoned in the InfluxDB 2.0 rewrite process. Developers who had invested significant time and resources into integrating these tools into their systems were now told to migrate to something new for reasons that were neither completely clear nor justified. It felt as if Influx kept busy looking for product-market fit in other places while forgetting about the users they already had.

Perhaps if InfluxData had decisively committed itself to the metrics and monitoring use case when it had the market advantage, it could have focused its core engineering resources to carve out a definitive niche in the sector. And if Influx had decided to focus on improving and maturing a database that was already quite great, TimescaleDB would probably not exist today (more below).

Now, it’s time for us to be fair. We have also been guilty of a lack of focus at times. In 2020, we built Promscale, our own observability tool built on TimescaleDB, only to deprecate it in 2023. After re-evaluating our company priorities, we realized our mistake: we were not an observability company; there were great open-source solutions thriving in that space already; and most importantly, by dedicating core engineering efforts to Promscale, we were moving away from what we knew how to do best, which was helping developers build better applications.

There was one big difference here though: Promscale never made it to version 1.0 or out of beta.

InfluxDB users got penalized with too many product options

There’s something to be said about the value of simple choices. Databases are very complex pieces of software, and keeping things easy is not always possible, but as database companies, there’s a lot we can do to simplify decision-making for our users.

Simplicity is not a value InfluxData seems to share, starting with their product portfolio. As the company oscillated between roles and projects, never quite committing to a definitive path, it offloaded the cognitive load onto the developer, who now has to navigate between InfluxDB OpenSource (which exists in 1.x, 2.x, and 3.x versions), InfluxDB Cloud (which exists at 1.0 running Influx 1.0, and 2.0 running Influx 3.0), InfluxDB Cloud Serverless, InfluxDB Cloud Dedicated, InfluxDB Clustered, InfluxDB IOx, InfluxDB Enterprise, InfluxDB Edge, etc. We might be missing some.

This is without even beginning to dive into the rabbit hole of trying to work out which versions (or features) are commercial and which are open-source.

InfluxDB is not PostgreSQL

You’re probably thinking, “Of course Timescale would say that,” but when you think about it, the particular challenges of time-series workloads gave birth to InfluxDB as a specialized technology based on the premise that time-series data was too much to handle for relational databases. The thing is that developers don’t want to use niche databases: they want to use PostgreSQL. Why specialize when you can generalize?

Databases are not just tools but entire ecosystems with distinct query languages, interfaces, and operational protocols. Filling your stack with niche databases means that you’ll end up spending a lot of time battling with new technologies that are still somewhat untested. Your data will be siloed and locked in different places. Joins won’t be possible, new use cases will depend on what each database stack happens to support, you’ll have trouble with technology not integrating with your database, and your operational overhead will multiply.

Building on PostgreSQL simplifies a developer's life. SQL’s universal acceptance ensures immediate productivity. You can join your time series data with any other table. You can run complex analytical queries. You can tap into a robust and reliable ecosystem with easy integrations and a wealth of ready-to-use tools and extensions. This environment is enriched by a global community, guaranteeing continuous support. Postgres isn't just another choice; it's a smart strategic move.

Some Things That InfluxDB Got Right

Despite the lengthy discussions on what InfluxData (the company) got wrong, there is one thing they got right, and that’s the original InfluxDB product. It’s only fair that we take some time to point out the area where it excels (even if you might have to traverse different versions, or query languages, to get there).

InfluxDB is good for IoT workloads where you’re ingesting millions of time series per second, with multiple labels and values per item (a wide-table schema). It fits when you want to read that data back or analyze each time series individually, and that’s about where your workload ends.

Apart from the database, Telegraf deserves a distinctive mention—it's a fantastic tool that survived the Influx 2.x odyssey. Kudos to the InfluxData team for keeping it alive.

We also love the “time to awesome” concept. At least in the world before 3.x (and especially with 1.x), it was quite easy to get InfluxDB up and running. It's hard to build a data product that allows developers to realize value quickly, and Influx managed to do that. This is something that we take as inspiration at Timescale and that we're working to improve.

Final Reflection

Before we wrap up, we happen to have a personal story with InfluxDB that goes back years. Timescale first started as an IoT company, storing data from over 100,000 devices. We needed a place to store all this sensor data and picked an off-the-shelf time-series database to do so alongside our main PostgreSQL database.

But our experience was not great. We hit performance issues, we hit stability issues, we hit query language issues. Early on, we wanted to display metadata for all the online devices. But our data was now siloed: the device sensor data lived in the time-series database, and the device metadata lived in PostgreSQL. So even that simple request: “Show me all devices that are online right now,” which really should have just been a simple JOIN between two tables, required glue code and an engineering sprint to get done.
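To make the “simple JOIN” concrete, here is a minimal sketch of what that query could look like when sensor readings and device metadata live in the same relational database. Everything in it is an assumption for illustration: the `devices` and `readings` tables, their columns, the five-minute definition of “online,” and the use of Python's built-in `sqlite3` module as a stand-in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# Hypothetical schema. SQLite in-memory stands in for PostgreSQL here so the
# sketch runs anywhere; the JOIN itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (
        device_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        location  TEXT NOT NULL
    );
    CREATE TABLE readings (
        device_id   INTEGER NOT NULL REFERENCES devices(device_id),
        ts          INTEGER NOT NULL,  -- Unix seconds
        temperature REAL
    );
""")

NOW = 1_700_000_000  # fixed "current time" so the example is reproducible

conn.executemany("INSERT INTO devices VALUES (?, ?, ?)", [
    (1, "sensor-a", "warehouse"),
    (2, "sensor-b", "warehouse"),
    (3, "sensor-c", "office"),
])
conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
    (1, NOW - 30, 21.5),    # reported 30 seconds ago
    (1, NOW - 90, 21.4),
    (2, NOW - 3600, 19.0),  # silent for an hour
    (3, NOW - 10, 23.1),
])


def online_devices(conn, now, window_seconds=300):
    """Return (name, location) for devices with a reading inside the window."""
    rows = conn.execute(
        """
        SELECT DISTINCT d.name, d.location
        FROM devices AS d
        JOIN readings AS r ON r.device_id = d.device_id
        WHERE r.ts >= ?
        ORDER BY d.name
        """,
        (now - window_seconds,),
    )
    return rows.fetchall()


print(online_devices(conn, NOW))
# [('sensor-a', 'warehouse'), ('sensor-c', 'office')]
```

With the data split across two systems, the same answer requires fetching recent device IDs from one store, looking up metadata in the other, and merging the results in application code, which is the glue code described above.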

When we met with the time-series database company to talk about our pain points and see the future roadmap for the database, we learned that their plan was actually to build a complex stack that included data processing and visualization. They were essentially putting aside the well-known, well-documented problems their database had. We didn’t need any of that other stuff. We just wanted a better database for time-series data.

In case it’s not obvious, that time-series database was InfluxDB. That conversation happened in May 2015. And if InfluxData had made better product and business decisions, Timescale would not exist today.

Throughout the years, we have made plenty of our own mistakes. The point of this post is not to point fingers or pretend that database companies are not allowed to make mistakes, pivot, or explore new ideas. It’s about the importance of listening to your developers and learning from past experiences. There’s a lot to be learned from the InfluxDB story, lessons that we’re reflecting on at Timescale. We’ve made some mistakes, but we continue to do better.

Wrap-Up

As these database discussions come and go, PostgreSQL's popularity continues to grow. We remain convinced that PostgreSQL is the bedrock on which modern applications can and will be built. Its proven reliability, versatility, rich ecosystem, and the power of SQL as a query language are a combination that is very hard for any emerging specialized database to beat. And Timescale is the missing ingredient that makes PostgreSQL ready for your time-series data workloads.

Originally posted

Oct 05, 2023

Last updated

Jul 03, 2024
