Written by Chris Engelbert and Carlota Soto
When setting up your PostgreSQL database, even before you start ingesting data, you’re confronted with a few important design decisions regarding schema design. Settling on a database schema is akin to an artist choosing their canvas before starting to paint: the shape of your canvas, or in this case, your table layout, will significantly influence how your database shapes up, affecting your performance, usability, and maintainability.
There are two major table design options for your PostgreSQL database: narrow and wide. In this blog post, we look at both and explain why and how you would use them, while also introducing a third option (the medium layout) as a middle ground between the traditional wide and narrow designs.
Let’s start by describing the differences between narrow and wide tables in a database schema and then discuss the pros and cons of each option.
In a schema design, narrow tables are characterized by having few data value columns, meaning that, for each row in the table, there aren't many different types of information being stored. For instance, a narrow table might simply store a timestamp and a singular measurement, like temperature.
However, it's essential to understand that having "few data value columns" does not equate to "few columns in total." A narrow table can still have multiple columns that serve other purposes. For example, columns could be present to categorize or label each row, associating it with the specific device the data came from, the metric being measured, or a particular time series. So, while the core data in a narrow table might be simple, the table can still be structured to offer context or classification for each data point.
Here is an example of a narrow table:
CREATE TABLE narrow_table (
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
point_id UUID NOT NULL DEFAULT gen_random_uuid(),
device_id UUID NOT NULL,
metric TEXT NOT NULL,
value DOUBLE PRECISION NOT NULL
);
-- Example insert statement for the given data:
INSERT INTO narrow_table (created, point_id, device_id, metric, value)
VALUES ('2022-01-01 00:00:00.0+00', '00000000-0000-0000-0000-000000000123', '00000000-0000-0000-0000-000000000010', 'temp', 24.7);
As you can see, each row has a single column holding the value, in this case a decimal value, plus one column for the metric name (in this example, “temp” for temperature). Any metric that can be stored as a decimal value fits this table.
If you want to store values of other data types, you could create a separate table for text strings or whatever other type you need. Alternatively, you often see JSONB as the data type for the value column. That way, you can store any valid JSON value in the column.
CREATE TABLE narrow_table (
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
point_id UUID NOT NULL DEFAULT gen_random_uuid(),
device_id UUID NOT NULL,
metric TEXT NOT NULL,
value JSONB NOT NULL
);
The wide table layout is the exact opposite of the narrow layout, meaning that you’ll end up with a lot of columns. Most often, it’s one column per available metric. Depending on the number of metrics you have, wide tables can get—as the name indicates—very wide, very quickly. It is common to see 200 or more columns in a single table.
A simple example of a wide table could be something like this:
CREATE TABLE measurement_table (
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
point_id UUID NOT NULL DEFAULT gen_random_uuid(),
device_id UUID NOT NULL,
temp DOUBLE PRECISION,
hum DOUBLE PRECISION,
co2 INTEGER,
wind_speed INTEGER,
wind_direction INTEGER
);
-- Example insert statement for the given data:
INSERT INTO measurement_table (created, point_id, device_id, temp, hum, co2, wind_speed, wind_direction)
VALUES ('2022-01-01 00:00:00.0+00', '00000000-0000-0000-0000-000000000123', '00000000-0000-0000-0000-000000000010', 24.7, 57.1, 271, NULL, NULL);
Depending on your use case, the value columns may or may not be nullable. That mostly depends on how the data is delivered: whether all values arrive together and can be stored in one row, or whether values become available at different times so that only some columns are populated in each row. In the above example, the values for temperature, humidity, and CO2 arrive together, while wind speed and wind direction would come in a separate data packet, hence the NULL values.
When designing your database schema, the narrow table layout is a great fit if you have a low cardinality of metrics and know them, along with all their data types, upfront. If you don't, you can remove the metric name column and instead create one table per metric, encoding the metric in the table name like this:
Table: temperature

| point_id | device_id | value |
|----------|-----------|-------|
| 123      | 10        | 24.7  |

Table: humidity

| point_id | device_id | value |
|----------|-----------|-------|
| 123      | 10        | 57.1  |
This example includes independent tables for the temperature and humidity metrics. That way, you can think about the data type of every metric independently, and the table design becomes much easier.
Extending the layout with additional metrics is easy: create an additional hypertable matching the metric’s data type, and you’re done.
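As a sketch, the per-metric tables above could be defined like this (the `create_hypertable` calls assume the timescaledb extension is installed; plain tables work the same way without them):

```sql
-- One table per metric; the data type can be chosen per metric.
CREATE TABLE temperature (
    created   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    point_id  UUID NOT NULL DEFAULT gen_random_uuid(),
    device_id UUID NOT NULL,
    value     DOUBLE PRECISION NOT NULL
);
SELECT create_hypertable('temperature', 'created');

CREATE TABLE humidity (
    created   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
    point_id  UUID NOT NULL DEFAULT gen_random_uuid(),
    device_id UUID NOT NULL,
    value     DOUBLE PRECISION NOT NULL
);
SELECT create_hypertable('humidity', 'created');
```

Adding a third metric later means repeating the same pattern with whatever data type that metric needs.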
The previous advantage, however, can also turn into a disadvantage. If you have a high number of different metrics, the number of individual tables could become a hassle to manage. While PostgreSQL doesn’t have an issue with many tables, humans often do.
Additional complexity may come in if you want to split those tables further, for example, having a temperature table per customer, also known as multi-tenancy. You could handle this by mangling the customer into the table name (<<customer_id>>_temperature) or by using a lookup table that maps a customer and metric to a generated table name. Either way, it can get tedious.
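The lookup-table approach could be sketched like this (the table and column names are illustrative):

```sql
-- Maps a customer and metric to the generated per-tenant table name.
CREATE TABLE metric_tables (
    customer_id UUID NOT NULL,
    metric      TEXT NOT NULL,
    table_name  TEXT NOT NULL,
    PRIMARY KEY (customer_id, metric)
);

-- The application resolves the table name first, then builds the
-- actual query dynamically:
-- SELECT table_name FROM metric_tables
-- WHERE customer_id = $1 AND metric = 'temp';
```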
Another way to mitigate complexity is using JSONB as the value column’s data type. This allows you to unify all metrics per customer into a single table, adding back the metric column. Remember, you still need to understand what data type the value will be when queried since JSONB might force you to cast the resulting value into the data type your application expects (for example, JSON doesn’t use real integers, only numbers according to IEEE 754).
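For example, reading a scalar stored in a JSONB value column could look like this sketch (it assumes the narrow table variant with a JSONB value column shown earlier; `#>> '{}'` extracts the top-level JSON value as text, which is then cast):

```sql
SELECT created,
       device_id,
       (value #>> '{}')::DOUBLE PRECISION AS temp
FROM narrow_table
WHERE metric = 'temp';
```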
Last but not least, querying many different metrics at the same time can mean you need to use a lot of JOIN or UNIONs, especially when every metric has its own table. While it doesn’t necessarily slow down the query, it can be tedious to write or generate the queries. Plus, every JOIN operation increases the input/output (I/O) load on the system when preparing the result. All this becomes more likely when the cardinality of metrics keeps increasing.
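As a sketch, combining the temperature and humidity tables from above could look like this (the JOIN assumes both metrics share the same timestamps):

```sql
-- Wide-style result via JOIN:
SELECT t.created, t.device_id, t.value AS temp, h.value AS hum
FROM temperature t
JOIN humidity h
  ON h.device_id = t.device_id AND h.created = t.created;

-- Narrow-style result via UNION ALL:
SELECT created, device_id, 'temp' AS metric, value FROM temperature
UNION ALL
SELECT created, device_id, 'hum' AS metric, value FROM humidity;
```

With dozens of metrics, either variant grows by one JOIN or one UNION ALL branch per metric, which is where the tedium comes from.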
The wide table is a great choice for your schema design if you know all potential metrics upfront. This is often the case with systems that have been in production for a long time and data that will be migrated over. If new columns need to be added only at a low rate, it may be a viable option, but adding a column may be a heavy operation.
Besides that, supporting multi-tenancy is simple: add a new wide table per customer. (Still, remember that when adding a new column, you may have to add it to all of the customers' tables, which can be a tedious process.)
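If the customer tables follow a naming convention (the `measurements_` prefix here is a hypothetical example), rolling a new column out to all of them can be scripted with a DO block:

```sql
DO $$
DECLARE
    t TEXT;
BEGIN
    -- Loop over all customer tables matching the naming convention
    FOR t IN SELECT tablename FROM pg_tables
             WHERE schemaname = 'public'
               AND tablename LIKE 'measurements_%'
    LOOP
        EXECUTE format(
            'ALTER TABLE %I ADD COLUMN IF NOT EXISTS pressure DOUBLE PRECISION',
            t
        );
    END LOOP;
END $$;
```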
Querying data is simple: there is no need to cast values since they're already stored in their native data types, and there is no need to look up which column to query since there is one column per metric. Also, querying a subset of the columns is very efficient, especially when chunks are compressed into columnar representation.
Querying data from multiple customers is quick. Just JOIN and UNION the different intermediate queries together, and you’re done.
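A sketch with two hypothetical per-customer tables, measurements_acme and measurements_globex:

```sql
SELECT 'acme' AS customer, created, temp, hum
FROM measurements_acme
UNION ALL
SELECT 'globex' AS customer, created, temp, hum
FROM measurements_globex
ORDER BY created;
```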
While the wide table layout is a common choice for beginners, it is probably the most complicated to manage and extend over time.
The biggest issue with the wide table is adding more metric columns. Keeping track of things in systems that are constantly growing with new metrics can be challenging.
Also, while dropping a column because you don't need the metric anymore is an immediate operation (the column is just hidden from the query engine), it doesn't immediately remove the content from disk. A row is only rewritten when it is updated, which will most likely never happen with time-series data, leaving the dangling data around forever. Adding nullable columns is cheap, as is adding columns with DEFAULT values: since PostgreSQL 11, existing rows aren't rewritten, and the default is filled in at query time.
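In DDL terms, sketched against the measurement_table from above (the new column names are illustrative):

```sql
-- Immediate, but the data stays on disk until rows are rewritten:
ALTER TABLE measurement_table DROP COLUMN co2;

-- Metadata-only since PostgreSQL 11, even with a static DEFAULT:
ALTER TABLE measurement_table ADD COLUMN noise_level DOUBLE PRECISION;
ALTER TABLE measurement_table ADD COLUMN firmware TEXT DEFAULT 'v1';
```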
Advantages of a narrow table layout:

- Works well for a low cardinality of known metrics and data types.
- Allows individual data type consideration for each metric.
- Simplifies table design with independent tables for each metric.
- Easy to extend with new metrics by adding a new table.

Disadvantages of a narrow table layout:

- Managing a high number of tables for different metrics can be challenging.
- Multi-tenancy (e.g., a table per customer) adds complexity.
- Using JSONB requires casting values, impacting data type clarity.
- Querying multiple metrics involves multiple JOINs or UNIONs, complicating query writing.
- Implies increased I/O load due to JOIN operations, especially with growing metric cardinality.
Advantages of a wide table layout:

- Works well when all potential metrics are known in advance.
- Simplifies multi-tenancy with a new table per customer.
- Native data types eliminate the need for casting during queries.
- Efficient querying for subsets of columns, especially with columnar compression.
- Faster queries across multiple customers using JOIN and UNION.

Disadvantages of a wide table layout:

- Challenging to manage and extend over time, especially for beginners.
- Adding new metric columns can be hard in dynamic systems.
- Dropping a column doesn't immediately free up disk space, leaving unused data.
- Evaluates non-updated rows with DEFAULT values at query time, affecting performance.
There’s a middle-ground alternative between the narrow and wide models when designing your database schema: the medium table layout.
The basic design principle is to create one column for each necessary data type:
CREATE TABLE medium_table (
created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
point_id UUID NOT NULL DEFAULT gen_random_uuid(),
device_id UUID NOT NULL,
metric TEXT NOT NULL,
int_value INTEGER,
float_value DOUBLE PRECISION
);
-- Example insert statement for the given data:
INSERT INTO medium_table (point_id, device_id, metric, int_value, float_value)
VALUES ('00000000-0000-0000-0000-000000000123', '00000000-0000-0000-0000-000000000010', 'temp', NULL, 24.7);
To elaborate on the difference between this layout and the previous ones:
Narrow tables typically have one or very few columns representing data values (e.g., metric and value).
Wide tables have multiple columns, with each column representing a distinct metric or data point. If we had separate columns for temperature, humidity, pressure, etc., in the previous table, that would be a wide table layout.
The medium table above sits in between the narrow and wide layouts. It has columns representing different data types or categories of metrics but retains some level of generic structure. In the provided table, there's a separation between int_value and float_value, but the actual metric being stored (e.g., temperature, humidity) is still identified by the metric column. This offers more flexibility than a narrow table but doesn't go all the way to distinct columns for each metric like a wide table.
If you need more data types, you can just add more columns. This is a very convenient setup if you know all the expected data types, which is fairly easy compared to “knowing all possible metrics in advance.” However, you must know which metric type would be found in which specific data type column.
The medium table layout has one row with one column per needed data type. That means you only need one column defining what kind of data is stored in that specific row. This may need an additional lookup table, not only defining the metrics but also which data type column needs to be queried for each. For a low, static number of metrics, you could handle this in the application layer.
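Such a lookup table could be as simple as this sketch (the names are illustrative):

```sql
-- Maps each metric to the medium_table column that holds its value.
CREATE TABLE metric_definitions (
    metric       TEXT PRIMARY KEY,
    value_column TEXT NOT NULL
);

INSERT INTO metric_definitions (metric, value_column)
VALUES ('temp', 'float_value'),
       ('hum',  'float_value'),
       ('co2',  'int_value');
```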
This layout provides most of the benefits of the narrow design and is easy to extend with additional metrics, as long as you don't need to introduce a new data type. Even then, you only need to add a new column to the table layout; how heavy that operation is depends on the number of rows already in your hypertable.
The medium layout is a very practical design choice if you are sure about the different data types that can occur in the future but are not yet certain of all potential metrics. This is a common situation in IoT use cases, where you add new types of metrics frequently. While the number of metrics increases, and you have to add them to your lookup mechanism (metric to data type column), the table's design doesn’t change, and you can keep adding new metrics without touching the layout.
Multi-tenancy is easily achieved by creating one hypertable per customer and storing all corresponding data in that particular table. If you need to query data for multiple customers, you can JOIN and UNION ALL the data together and get a single result set.
It is also straightforward to query multiple metrics simultaneously since all data resides in the same table. If you retrieve data of multiple different data types, though, you may end up using CASE WHEN clauses, with result columns filled with NULLs wherever a row has no value for a specific metric. Merging those rows would need to be handled in the application layer.
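A sketch of such a query against the medium_table from above, pivoting two metrics of different data types into one result set:

```sql
SELECT created,
       device_id,
       CASE WHEN metric = 'temp' THEN float_value END AS temp,
       CASE WHEN metric = 'co2'  THEN int_value   END AS co2
FROM medium_table
WHERE metric IN ('temp', 'co2')
ORDER BY created;
-- Each row fills only the column matching its metric; the other
-- result columns come back as NULL.
```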
There are quite a few bits to consider when making a decision. While it is possible to migrate from one layout to another, the process can be quite tedious, especially with a growing number of records being stored. The following table gives you a quick overview of the key pros and cons of every option:
|                         | Narrow Table Layout  | Medium Table Layout | Wide Table Layout    |
|-------------------------|----------------------|---------------------|----------------------|
| Ease of Use             | Easy                 | Easy                | Somewhat easy        |
| Cost of Up-Front Design | Low                  | Somewhat low        | Somewhat high        |
| Extensibility           | Easy                 | Somewhat easy       | Somewhat complicated |
| Multi-Tenancy           | Somewhat complicated | Easy                | Easy                 |
Let’s break them down.
Consider using narrow tables in your database schema if the following applies:

- You have a low cardinality of metrics. Narrow tables thrive when you know your metrics and their data types upfront. It's simpler to manage and provides a clear layout.
- Your metrics are evolving. If you're unsure about the range of metrics you might handle, you might enjoy the flexibility of creating separate tables for each metric type.
- You value extensibility. As your system grows and requires new metrics, you can just introduce a new table tailored to that metric's data type.

However, be cautious if the following applies:

- You anticipate a surge in metrics. Managing many tables can be a hassle.
- You require complex multi-tenancy. Creating a unique table for each metric-customer combination can become tedious.
- You'll query across multiple metrics frequently. Crafting queries that touch on many metrics may necessitate multiple JOINs or UNIONs, potentially complicating your SQL and putting more strain on your system.

Consider using wide tables if the following applies:

- Your metrics are established. If you have a clear understanding of all the metrics you'll ever need, wide tables offer a structured and predictable layout.
- You need simplified multi-tenancy. With one table per customer, handling data across multiple tenants becomes straightforward.
- Your queries are column-specific. Since each metric has its own column, queries are direct and efficient.

As before, be cautious if the following applies:

- You anticipate new metrics often. Constantly adding columns can be challenging, especially in growing systems.
- Your system will undergo frequent changes. Regularly adding columns to all customer tables can become a maintenance headache.

Lastly, consider using medium tables if:

- You want a balanced approach. If you're looking for something between the granularity of narrow tables and the structure of wide ones, medium tables offer a good compromise.
- You're certain about data types. If you have a clear idea about the data types but are unsure of the metrics, medium tables provide flexibility.
- You prioritize straightforward multi-tenancy. One hypertable per customer keeps tenants isolated without extra lookup machinery.
Let’s have a quick look at how Timescale’s compression plays with the different table layouts. Compression is one of the essential features of Timescale, allowing you to reduce your database size by up to 10x.
P.S. If you don’t know what Timescale is, check out this article. It makes PostgreSQL faster by adding automatic partitioning to your large tables, query planning improvements, and many other things.
Generally speaking, all layouts are fully supported when using Timescale compression.
The narrow table layout is straightforward to manage: since no new columns are expected at a later point in time (given you decided early on whether to use the metric column or the table-per-metric setup), no special considerations apply to compression. You can also expect a really good compression ratio since the records are very uniform and, therefore, compress extremely well.
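Enabling compression on the narrow table could look like this sketch (it assumes narrow_table has already been converted into a hypertable):

```sql
-- Segmenting by device_id and metric groups similar rows together,
-- which typically improves the compression ratio.
ALTER TABLE narrow_table SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id, metric'
);

-- Automatically compress chunks older than seven days:
SELECT add_compression_policy('narrow_table', INTERVAL '7 days');
```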
Thanks to the DML capabilities of Timescale compression, medium and wide table layouts also play nicely with compression. If you have to add new columns, you can add them directly.
And as a final note, columns with NULL compress really well: an often-seen worry for many people is that the number of NULL values in medium or wide layouts creates a lot of unnecessary bloat in the hypertable. But due to the way compressed chunks are designed, NULL values compress to almost nothing.
Your use case characteristics and your own priorities will determine which table layout you ultimately choose for your schema design. There is no “one-size-fits-all” layout. Look at your specific use case and decide on the pros and cons of the different layout aspects.
Maybe you have different use cases with different benefits using different layouts. You may, for example, want to store IoT metrics from sensors differently from your infrastructure’s observability data.
In the wild, the medium table layout is probably the least common, but it is often an excellent trade-off between an easy-to-grasp design, future extensibility, and a quick start. Some lookup mechanism is necessary to understand which metric is available in which data type column.
If you know the exact metrics you want to store, the wide table layout may be a great choice. It provides a quick setup, great query speeds (especially with compression enabled), and familiar queries, basically what your typical relational database query looks like. Finally, no lookup tables are necessary.
If you want the most straightforward design choice and the highest isolation of different types of metrics, the narrow table layout is yours. Given that there are no changes in the table’s layout to be expected, it is the easiest to design since you probably know exactly what the data type for a specific metric (or set of metrics sharing the same data type) looks like. With a high cardinality of metrics, a lookup mechanism may be necessary to understand what table to query (especially with name mangling).
Whatever you do, just remember one thing: it is possible to switch from one layout to another, but migrating isn’t simple. If you’re unsure about the future, consider one of the simpler extensible layout options.