A PostgreSQL Developer’s Perspective: Five Interesting Patches From January’s Commitfest
January’s Commitfest has completed with 70 patches committed, 183 deferred to the next commitfest, 14 withdrawn, and 12 returned with feedback.
As always, there are a large number of interesting patches, and I have selected five patches to discuss today, one of which was reviewed last time. Three of these patches have been committed, one was withdrawn, and the last one was deferred to the next commitfest.
In most cases, I avoid selecting patches where I am a reviewer, but I have decided to make an exception here and discuss the 64-bit XID patch. In the interest of full transparency, I am a reviewer of the patch, and my discussion here is not intended to replace or extend my participation in that review process.
Committed Improvements in Performance and Ease of Use
Improve tab completion for ALTER FUNCTION
/ PROCEDURE
/ ROUTINE
This is perhaps a trivial improvement, but it addresses something I find annoying when working on prototyping systems. It also affects some production issues if people need to alter functions, etc., for production impacts.
This patch affects psql specifically and applies only to the standard command-line, scripting, and administration tool that ships with PostgreSQL. The psql client is one of the most scriptable database clients I have ever worked with. It can generate output in several formats (including HTML and LaTeX) and has several features specifically for use in scripting tools. Therefore, it’s my go-to tool for any administrative task for working with the database.
Why this matters
This is a small improvement, which eliminates a small annoyance. I am used to using \df
for autocomplete of functions and procedures. This is likely to lead to a faster resolution in some cases.
This patch may seem minor, but it represents an example of how PostgreSQL just keeps getting easier to use via the existing command line tools. PostgreSQL keeps getting better and better.
postgres_fdw
—use TABLESAMPLE
when analyzing foreign tables
PostgreSQL can access data managed or stored in other ways using “foreign data wrappers.” Foreign data wrappers are pieces of software that plug into PostgreSQL and allow PostgreSQL to access data not stored in PostgreSQL-managed relations.
PostgreSQL still needs statistical information about the table's contents to properly plan queries so that they can run as quickly as possible. Current versions of PostgreSQL gather such statistics by sampling rows on the client side, which means that PostgreSQL will query the remote server for all data, then use a subset of it to populate the statistics, discarding the rest. This makes the PostgreSQL foreign data wrapper less than ideal for large foreign tables.
A patch has now been committed that uses PostgreSQL’s TABLESAMPLE
option to request a statistical sample of the data. This improves network traffic and processing time. Unfortunately, as I understand the patch, it does not reduce disk I/O on the storing server, but this is still a massive improvement.
Why this matters
Distributed data environments, in my experience, become more common as data volumes grow. This means that data volume issues are fairly common in these environments. This patch improves performance and load in a couple of ways.
The first obvious issue is that network traffic is reduced. However, filtering is often less expensive close to the storage (as is downsampling). This is why proper use of aggregation often improves database load, even on highly loaded systems. While we don’t have real-world data yet on the impact we will see on real-world workloads, I expect this to be a very important step forward.
+infinity for dates and timestamps
This is another small improvement that highlights perpetual improvements in PostgreSQL. A small patch has been proposed and accepted, allowing ‘+infinity’ as a synonym for ‘infinity.’ This works for numeric, date, and timestamp values.
I originally selected this patch because my knowledge of the date type was, pardon the pun, out of date. I had missed that, in version 8.4, dates could also be infinite (prior, I had to use timestamps to get infinite values and had missed the change). Although this patch ended up being minor, it turned me to older improvements and a story of continuous improvement of PostgreSQL.
All the way back, at least to PostgreSQL 6.3, PostgreSQL has supported special values for timestamps for infinite values. The idea is that you can have an infinity timestamp that is later than all other timestamps and a -infinity timestamp earlier than any other timestamp.
These can often be used as sentinel values, for example, indicating an open-ended validity. While this is available for timestamps, only much later (in version 8.4) was similar functionality for the date type. I understand this was done during a series of general improvements and reworkings of the date type itself.
Finally, in the next version, we will be able to support more consistent notation for infinite values, allowing easier testing and software development.
Why this matters
PostgreSQL is improving even in corners that may seem exotic from the outside. This small change allows easier software development in cases where infinite values are used. It will be noticed.
Withdrawn Patch of Interest
New strategies for freezing, advancing relfrozenxid
early
One issue that users of large databases face is the dreaded transaction ID wraparound. This problem occurs when VACUUM
falls behind and cannot advance the counterpoints, which are known to be always in the past. That’s a simplification, but it generally describes the problem.
A patch set was proposed which was intended to address this problem by providing more efficient management of transaction IDs. Unfortunately, on review, a number of potential problems were discussed, and the patch was withdrawn.
This work direction is essential, particularly in combination with the 64-bit XID patch discussed below. I look forward to seeing more innovative approaches to this problem–possibly including storage approaches that might not face the problem at all.
It is the mark of an exceptionally professional and talented community that we can put patches through blistering reviews and that people can and do withdraw patches on their own accord when problems are pointed out. This sort of behavior is part of what makes PostgreSQL rock-solid. And withdrawal doesn’t mean another approach or perhaps further iteration won’t be submitted at some point.
Patch Moved to Next Commitfest
Add 64-bit XIDs into PostgreSQL 16
This is a second, much older patch set that helps to address a different aspect of transaction ID wraparounds. There is some disagreement in the community about why this patch is relevant but a general (though not complete) consensus that this is probably, at least in its outlines, a direction we will need to move as a database engine.
This patch set originated from Postgres Professional several years ago and is one that that company has perpetually tried to push into PostgreSQL. As a result, it represents a multi-year effort that, while it may have been ahead of its time, is rapidly becoming more relevant and urgently needed. I am proud to report that Timescale is now collaborating on this patch and trying to help get it through the review process.
My own view of the work has been somewhat skeptical since transaction ID wraparound problems right now mainly occur due to other things—such as autovacuum—not working as expected, and these malfunctions can have different impacts, too (I have faced transaction wraparound problems myself).
Whatever disagreements I have with the authors of the patch, it is important to note that 64-bit XIDs are likely to become increasingly crucial in the coming years. With every release, PostgreSQL becomes more capable, and hardware becomes more powerful.
One current source of bottlenecks for high-throughput workloads is concurrent transaction handling. Eliminating this bottleneck allows more data to be loaded faster, which in turn makes the hard limitations of 32-bit transaction IDs far more painful when they start to run out.
As a result, transaction ID wraparound problems develop more quickly and can become rapidly more severe, and experimental storage engines that eliminate the need for vacuuming are likely to add even more urgency as some bottlenecks may be eliminated, and transaction IDs might be used up at ever faster rates. While a few people insist that the only proper answer is to fix vacuuming, the fact is that an eventual shift to 64-bit transaction IDs would allow for workloads that aren’t currently possible.
I believe that the work of the community here needs to be to get this patch into an optimal shape (or at least into a shape where, despite the extra storage used to store transaction IDs, it is an improvement in every current workload) and committed. I hope this happens soon.
Concluding Thoughts: A Professional Community Improving a Rock-Solid Database
A major theme in this patch selection is an attention to detail in potential and actual improvements and a willingness to look past ego or pride in order to do what’s best for the community of this truly worldwide project.
Developers may spot small improvements that lead to a much better user experience and submit these changes. Others may try to fix long-standing problems only to eventually realize that the proposed solutions cause more problems and gracefully withdraw their suggestions. Even slow-moving patches can be carried forward by many people working together. This is why PostgreSQL continues to improve so much each time a major version is released.
As PostgreSQL evolves, you can add even more functionality to this beloved database. Explore TimescaleDB—it extends PostgreSQL with features like automatic time-based partitioning and indexing, continuous aggregations, columnar compression, and time-series functionality. And if you’re using a managed service for PostgreSQL, try Timescale—it’s free for 30 days, no credit card required.