Nov 28, 2024
Posted by
Chris Travers
November’s commitfest is over with 94 committed patches, 24 patches returned with feedback, 172 patches moved to the next commitfest, 4 rejected, and 13 withdrawn.
From a PostgreSQL developer’s perspective and beyond, there was a wide variety of improvements proposed, and while most of the committed patches appear less significant than those of the previous commitfest, a lot of work was actually done. Not every commitfest sees blockbuster patches committed, but there is often considerable progress in the review stage, which can be seen in future releases.
In this blog post, I have selected six patches, two committed, and four moved to the next commitfest. I did not select patches I am a reviewer for and focused on other patches I think are important for a wide range of users.
Both committed patches I selected are important security improvements that make certain scenarios far more manageable than before.
pg_hba.conf
The first patch allows regular expressions in the username field for the pg_hba.conf
. While this feature likely has many caveats, it solves some crucial problems in specific environments.
Normally, the preferred approach to group management with the pg_hba.conf
is to use role membership for selectors—this is still likely to be the primary method in most cases going forward. However, some specific situations can be significantly simplified by this patch, particularly trust-based Kerberos authentication from different domains (which we discussed in a previous post of this series).
By default, users connecting to PostgreSQL and authenticating via Kerberos use the Kerberos principal name: PRINCIPAL@DOMAIN, where PRINCIPAL is the name of the Kerberos account, and DOMAIN is the Kerberos domain name. These can look like email addresses.
In current versions of PostgreSQL, users do not have assumed access to database objects. These are usually assigned to roles that users are then granted use for. Access can be given to the public as well, which is the same as granting to all current and future roles.
These roles can also be used to grant the right to connect to the database via Kerberos or other authentication technologies in the pg_hba.conf
, which uses ordered rules to assign and restrict access to the database. For complex security environments, it is often vital to use the pg_hba.conf
and role-based access controls as separate, related layers in a system defending against unauthorized access. This is particularly important where external partners or departments might be granted access to only one database on a server.
This patch then enables the use of rules like “allow the sales department to access the sales database but no others” while still requiring different roles to access anything in that sales database.
This is another crucial step in further improving Kerberos (and hence Active Directory Integration) for PostgreSQL.
Currently, PostgreSQL requires special privileges to vacuum and analyze tables. Database or table owners are able to do these, as can superusers, but a patch has been committed that allows this to be delegated to other roles and users.
Many higher-volume databases require vacuuming on a schedule because autovacuum can become overwhelmed and bogged down. Some database administrators also prefer cost-based delays for autovacuum operations while not having them for manual vacuuming.
Currently, the scripts tend to have to run either as a database superuser or as a database or table owner. This is a very strong limitation, and it means that scripts and users that can vacuum tables can do far more than they need to, including destructive operations such as dropping tables.
Having this particular privilege available for delegation means that such scripts can be limited to only vacuuming and analyzing tables. This improves security in a critical database operation and reduces the opportunity for things to go wrong.
Efforts to bring transparent data encryption to PostgreSQL in various forms are proceeding. This tremendous initiative is unlikely to be committed in a single run. Instead, it will likely be a perpetual topic with many smaller patches which may be committed one at a time.
Transparent data encryption has many critical pitfalls that have to be addressed. These patches are likely to go through extensive, painstaking review before being accepted. That process may be long and drawn out.
Peter Eisentraut has proposed a patch for transparent column encryption. This is a work in progress and not directly part of the Transparent Data Encryption effort, but I am covering it here due to it being closely related and having many of the same problems to address, in particular key management problems.
The approach of this patch is to provide key management capabilities for client-side encryption. Column encryption keys are encrypted with column master keys. The database does not store the master keys, just the encryption keys, though they are encrypted with the master keys. This allows key management in a sane way. The client is then responsible for encrypting and decrypting the data.
Unlike transparent data encryption, which occurs on the page or block level and encrypts entire tables, encrypting particular columns can be used to protect very specific data against both unauthorized offline retrieval and bulk access.
For example, Payment Card Industry Data Security Standards (PCI-DSS) require not merely that credit card data is encrypted but also that someone cannot look at a list of credit card numbers (the data has to be accessed one record at a time). When both the client application and the storage use these technologies, it can be assured that a database administrator will not easily be able to pull a list of credit card numbers directly.
This sort of feature, if it moves forward, would help PostgreSQL compete in high-security PCI-DSS and other related environments.
This patch is not likely to get in right away, but it may become the foundation for more high-security features for PostgreSQL in the coming years.
David Christiansen has taken on the task of getting Transparent Data Encryption into PostgreSQL. This is a large patch set and has been broken down into a significant number of patches.
One of this endeavor's hardest and most critical areas is the key management problems. One patch set addresses this problem.
Transparent Data Encryption guarantees that you cannot access the files for a database without an externally managed encryption key. This ensures that filesystem-level or device-level storage cannot be stolen and then easily accessed.
Transparent Data Encryption is well-supported on some other relational database systems, but one hard part here is key management and ensuring that compromising information is never logged. Keys must never be logged, nor can information needed to access keys. Getting this part right is very important.
This set of patches seems to be moving forward. I believe Crunchy Data’s commitment to this effort is rock solid, and one way or another, we will get top-notch Transparent Data Encryption in PostgreSQL.
PostgreSQL uses a Simple Least Recently Used (SLRU) cache for transaction status information. Internally, transaction IDs (XIDs) are 64-bit integers in some places, but on disk are a 32-bit integer used both in tuple headers for xmin
and xmax
fields and an offset for the files in the pg_xact
directory for on-disk lookups for transaction status. The SLRU cache uses 32-bit integers internally.
A new patch moves the SLRU in-memory format from 32-bit to 64-bit format.
There is slow progress toward getting all transaction IDs moved to a 64-bit format across the board. This is important because PostgreSQL and hardware are both becoming more capable, and as a result, the risks of transaction ID wraparound-related outages are slowly increasing. A move to 64-bit transaction ID is not a magic bullet for these problems as underlying causes of the wraparound—this represents an important step to this effect.
Another vital aspect of this patch is that it removes a critical 32-bit/64-bit transaction ID conversion point. This provides a long-term reduction in the potential footprint of this subsystem as a source of serious bugs, including those that can result in data loss.
This patch is awaiting review. Robert Haas and others have expressed a strong interest in it. I believe it will move forward.
One major challenge with logical replication is that it only applies to INSERT
, UPDATE
, DELETE
, and TRUNCATE
commands and that these have problems with changes to database schemas. This is one major issue and requires careful handling and coordination of database schema changes across systems. A patch has been submitted and is awaiting changes that replicate Data Definition Language (DDL) as well.
Logical replication poses a significant number of challenges for managing database schemas. Sometimes this can result in subscribers and publishers even being run by different parts of an organization. Human coordination over such changes is tedious and complex and can result in stale replicated data or operational problems.
Additionally, running the same DDL statements on the replica is not necessarily safe. If a DDL statement references any volatile constant (such as ‘today’::date
) or function (such as now()
), there is no guarantee that the replicated table will have the same constraints. This can also break logical replication.
The November commitfest provided a number of small yet interesting changes and progress on larger projects and patch sets. The incremental security improvements mentioned are exciting and will help make PostgreSQL more competitive in many environments. Additionally, we see slow but solid progress on several larger projects.
As PostgreSQL continues to evolve, you can add even more functionality to this beloved database. Explore TimescaleDB—it extends PostgreSQL with features like automatic time-based partitioning and indexing, continuous aggregations, columnar compression, and time-series functionality. And if you’re using a managed service for PostgreSQL, try Timescale—it’s free for 30 days, no credit card required.