Written by Carlo Mencarelli
When scaling PostgreSQL in production, there may come a time when you’ll face two problems:
You need a mechanism to ensure your production database stays up when there’s trouble so your users don’t face downtime.
Your workload is getting so heavy that you must diversify your reads and writes to avoid exhausting your machine and creating issues.
The good news is that replication in PostgreSQL can help you with both these problems. In this guide, we’ll review database replication, strategies for making your database highly available, and how Timescale makes it easy for you.
PostgreSQL replication is available in two distinct flavors: logical and physical. Both of these have pros and cons, as well as best practices. Let’s dive in.
Logical replication copies the data objects in the database and the changes that happen to them. It starts with a snapshot of the data on the source (publisher) and copies it to the destination (subscriber). After the snapshot is completed, changes are sent to the subscriber in real time and applied in the same order as the publisher—this guarantees transactional consistency.
The PostgreSQL documentation lists some typical use cases for logical replication. Some common ones in my experience include:
Replicating data between databases running different major versions of PostgreSQL
Consolidating multiple databases into a single database
Providing access to subsets of the data to different user groups
The benefit of using logical replication is that it can be used to replicate very granular pieces of data from the database. It’s also suitable for cascading replication or using a subscriber database as a publisher database, which might be relevant if you want to maintain geographically based read replicas for improved performance. Finally, as mentioned above, logical replication is a great way to migrate between major versions of PostgreSQL. One negative related to logical replication is that it is typically slower set up in larger datasets. On write-capable subscriber databases, there is a potential for a conflict to occur, which will end up stopping the replication and require manual intervention before it can be restarted.
Timescale uses logical replication when migrating from PostgreSQL to Timescale databases. Using pgcopydb, a logical decoding tool, Timescale can migrate databases with terabytes of data with almost no downtime, as we show in our blog: “Migrating a Terabyte-Scale PostgreSQL Database to Timescale With (Almost) Zero Downtime.”
Physical replication operates in two ways: file-based log shipping and streaming replication. Both approaches use the Write-Ahead-Logs (WAL) to achieve the replication.
File-based log shipping waits for the WAL file to be filled before sending it to the subscriber database. There is typically a lag due to the wait for a full WAL file, which leads to a window of potential data loss.
Streaming replication sends the WAL records to the subscriber as they are generated. This approach has a reduced window of data loss. However, it is more complex to set up. You also need to be aware of potential delays or stoppages in replication if the streaming connection is broken.
Timescale allows users to configure streaming replication easily and provides in-depth documentation on how to do so.
Replication plays a pivotal role in having a highly available PostgreSQL database—ensuring it stays online is likely critical to your success. The core concept for high availability (HA) involves maintaining a replica of your primary database on a separate server within your infrastructure.
In our docs, we discuss the need for backups, redundancies, replication, and failover. We also addressed the issues of database backups and disaster recovery in PostgreSQL in our blog, so if you have any questions on how this works in Timescale, I strongly recommend you read that post.
Now, turning the lens a bit more to achieving high availability, we need to talk about failover. In the case of a problem with the primary database, the system should move requests to a replica database or failover. In the event of a planned failover, the primary is gracefully shut down, and the replica seamlessly takes over as the new primary. In the case of an unplanned failover, the architecture will need to account for various failure scenarios to minimize downtime for the application. This is something that Timescale handles automatically for you any time any type of failure is detected.
While the concept seems straightforward, there are several intricacies to consider:
Lack of native failover: PostgreSQL lacks a built-in failover system. As a result, administrators must select and implement an external solution to handle failover scenarios effectively.
Reestablishing HA after failure: To maintain the high-availability system's integrity, it must be reestablished after a primary database failure. While this can be automated, it requires additional implementation time and ongoing maintenance to ensure its effectiveness. In the case of a true failover event, this can often be overlooked as well.
Automated failover and replication: Automation plays a crucial role in HA. When combined with regular backups, replicas can ensure minimal to no downtime, even in the case of a severe error. Automation guarantees that the failover process happens swiftly and accurately, reducing human intervention and potential errors.
Timescale offers an elegant solution for creating replicas and setting up failover mechanisms. It simplifies the process by providing an automated failover that automatically resets in the event of a primary switch. This feature greatly reduces the complexity associated with manual failover management.
We briefly discuss read replicas in an earlier article on logical and physical backups. Given the focus on replication and how it is related to the use of read replicas, this is a great opportunity to discuss the feature in more depth. While read replicas are often associated with HA, they are equally useful for distributing the workload of read queries. This means that as write queries are directed to the Write-Ahead Log (WAL) in the primary database, read queries can fetch data just as efficiently from a streaming replica.
Sending read requests to a replica server is akin to spreading the load across multiple shoulders. This distribution of the query load helps enhance the overall speed and responsiveness of your database. It acts as a safeguard against overloading the primary database server, ensuring that it remains available for critical write operations.
Another compelling reason to utilize read replicas is security. Read replicas are inherently read-only, which makes them perfect for delegating tasks to utilities that shouldn't have write permissions. They are also great for granting read access to users who may not ever need to write data to the database, such as business intelligence teams. By sending these requests to read replicas, the security and integrity of the database is maintained.
In PostgreSQL, creating a "hot standby" server is a straightforward process. The official PostgreSQL documentation provides clear guidance on setting up a hot standby server with a replica. The primary step is just to set the hot_standby parameter to true. Once this is done, the server becomes ready to handle read requests.
To fully leverage the potential of read replicas, you can manage which queries are directed to which server. This behavior can increase performance when combined with tuning aspects of the engine to improve the fetching of certain data types.
It's essential to monitor connections to ensure that responses are up-to-date and consistent. Additionally, any potential conflicts between the primary and standby servers must be addressed promptly to maintain data integrity. Read more about read scaling in Timescale.
If this looks like a time-consuming process, that’s because it is—and that’s when a managed database comes in handy. So, for those seeking a seamless solution, Timescale offers a more convenient approach, handling all the work under the hood required to switch to the replica when there is a failure in the primary database.
Let’s explain this in more detail. In Timescale, you’ll find two types of replicas:
High-availability replicas, whose main goal is to provide increased availability in case of failover, though reads can also be served from them. As mentioned, they are automatically promoted in the event of a failure.
Read replicas, whose main goal is read scaling. Depending on your workload needs, they can be different configuration sizes than the primary, and you can have multiple.
Our commitment to building a dependable and highly available database starts in our architecture: in the face of failures, Timescale automatically spins up a new compute node and reconnects it to the existing decoupled database storage, which itself is independently replicated for high availability and durability.
Even without a replica enabled, this cloud-native architecture can provide full recovery for many types of failures within 30-60 seconds, with more severe physical server failures often taking no more than several minutes of downtime to recover your database.
And database replication in Timescale is as easy as pressing a few buttons. Here’s how you can create a high-availability replica in Timescale:
Log in to your Timescale account and click the service you want to replicate.
Navigate to the ‘Operations’ tab and select ‘High availability.’
Check the pricing of the replica, and click ‘Add a replica.’ Confirm the action by clicking ‘Add replica.’
You can see the replicas for each service by clicking on the service name, navigating to the ‘Operations’ tab, and selecting ‘High availability.’ Replicas are not shown in the main ‘Services’ section, as they are not independent.
You can see connection information for the replica by navigating to the ‘Overview’ tab. In the ‘Connection info’ section, select the replica from the ‘Role’ drop-down menu to populate the section with the replica's connection details.
All set!
In this write-up, we delved into the world of database replication with PostgreSQL, uncovering its vital role in ensuring high availability, load distribution, and security. Armed with this knowledge, you can now make informed decisions about which replication systems align with your specific needs and requirements.
Whether you're looking to implement manual replication solutions or seeking the simplicity of automated management, PostgreSQL offers a versatile framework to cater to your database replication needs.
Timescale simplifies PostgreSQL replication, offering a seamless and automated approach that not only ensures high availability but also provides hot standbys for read operations. With Timescale, you can focus on optimizing your database's query distribution and performance, allowing you to harness the full potential of PostgreSQL without the complexities of manual management.
Sign up for Timescale's replication services here and discover how effortless database replication can be. Create a free account to elevate your PostgreSQL experience and enjoy the peace of mind that comes with enhanced high availability and workload distribution.