High Availability
Managed Postgres offers different levels of high availability to match your durability and performance requirements. You can add one or two standby replicas when provisioning your database, or adjust this configuration later from the Settings page as needed.
High availability options
2 Standbys
With two standbys, two replica nodes are provisioned alongside your primary. Both standbys are the same size as the primary and either can take over if the primary fails.
By default, this configuration uses synchronous replication where the primary waits for acknowledgement from at least one standby before confirming writes. This provides stronger durability guarantees than asynchronous replication. Because only one acknowledgement is required (not both), the performance impact is less severe than synchronous replication with a single standby.
If you prefer higher write performance over the additional durability guarantee, you can switch to asynchronous replication.
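Under the hood, this behavior corresponds to PostgreSQL's quorum-based synchronous replication. The following is a minimal sketch of the underlying configuration, which Managed Postgres handles for you; the standby names are hypothetical placeholders:

```sql
-- Sketch of the underlying PostgreSQL settings (managed automatically);
-- "standby1" and "standby2" are hypothetical standby names.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2)';
ALTER SYSTEM SET synchronous_commit = 'on';
SELECT pg_reload_conf();  -- apply without a restart

-- Check which standbys are currently replicating and their sync state:
SELECT application_name, sync_state
FROM pg_stat_replication;
```

The `ANY 1` quorum is what makes this configuration faster than single-standby synchronous replication: the commit returns as soon as either standby acknowledges.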
1 Standby
With one standby, a replica node is provisioned alongside your primary. The standby is the same size as the primary and stands ready to take over if the primary fails.
By default, data is replicated to the standby using asynchronous replication. This means writes commit to the primary without waiting for acknowledgement from the standby. Asynchronous replication ensures that high availability doesn't slow down your writes due to additional network latency. However, it also means the standby might not have received the most recent transactions at the moment of a primary failure.
For most applications, this trade-off between performance and the small risk of losing very recent writes is worthwhile. If your application requires stronger durability guarantees, you can opt in to synchronous replication, though this will add latency to write operations.
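In plain PostgreSQL terms, this trade-off is governed by the synchronous_commit setting. As a hedged sketch, assuming synchronous replication has been enabled for the cluster, an individual low-value transaction can still skip the standby wait without changing the cluster-wide default (the table name here is hypothetical):

```sql
-- Sketch: with synchronous replication enabled cluster-wide, a single
-- low-value transaction can opt out of waiting for the standby.
BEGIN;
SET LOCAL synchronous_commit = 'local';  -- wait for the local WAL flush only;
                                         -- applies to this transaction alone
INSERT INTO page_views (path) VALUES ('/home');  -- hypothetical table
COMMIT;
```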
No Standby
With this option, only a primary node is provisioned in your selected size. No standby node is created. Your primary node is still monitored for failures, but recovery may take longer depending on the nature of the problem since there's no replica ready to take over.
This configuration is best suited for development environments, testing, or non-critical workloads where some downtime is acceptable.
Standbys vs read replicas
Standbys and read replicas serve different purposes in Managed Postgres and are configured separately.
Standbys are dedicated exclusively to high availability and automatic failover. They replicate data from the primary using streaming replication and are always ready to be promoted if the primary fails. Standbys are not exposed for read queries.
Read replicas are designed for read scaling. They pull WAL (Write-Ahead Log) data from object storage and run in a separate network environment with their own connection endpoint. Read replicas allow you to offload read traffic from your primary without impacting HA guarantees.
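Because a read replica runs in recovery mode behind its own endpoint, you can confirm you're connected to one, and gauge how far it trails the primary, using standard PostgreSQL functions. A minimal sketch, run against the replica's endpoint:

```sql
-- True on a replica, false on the primary:
SELECT pg_is_in_recovery();

-- Approximate staleness: time since the last replayed transaction.
-- Note: this can read high when the primary is idle and emitting no WAL.
SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;
```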
Why standbys don't serve read queries
While some database providers expose hot standbys for read-only queries, Managed Postgres intentionally does not. Allowing read queries on standbys can compromise their core purpose: being ready to take over instantly when the primary fails.
There are two main concerns:
- WAL replay competition: Under write-heavy workloads, read queries on a standby compete with WAL replay for system resources. This competition can cause high replication lag, meaning the standby falls behind the primary. If a failover occurs while the standby is lagging, it won't have the most recent data and may not be ready to take over cleanly.
- VACUUM interference: Long-running read queries on a standby can prevent VACUUM (and AUTOVACUUM) from cleaning up dead tuples on the primary. PostgreSQL cannot remove rows that an active query on any replica might still need to access. This can lead to table bloat and degraded performance over time. Both effects can be observed with the query sketch after this list.
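Both effects are visible through standard PostgreSQL statistics views. A sketch of what an operator might run on a primary (the views and functions are standard PostgreSQL; which of them your plan exposes is an assumption):

```sql
-- How far each standby's WAL replay trails the primary, in bytes:
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Dead tuples that VACUUM has not yet reclaimed, a rough proxy
-- for the table bloat described above:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```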
By keeping standbys dedicated to failover, Managed Postgres ensures they are always synchronized and ready to take over with minimal data loss and downtime. For read scaling, use read replicas instead.
Handling failures
All Managed Postgres instances are continuously monitored for failures, regardless of whether high availability is enabled. In all cases, the system attempts to heal from failures automatically.
When standbys are available, automatic healing is faster and more straightforward. The system typically recovers within minutes by promoting a standby to primary. Without standbys, recovery may require manual intervention, significantly increasing the duration of any outage.