Replication Monitoring

UptimeDock provides comprehensive monitoring for ClickHouse replicated tables. Track replication health, lag, queue sizes, and get instant alerts when issues arise.

Overview

ClickHouse supports data replication through the ReplicatedMergeTree family of table engines. Replication provides high availability and fault tolerance by maintaining identical copies of data across multiple servers.

UptimeDock automatically monitors all replicated tables in your ClickHouse instance, providing:

  • Real-time health status – Instantly see if replicas are healthy or experiencing issues
  • Replication lag tracking – Monitor how far each replica lags behind the leader
  • Queue monitoring – Track pending inserts and merges in the replication queue
  • ZooKeeper status – Detect ZooKeeper connectivity issues
  • Lost parts detection – Identify data integrity issues

Automatic Detection

Replication monitoring is enabled automatically when UptimeDock detects replicated tables in your ClickHouse instance. No additional configuration is required.

If your instance has at least one table using a ReplicatedMergeTree engine (or any of its variants like ReplicatedReplacingMergeTree, ReplicatedSummingMergeTree, etc.), the replication monitoring features will appear automatically in your reports.
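As an illustration of the detection rule described above, here is a minimal sketch in Python. It is not UptimeDock's actual implementation; the helper names and the name-matching pattern are assumptions, and the SQL string only shows a query you could run by hand against the standard system.tables table.

```python
import re

# Assumption: an engine counts as "replicated" when its name has the form
# Replicated<Variant>MergeTree, e.g. ReplicatedReplacingMergeTree.
_REPLICATED_ENGINE = re.compile(r"^Replicated\w*MergeTree$")

# A query you could run manually to list candidate tables.
LIST_REPLICATED_TABLES_SQL = """
SELECT database, name, engine
FROM system.tables
WHERE engine LIKE 'Replicated%MergeTree'
"""


def is_replicated_engine(engine: str) -> bool:
    """Return True for ReplicatedMergeTree and its variants."""
    return bool(_REPLICATED_ENGINE.match(engine))


def has_replicated_tables(engines) -> bool:
    """Return True if at least one engine name in the iterable is replicated."""
    return any(is_replicated_engine(e) for e in engines)
```

With this check, an instance containing only plain MergeTree tables would not show the replication features, while a single ReplicatedSummingMergeTree table would be enough to enable them.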

Required Permissions

To monitor replication status, your monitoring user needs SELECT access to the system.replicas table. See our Database Configuration guide for setup instructions.
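For orientation, the permission above corresponds to a single ClickHouse GRANT statement. The sketch below only builds that statement as text; the user name uptimedock_monitor is hypothetical, and the Database Configuration guide remains the authoritative source for the full setup.

```python
import re


def grant_replicas_select(user: str) -> str:
    """Build a GRANT statement giving `user` read access to system.replicas."""
    # Accept only plain identifiers so the name can be used unquoted.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", user):
        raise ValueError(f"unsupported user name: {user!r}")
    return f"GRANT SELECT ON system.replicas TO {user}"


# "uptimedock_monitor" is a made-up user name used for illustration only.
print(grant_replicas_select("uptimedock_monitor"))
```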

Replication Summary

The Replication tab in the main report view provides a high-level overview of your replication status. This summary shows:

  • Replicated Tables – Total number of replicated tables with healthy/unhealthy breakdown
  • Max Delay – The longest replication lag across all tables
  • Read-only Tables – Number of tables in read-only mode (indicating issues)
  • Lost Parts – Total count of lost data parts across all tables

A green dot next to the Replication tab indicates all replicated tables are healthy. A red dot indicates one or more tables have issues that need attention.
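To make the summary figures concrete, the following sketch shows one way they could be derived from per-table rows. It assumes each row is a dict using column names from ClickHouse's system.replicas table (absolute_delay, is_readonly, lost_part_count), and it takes the health check as a parameter because how UptimeDock computes these values internally is not documented here.

```python
def summarize(rows, is_healthy):
    """Aggregate per-table replica rows into the summary figures.

    `is_healthy` is a caller-supplied predicate (an assumption of this
    sketch) that decides whether a single row counts as healthy.
    """
    rows = list(rows)
    healthy = sum(1 for r in rows if is_healthy(r))
    return {
        "replicated_tables": len(rows),
        "healthy": healthy,
        "unhealthy": len(rows) - healthy,
        # Longest replication lag across all tables, 0 if there are none.
        "max_delay": max((r["absolute_delay"] for r in rows), default=0),
        "readonly_tables": sum(1 for r in rows if r["is_readonly"]),
        "lost_parts": sum(r["lost_part_count"] for r in rows),
    }


def tab_dot(summary) -> str:
    """Green when every table is healthy, red otherwise."""
    return "green" if summary["unhealthy"] == 0 else "red"
```

A single unhealthy table is enough to turn the dot red, which matches the behavior described above.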

Database Replication View

Click on a database name to open its details, then navigate to the Replication tab. This view shows all replicated tables within that specific database:

  • Table Name – Name of the replicated table (clickable to view details)
  • Status – Healthy or Unhealthy indicator
  • Replicas – Number of active replicas out of total (e.g., "2 / 2")
  • Delay – Current replication lag in seconds

Table Replication Details

For the most detailed view, click on a table name to open its details and navigate to the Replication tab. This provides comprehensive information about the table's replication status:

| Metric | Description |
|---|---|
| Status | Overall health status (Healthy/Unhealthy) |
| Replica Name | Unique identifier of this replica in the cluster |
| Leader | Whether this replica is the current leader (Yes/No) |
| Read-only | Whether the table is in read-only mode (indicates ZooKeeper issues) |
| Active Replicas | Number of replicas currently active out of total configured |
| Absolute Delay | Time behind the leader in seconds |
| Queue Size | Number of operations waiting in the replication queue |
| Inserts in Queue | Number of pending INSERT operations |
| Merges in Queue | Number of pending MERGE operations |
| Last Queue Update | Time since the replication queue was last processed |
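
If you want to cross-check these values directly in ClickHouse, most of them have a likely counterpart in the system.replicas table. The mapping below is an assumption based on the column names ClickHouse exposes, not a statement of how UptimeDock collects the data; Status is left out because it is derived rather than stored. The query text uses ClickHouse's {name:Type} query-parameter syntax, so the database and table names are supplied separately when the query is run.

```python
# Assumed mapping from the displayed metrics to system.replicas columns.
# "Total Replicas" is the denominator shown next to Active Replicas.
METRIC_COLUMNS = {
    "Replica Name": "replica_name",
    "Leader": "is_leader",
    "Read-only": "is_readonly",
    "Active Replicas": "active_replicas",
    "Total Replicas": "total_replicas",
    "Absolute Delay": "absolute_delay",
    "Queue Size": "queue_size",
    "Inserts in Queue": "inserts_in_queue",
    "Merges in Queue": "merges_in_queue",
    "Last Queue Update": "last_queue_update",
}


def table_replication_query() -> str:
    """Return a parameterized query for one table's replication details."""
    cols = ",\n    ".join(METRIC_COLUMNS.values())
    return (
        "SELECT\n    " + cols + "\n"
        "FROM system.replicas\n"
        "WHERE database = {database:String} AND table = {table:String}"
    )


print(table_replication_query())
```

Note that last_queue_update is a timestamp, so the "time since" value shown in the report would be the difference between that timestamp and the current time.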

Health Indicators

Throughout the interface, you'll see colored dots indicating replication health:

  • Green – All replicated tables are healthy
  • Red – One or more tables have replication issues

These indicators appear on:

  • Database names in the Databases tab
  • Table names in the Tables tab
  • Replication tab triggers in popups
  • Main Replication summary tab

What Makes a Table Unhealthy?

A replicated table is considered unhealthy if any of these conditions are true:

  • Table is in read-only mode
  • There are lost parts
  • ZooKeeper exceptions are present
  • Replication lag exceeds acceptable thresholds
  • Not all replicas are active
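
The conditions above can be sketched as a single check over one row of replica data. This is an illustrative sketch, not UptimeDock's implementation: the row is assumed to be a dict keyed by system.replicas column names, and because the acceptable lag threshold is not specified here, the caller has to pass it in explicitly.

```python
def unhealthy_reasons(row, *, max_lag_seconds):
    """Return the list of conditions that make this table unhealthy.

    An empty list means the table is healthy. `max_lag_seconds` is an
    assumed, caller-chosen threshold; no default value is implied.
    """
    reasons = []
    if row["is_readonly"]:
        reasons.append("read-only")
    if row["lost_part_count"] > 0:
        reasons.append("lost parts")
    if row["zookeeper_exception"]:
        reasons.append("ZooKeeper exception")
    if row["absolute_delay"] > max_lag_seconds:
        reasons.append("replication lag")
    if row["active_replicas"] < row["total_replicas"]:
        reasons.append("inactive replicas")
    return reasons


# Example row with made-up values: one of two replicas is down.
example = {
    "is_readonly": 0,
    "lost_part_count": 0,
    "zookeeper_exception": "",
    "absolute_delay": 12,
    "active_replicas": 1,
    "total_replicas": 2,
}
print(unhealthy_reasons(example, max_lag_seconds=300))
```

Returning the reasons rather than a bare boolean makes it easier to see which condition caused a red indicator.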

Replication Alerts

Set up alerts to be notified immediately when replication issues occur. Replication alerts can be created from the Alerts tab in any table's detail popup.

Available Alert Types

| Alert Type | Description | Condition |
|---|---|---|
| Replication Unhealthy | Triggers when table becomes unhealthy | No condition (boolean) |
| Replication Lag | Triggers when replication delay exceeds threshold | Seconds (e.g., > 60s) |
| Replication Lost Parts | Triggers when lost parts are detected | Count (e.g., > 0) |
| Replication Readonly | Triggers when table enters read-only mode | No condition (boolean) |
| Replication Queue Size | Triggers when queue size exceeds threshold | Count (e.g., > 100) |
| Replication Active Replicas | Triggers when active replicas fall below threshold | Count (e.g., < 2) |
| Replication ZooKeeper Error | Triggers when ZooKeeper exceptions occur | No condition (boolean) |
| Replication Not Exists | Triggers when a replicated table disappears | No condition (boolean) |

Best Practice

We recommend setting up at least these alerts for critical replicated tables:

  • Replication Unhealthy – Catch-all for any replication issues
  • Replication Lag > 300s – Alert when replica falls more than 5 minutes behind
  • Replication Lost Parts > 0 – Immediate alert for data integrity issues
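
To show how the threshold-based recommendations behave, here is a small sketch that models them as comparison rules. The AlertRule class, the firing_alerts function, and the mapping to system.replicas column names are assumptions made for illustration; alerts in UptimeDock are configured through the Alerts tab, not through code. The boolean Replication Unhealthy alert is omitted because it depends on the full health definition rather than a single threshold.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    name: str                                 # alert type as shown in the UI
    column: str                               # assumed system.replicas column
    compare: Callable[[float, float], bool]   # comparison against threshold
    threshold: float


# The two threshold-based alerts recommended above.
RECOMMENDED = [
    AlertRule("Replication Lag", "absolute_delay", operator.gt, 300),
    AlertRule("Replication Lost Parts", "lost_part_count", operator.gt, 0),
]


def firing_alerts(row, rules=RECOMMENDED):
    """Return the names of rules whose condition holds for this row."""
    return [r.name for r in rules if r.compare(row[r.column], r.threshold)]


# Made-up values: the replica is 6 minutes behind with no lost parts.
print(firing_alerts({"absolute_delay": 360, "lost_part_count": 0}))
```

Both comparisons are strict, so a replica exactly 300 seconds behind does not trigger the lag alert.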

For more information on setting up alerts, see our Alerts & Notifications guide.