I spent 6 hours setting up my first CockroachDB cluster because the official docs skip the real-world problems you'll hit.
What you'll build: A 3-node CockroachDB cluster that survives server failures
Time needed: 45 minutes (not 6 hours like my first attempt)
Difficulty: Intermediate - requires basic Linux and networking knowledge
Here's what makes this different: I'll show you the exact network configuration that actually works, plus the two critical security steps that aren't obvious from the documentation.
Why I Built This
I needed a database that could handle my SaaS app going from 1,000 to 50,000 users without downtime. PostgreSQL with read replicas was getting complex, and I kept hearing about CockroachDB's "just works" distributed approach.
My setup:
- 3 Ubuntu 24.04 LTS servers on DigitalOcean (4GB RAM each)
- Private networking between nodes
- A load balancer in front of the nodes for client connections
- SSL certificates for production security
What didn't work:
- Following the basic single-node tutorial for production
- Using default firewall settings (blocked cluster communication)
- Skipping certificate setup (caused mysterious connection failures)
- Not configuring proper DNS resolution between nodes
Step 1: Prepare Your Ubuntu Servers
The problem: CockroachDB has system requirements that a stock Ubuntu install doesn't meet.
My solution: Install dependencies and configure system limits before touching CockroachDB.
Time this saves: Prevents the "why won't it start" debugging loop that cost me 2 hours.
Update System and Install Dependencies
Run this on all three servers:
# Update package lists
sudo apt update && sudo apt upgrade -y
# Install required packages
sudo apt install -y wget curl software-properties-common apt-transport-https ca-certificates
# Install chrony for time synchronization (critical for CockroachDB)
sudo apt install -y chrony
sudo systemctl enable chrony
sudo systemctl start chrony
What this does: CockroachDB requires closely synchronized clocks across nodes; a node whose clock drifts past the maximum offset (500ms by default) shuts itself down to protect consistency. On Ubuntu 24.04 the classic ntp package is only a transitional package, so chrony is the simpler, actively maintained choice.
Expected output: You should see "chrony.service - chrony, an NTP client/server" as active when you run sudo systemctl status chrony.
My terminal after running the preparation commands - yours should look identical
Personal tip: "I learned the hard way that clock skew breaks CockroachDB's consensus algorithm. Always install NTP first."
Configure System Limits
CockroachDB needs higher file descriptor limits:
# Edit limits configuration
sudo nano /etc/security/limits.conf
# Add these lines at the end:
* soft nofile 35000
* hard nofile 35000
* soft nproc 35000
* hard nproc 35000
What this does: Prevents "too many open files" errors when your cluster handles high connection loads. Note that limits.conf only applies to login sessions; the systemd service you'll create in Step 7 needs its own LimitNOFILE= setting.
Reboot all servers to apply limits:
sudo reboot
Personal tip: "Skip this step and your cluster will crash under load. I found out during a client demo."
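Once the servers are back up, verify the limits actually took effect instead of assuming the reboot applied them. On servers configured as above, both commands should print 35000:

```shell
# Confirm the raised limits are active for your login session
ulimit -n   # file descriptor limit - should print 35000 after the config above
ulimit -u   # process limit - should print 35000 after the config above
```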
Step 2: Download and Install CockroachDB
The problem: Package managers don't have the latest CockroachDB version, and manual installation can be tricky.
My solution: Download directly from Cockroach Labs and install system-wide.
Time this saves: Avoids version compatibility issues with older packaged versions.
Download CockroachDB Binary
Run on all three servers:
# Download the latest binary (replace with current version)
wget https://binaries.cockroachdb.com/cockroach-v23.2.4.linux-amd64.tgz
# Extract and install
tar -xzf cockroach-v23.2.4.linux-amd64.tgz
sudo cp cockroach-v23.2.4.linux-amd64/cockroach /usr/local/bin/
sudo chmod +x /usr/local/bin/cockroach
# Verify installation
cockroach version
Expected output (timestamps and toolchain versions will differ by release):
Build Tag: v23.2.4
Build Time: 2024/05/01 18:47:17
Distribution: CCL
Platform: linux amd64 (x86_64-pc-linux-gnu)
Go Version: go1.19.13
Successful installation - the version command should return without errors
Personal tip: "Always verify the installation works before moving to cluster configuration. This catches file permission issues early."
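If you script the install across all three servers, a small parser lets you assert on the installed version rather than eyeballing it. The function below just extracts the Build Tag line; it's shown against sample `cockroach version` output so you can see what it returns:

```shell
# Extract the build tag from `cockroach version` output.
# Real use: cockroach version | get_build_tag
get_build_tag() { awk '/^Build Tag:/ {print $3}'; }

# Demo against sample output:
printf 'Build Tag:    v23.2.4\nBuild Time:   2024/05/01 18:47:17\n' | get_build_tag
# → v23.2.4
```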
Step 3: Configure Network and Firewall
The problem: CockroachDB's ports need specific firewall rules, and once UFW is enabled its default policy denies all incoming traffic.
My solution: Configure UFW with exact port requirements and test connectivity.
Time this saves: Prevents the "nodes can't find each other" problem that wasted 3 hours of my setup time.
Set Up Firewall Rules
On each server, configure UFW to allow CockroachDB traffic:
# Enable UFW
sudo ufw enable
# Allow SSH (don't lock yourself out)
sudo ufw allow ssh
# Allow CockroachDB ports from your private network only
# (replace 10.0.1.0/24 with your actual private subnet)
sudo ufw allow from 10.0.1.0/24 to any port 26257 proto tcp # SQL clients and inter-node traffic
sudo ufw allow from 10.0.1.0/24 to any port 8080 proto tcp # Admin UI
# Check firewall status
sudo ufw status verbose
What this does: Opens the two ports CockroachDB needs - by default a node serves both SQL clients and inter-node RPC on 26257, and the Admin UI on 8080 - while maintaining security by restricting access to your private network.
Personal tip: "Don't use sudo ufw allow 26257 without IP restrictions in production. I had bots trying to connect within hours."
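Since every node gets identical rules, I find it less error-prone to generate them from one subnet variable. The loop below echoes the commands as a dry run (10.0.1.0/24 is an example subnet - substitute your own, then drop the echo or pipe to sh to apply):

```shell
# Generate the restricted UFW rules for all CockroachDB ports.
# By default SQL and inter-node traffic share 26257; 8080 is the Admin UI.
NODE_NET="10.0.1.0/24"   # example - replace with your private subnet
for port in 26257 8080; do
  echo sudo ufw allow from "$NODE_NET" to any port "$port" proto tcp
done
```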
Configure Hostnames and DNS
Edit /etc/hosts on each server to ensure nodes can find each other:
sudo nano /etc/hosts
# Add your server IPs (replace with actual IPs):
10.0.1.100 cockroach-1
10.0.1.101 cockroach-2
10.0.1.102 cockroach-3
Expected result: You should be able to ping each hostname from every server:
ping -c 3 cockroach-1
ping -c 3 cockroach-2
ping -c 3 cockroach-3
All three hostnames should respond to ping with 0% packet loss
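Rather than pinging each name by hand on each server, one loop checks resolution and reachability for all three nodes in a single pass. Run it from every server:

```shell
# Verify each node hostname resolves and answers ping
for host in cockroach-1 cockroach-2 cockroach-3; do
  if ping -c 1 -W 2 "$host" > /dev/null 2>&1; then
    echo "$host: reachable"
  else
    echo "$host: FAILED (check /etc/hosts and the firewall)"
  fi
done
```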
Step 4: Generate Security Certificates
The problem: CockroachDB requires SSL certificates for secure cluster communication, but generating them correctly is confusing.
My solution: Use CockroachDB's built-in certificate generation with proper node names.
Time this saves: Prevents SSL handshake failures that are nearly impossible to debug without proper logging.
Create Certificate Authority
On your first server (cockroach-1), generate the CA:
# Create certificates directory
mkdir ~/certs ~/my-safe-directory
# Generate CA certificate
cockroach cert create-ca \
--certs-dir=~/certs \
--ca-key=~/my-safe-directory/ca.key
# Generate node certificates for all three nodes
cockroach cert create-node \
cockroach-1 \
cockroach-2 \
cockroach-3 \
localhost \
127.0.0.1 \
$(hostname) \
--certs-dir=~/certs \
--ca-key=~/my-safe-directory/ca.key
# Generate client certificate for root user
cockroach cert create-client \
root \
--certs-dir=~/certs \
--ca-key=~/my-safe-directory/ca.key
What this does: Creates a certificate authority and node certificates that include all possible hostnames your nodes might use.
Copy Certificates to Other Nodes
Copy the certificates to your other servers (the CA key in ~/my-safe-directory stays on cockroach-1 only):
# Create the certs directory on the other nodes first
ssh user@cockroach-2 "mkdir -p ~/certs"
ssh user@cockroach-3 "mkdir -p ~/certs"
# From cockroach-1, copy to cockroach-2
scp ~/certs/* user@cockroach-2:~/certs/
# From cockroach-1, copy to cockroach-3
scp ~/certs/* user@cockroach-3:~/certs/
Set proper permissions on all servers:
chmod 700 ~/certs
chmod 600 ~/certs/*
Personal tip: "The certificate subject names must match how nodes connect to each other. Include both IP addresses and hostnames to avoid SSL verification failures."
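CockroachDB refuses to load key files that are group- or world-readable, so it's worth scripting the permissions check. This is a sketch demonstrated on a throwaway directory so it runs anywhere; point CERTS_DIR at ~/certs on your servers:

```shell
# Check that no .key file under the certs dir has group/world permission bits.
CERTS_DIR="$(mktemp -d)"            # demo dir; use "$HOME/certs" for real
touch "$CERTS_DIR/node.key"
chmod 600 "$CERTS_DIR/node.key"

# -perm /077 matches files with ANY group/other bits set
bad="$(find "$CERTS_DIR" -name '*.key' -perm /077)"
if [ -z "$bad" ]; then
  echo "key permissions OK"
else
  echo "fix permissions on: $bad"
fi
# → key permissions OK
```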
Step 5: Start the CockroachDB Cluster
The problem: The node startup order matters, and the join flags need to be exactly right or nodes won't form a cluster.
My solution: Start all three nodes with the identical --join list, then run cockroach init once to bootstrap the cluster.
Time this saves: Avoids the "nodes start but never join" scenario that confused me for 90 minutes.
Start First Node (cockroach-1)
# Start the first node
cockroach start \
--certs-dir=~/certs \
--advertise-addr=cockroach-1 \
--join=cockroach-1,cockroach-2,cockroach-3 \
--cache=.25 \
--max-sql-memory=.25 \
--background
# Check that the process is running
pgrep -a cockroach
Expected output: One cockroach process. Don't expect cockroach node status to respond yet - the node is waiting for cluster initialization, which happens below. If the start command doesn't return your prompt, that's the same reason: --background only detaches once the cluster is initialized, so run the init from a second terminal.
Start Second Node (cockroach-2)
cockroach start \
--certs-dir=~/certs \
--advertise-addr=cockroach-2 \
--join=cockroach-1,cockroach-2,cockroach-3 \
--cache=.25 \
--max-sql-memory=.25 \
--background
Start Third Node (cockroach-3)
cockroach start \
--certs-dir=~/certs \
--advertise-addr=cockroach-3 \
--join=cockroach-1,cockroach-2,cockroach-3 \
--cache=.25 \
--max-sql-memory=.25 \
--background
Personal tip: "All nodes use the same join list. This lets any node bootstrap the cluster if others are down during restart."
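The three start commands differ only in --advertise-addr, so if you provision with scripts, a small wrapper can derive it from the machine's hostname (this assumes your hosts really are named cockroach-1/2/3, matching /etc/hosts). It echoes the command as a dry run; drop the echo to execute:

```shell
# Build the start command for this node; the join list is the same everywhere.
start_node() {
  echo cockroach start \
    --certs-dir="$HOME/certs" \
    --advertise-addr="$1" \
    --join=cockroach-1,cockroach-2,cockroach-3 \
    --cache=.25 \
    --max-sql-memory=.25 \
    --background
}
start_node "$(hostname)"
```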
Initialize the Cluster
From any node, run the initialization:
cockroach init --certs-dir=~/certs --host=cockroach-1
Expected output: "Cluster successfully initialized"
Success message confirming your cluster is ready for connections
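Right after init, nodes can take a few seconds to report as live, which makes one-shot checks flaky in provisioning scripts. A generic retry helper smooths that over - for example `retry 30 cockroach node status --certs-dir="$HOME/certs" --host=cockroach-1`:

```shell
# Retry a command up to N times, one second apart; returns 1 if it never succeeds.
retry() {
  local attempts="$1"; shift
  local i=1
  until "$@"; do
    if [ "$i" -ge "$attempts" ]; then return 1; fi
    i=$((i + 1))
    sleep 1
  done
}

# Demo with a command that always succeeds:
retry 5 true && echo "command succeeded"
# → command succeeded
```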
Step 6: Verify Cluster Health
The problem: Just because nodes start doesn't mean the cluster is actually working correctly.
My solution: Run comprehensive health checks before declaring victory.
Time this saves: Catches configuration problems before you deploy applications.
Check Node Status
cockroach node status --certs-dir=~/certs --host=cockroach-1
Expected output: All three nodes should show "live" status with recent heartbeat times.
Test SQL Connectivity
# Connect to cluster
cockroach sql --certs-dir=~/certs --host=cockroach-1
# Inside the SQL shell, test basic operations:
CREATE DATABASE test_db;
USE test_db;
CREATE TABLE users (id INT PRIMARY KEY, name STRING);
INSERT INTO users VALUES (1, 'Test User');
SELECT * FROM users;
\q
Access Admin UI
Open your browser and navigate to: https://cockroach-1:8080 (if your workstation can't resolve cockroach-1, use the node's IP instead, or tunnel with ssh -L 8080:localhost:8080 user@cockroach-1).
You'll see a security warning (expected with self-signed certificates). Click "Advanced" → "Proceed to cockroach-1".
Healthy cluster dashboard showing 3 live nodes and green status indicators
Personal tip: "The admin UI is your best friend for monitoring. Bookmark it and check the Overview tab regularly in production."
Step 7: Configure Systemd Services (Production Ready)
The problem: Manual startup doesn't survive server reboots, and you need proper service management for production.
My solution: Create systemd service files with proper dependencies and restart policies.
Time this saves: Prevents manual cluster recovery after maintenance windows.
Create Systemd Service File
On each server, create the service file:
sudo nano /etc/systemd/system/cockroachdb.service
Add this configuration (adjust paths and hostnames for each server):
[Unit]
Description=CockroachDB database server
Requires=network.target
After=network.target
[Service]
Type=notify
User=ubuntu
WorkingDirectory=/home/ubuntu
LimitNOFILE=35000
Restart=always
RestartSec=10
ExecStart=/usr/local/bin/cockroach start \
--certs-dir=/home/ubuntu/certs \
--advertise-addr=cockroach-1 \
--join=cockroach-1,cockroach-2,cockroach-3 \
--cache=.25 \
--max-sql-memory=.25
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGTERM
TimeoutStopSec=60
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
What this does: WorkingDirectory keeps the data in /home/ubuntu/cockroach-data, the same ./cockroach-data directory your manual start created. LimitNOFILE is required because the limits.conf settings from Step 1 don't apply to systemd services.
Personal tip: "Change the --advertise-addr value for each server. On cockroach-2 use cockroach-2, on cockroach-3 use cockroach-3."
Enable and Start Services
# Reload systemd configuration
sudo systemctl daemon-reload
# Enable auto-start on boot
sudo systemctl enable cockroachdb
# Stop manual processes first
pkill cockroach
# Start service
sudo systemctl start cockroachdb
# Check status
sudo systemctl status cockroachdb
Expected output: Service should show "active (running)" status.
Repeat this process on all three servers.
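After repeating the setup everywhere, you can confirm all three services from your workstation in one loop ("user" is a placeholder for your SSH user; BatchMode avoids hanging on a password prompt if keys aren't set up):

```shell
# Query the service state on every node; prints "active" for healthy services.
for host in cockroach-1 cockroach-2 cockroach-3; do
  echo "== $host =="
  ssh -o BatchMode=yes -o ConnectTimeout=3 "user@$host" \
    systemctl is-active cockroachdb || true
done
```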
What You Just Built
You now have a production-ready CockroachDB cluster that automatically handles node failures and data replication across three Ubuntu 24.04 servers. Put a load balancer (like HAProxy, below) in front of it to spread client connections.
Key Takeaways (Save These)
- Certificate Planning: Include all possible hostnames (IP, DNS, localhost) in certificates or you'll get SSL errors later
- Firewall Configuration: CockroachDB needs ports 26257 (SQL clients and inter-node traffic share it by default) and 8080 (Admin UI) - don't block node-to-node access on 26257
- Time Synchronization: Set up clock sync (chrony or another NTP daemon) first. Clock skew breaks distributed consensus and causes weird data inconsistencies
- Join List Strategy: Use the same join list on all nodes so any node can bootstrap the cluster after maintenance
Your Next Steps
Pick one based on your experience level:
- Beginner: Set up connection pooling with PgBouncer for your applications
- Intermediate: Configure automated backups to cloud storage and test restore procedures
- Advanced: Implement geo-distributed clusters across multiple regions with zone configs
Tools I Actually Use
- Monitoring: CockroachDB's built-in Admin UI plus Prometheus + Grafana for production metrics
- Connection Pooling: PgBouncer configured for CockroachDB's transaction retry logic
- Backup Strategy: Built-in BACKUP statements writing to S3, automated with scheduled backups
- Load Balancing: HAProxy with health checks on port 26257 for application connections
Common Gotchas I Learned the Hard Way
Clock Skew Issues: If you see "timestamp in the future" errors, check NTP sync with timedatectl status. Clock differences over 500ms break everything.
Connection Pool Settings: Don't use traditional PostgreSQL connection pool settings. CockroachDB needs pools configured for automatic transaction retries.
Schema Changes: Unlike PostgreSQL, schema changes in CockroachDB are online by default, but large table alterations can impact performance. Schedule them during low-traffic periods.
Ready to connect your applications? The cluster accepts standard PostgreSQL connections on port 26257 with SSL required.
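Since drivers mostly take a PostgreSQL URL, a tiny helper that assembles one for this cluster keeps connection strings consistent across scripts (paths assume the certs directory from Step 4; the URL works with psql and most PostgreSQL drivers):

```shell
# Build a verify-full connection URL for the cluster: host, database, certs dir.
crdb_url() {
  local host="$1" db="$2" certs="$3"
  printf 'postgresql://root@%s:26257/%s?sslmode=verify-full&sslrootcert=%s/ca.crt&sslcert=%s/client.root.crt&sslkey=%s/client.root.key\n' \
    "$host" "$db" "$certs" "$certs" "$certs"
}

# Example: psql "$(crdb_url cockroach-1 defaultdb "$HOME/certs")"
crdb_url cockroach-1 defaultdb "$HOME/certs"
```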