Migrating 30 servers from DigitalOcean to Hetzner without downtime — Unaib Amir
The TL;DR
Two months of planning, one weekend of cutover. 30+ servers moved. 99.97% uptime through the transition. Around $5,000/month in infrastructure savings post-migration.
This post walks through the order of operations that made the move possible without an outage.
Why move at all
[TODO during content phase — drop in real reasoning]
The plan, in order of operations
- Inventory. Catalog every server, its role, its hostname, its monitoring agents, its DNS dependencies.
- Hetzner provisioning. Spin up the new fleet in parallel with the old one, configured identically via Ansible.
- Database replication. Bring the new MariaDB online as a replica of the old. Verify lag stays under one second.
- Auth0 connection swap script. Pre-build the tool that will atomically move all 120 tenant connections in one transaction. (See the shallow-merge near-miss post for what almost went wrong here.)
- DNS prep. Lower TTLs to 60 seconds two weeks ahead of the cutover.
- Smoke runbook. Pre-written checks that confirm a healthy tenant on the new fleet, from login through course enrollment to certificate generation.
- Cutover weekend. Quiesce writes for the cutover window, run the connection swap, flip DNS, run smoke checks on a sample of tenants, monitor.
- Drain old fleet. Wait 7 days. Snapshot. Destroy.
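The cutover-weekend ordering above can be sketched as a small runner that executes steps in sequence and halts at the first failure, so a later step such as the DNS flip never runs on top of a failed connection swap. This is a minimal sketch, not the tooling used in the migration: the step names and the `run_cutover` helper are hypothetical, and each lambda stands in for a real operation.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_cutover(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            # Halt here so no later step runs after a failed one.
            break
        completed.append(name)
    return completed


# Hypothetical placeholders: each lambda would wrap a real operation.
example_steps: List[Step] = [
    ("quiesce_writes", lambda: True),
    ("swap_auth0_connections", lambda: True),
    ("flip_dns", lambda: True),
    ("smoke_checks", lambda: True),
]

if __name__ == "__main__":
    print(run_cutover(example_steps))
```

Returning the list of completed steps, rather than a bare pass/fail, makes it obvious where a run stopped and what may need rolling back.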
What actually happened
[TODO — pull the real notes during content drafting]
Lessons
- [TODO]
- [TODO]
- [TODO]
What’s next
Watch the platform under steady-state load for a quarter, then revisit Hetzner cost optimisation (block storage tiering, network bandwidth pooling).
FAQ
Why move from DigitalOcean to Hetzner?
Cost predictability and dedicated CPU. At our scale, Hetzner came out roughly 70% cheaper than DigitalOcean for equivalent or better hardware.
How do you maintain uptime during a fleet migration?
DNS-level traffic shifting, app servers on both fleets running behind a single load balancer for the cutover window, the database replicated forward from the old MariaDB to the new one, and a one-way Auth0 connection swap timed against the cutover.
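One precondition for this approach is a replica that has actually caught up; the plan asks for lag under one second. A minimal sketch of that gate follows, assuming the status row has already been fetched from MariaDB's `SHOW SLAVE STATUS` and converted to a dict. The `replica_is_caught_up` helper and its `max_lag_seconds` parameter are hypothetical names; the column names are the ones MariaDB reports.

```python
from typing import Mapping, Optional


def replica_is_caught_up(status: Mapping[str, Optional[object]],
                         max_lag_seconds: int = 1) -> bool:
    """Return True only if both replication threads run and lag is below the limit.

    `status` is one row of SHOW SLAVE STATUS output, as a dict.
    """
    # Both threads must be running, otherwise the lag figure means nothing.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False

    lag = status.get("Seconds_Behind_Master")
    # MariaDB reports NULL (None here) when lag cannot be determined;
    # treat that as unsafe rather than as zero.
    if lag is None:
        return False

    # The column has whole-second resolution, so with the default limit
    # "under one second" means a reported value of 0.
    return int(lag) < max_lag_seconds
```

Treating an unknown lag as a failure is a deliberate choice in this sketch: a stopped replica would otherwise look the same as a healthy one.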