Resume a Broken Database Migration in 15 Minutes

Recover from failed migrations safely with conditional recovery scripts, manual tracking fixes, and zero data loss strategies.

A production migration failed halfway through, and now your database is in an inconsistent state. Here's how to safely recover and complete it.

You'll learn:

  • How to identify where a migration stopped
  • Safe rollback strategies without data loss
  • Resuming from the exact failure point

Time: 15 min | Level: Intermediate


Problem: Migration Crashed Mid-Execution

Your migration started, then failed with a connection timeout or constraint error. Now you can't roll forward or back—half your tables are updated, half aren't.

Common symptoms:

  • Migration tool shows status as "running" but isn't
  • New code expects schema changes that don't exist
  • Database logs show partial transaction commits
  • Application throwing constraint or column-missing errors

Why This Happens

Most migration tools don't wrap DDL statements in transactions (especially in MySQL/PostgreSQL for certain operations). When a migration fails midway, previous statements are already committed, leaving you in a partial state.

Root causes:

  • Long-running migrations hitting timeouts
  • Foreign key constraint violations on live data
  • Disk space exhaustion during index creation
  • Network interruptions to database server

Solution

Step 1: Lock the Database

Prevent further damage while diagnosing:

-- PostgreSQL
SELECT pg_advisory_lock(123456);

-- MySQL
LOCK TABLES schema_migrations WRITE;

Expected: Query succeeds with no errors. This prevents other processes from attempting migrations.

If it fails:

  • Error: "Lock timeout": Another migration is stuck. Find and kill it: SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query LIKE '%migration%';

Step 2: Find the Failure Point

Check your migration tool's tracking table:

-- Rails/ActiveRecord
SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 5;

-- Django
SELECT app, name, applied FROM django_migrations ORDER BY applied DESC LIMIT 5;

-- Flyway
SELECT installed_rank, version, description, success 
FROM flyway_schema_history 
ORDER BY installed_rank DESC LIMIT 5;

Look for: The last successfully applied version. The failed migration will either be missing or marked success = false.


Step 3: Inspect What Actually Changed

Compare expected vs actual schema state:

-- Check if column exists (PostgreSQL)
SELECT column_name, data_type 
FROM information_schema.columns 
WHERE table_name = 'users' AND column_name = 'new_field';

-- Check indexes
SELECT indexname FROM pg_indexes WHERE tablename = 'users';

-- Check constraints
SELECT conname, contype FROM pg_constraint 
WHERE conrelid = 'users'::regclass;

Document what you find: "Migration 20260215_add_user_roles created the roles table but failed before adding the foreign key."


Step 4: Create a Recovery Migration

Don't edit the failed migration—create a new one that continues where it stopped:

# db/migrate/20260216_resume_user_roles.rb
class ResumeUserRoles < ActiveRecord::Migration[7.1]
  def up
    # Skip already-applied parts
    unless table_exists?(:roles)
      create_table :roles do |t|
        t.string :name
        t.timestamps
      end
    end
    
    # Complete the failed steps
    unless foreign_key_exists?(:users, :roles)
      add_foreign_key :users, :roles
      add_index :users, :role_id  # This was never run
    end
  end
  
  def down
    remove_foreign_key :users, :roles if foreign_key_exists?(:users, :roles)
    remove_index :users, :role_id if index_exists?(:users, :role_id)
    drop_table :roles if table_exists?(:roles)
  end
end

Why conditional checks matter: This migration can run safely even if previous attempts partially succeeded. It won't fail on "table already exists" errors.


Step 5: Mark the Failed Migration as Done

Manually update the tracking table so the tool skips it:

-- Rails
INSERT INTO schema_migrations (version) VALUES ('20260215123456');

-- Flyway (mark as successful retroactively)
UPDATE flyway_schema_history 
SET success = true 
WHERE version = '20260215' AND success = false;

-- Django
INSERT INTO django_migrations (app, name, applied) 
VALUES ('myapp', '0042_add_user_roles', NOW());

Critical: Only do this after confirming Step 4's recovery migration handles everything.


Step 6: Run the Recovery Migration

# Rails
rails db:migrate

# Django
python manage.py migrate

# Flyway
flyway migrate

Expected: Your recovery migration runs successfully, completing all partial changes.

If it fails:

  • Error: "Duplicate key": The failed migration created some records. Add ON CONFLICT DO NOTHING to inserts.
  • Error: "Column already exists": Add more conditional checks in Step 4.

Step 7: Release the Lock

-- PostgreSQL
SELECT pg_advisory_unlock(123456);

-- MySQL
UNLOCK TABLES;

Verification

Test the complete schema:

# Run your test suite against the database
npm test

# Check application boots without errors
rails console
# or
python manage.py check

You should see: All tests pass, no schema-related errors in logs.


Using AI to Speed This Up

Modern AI coding assistants can analyze migration files and generate recovery scripts:

# Using Claude with your migration file
cat db/migrate/20260215_broken_migration.rb | \
  claude "This migration failed. Generate a recovery migration that safely completes it."

What AI can help with:

  • Parsing complex migration files to identify discrete steps
  • Generating conditional checks for each operation
  • Suggesting rollback strategies based on failure type
  • Creating test queries to verify schema state

What AI can't do:

  • Detect non-obvious data corruption
  • Decide which data to preserve during conflicts
  • Access your actual database to check current state

Best practice: Use AI to generate the recovery template, then verify each conditional against your actual database state from Step 3.


What You Learned

  • Migration tools don't always provide atomic DDL operations
  • Manual tracking table updates are sometimes necessary
  • Idempotent recovery migrations prevent re-failure
  • Always lock during recovery to prevent race conditions

Limitations:

  • This works for schema changes, not data migrations with complex logic
  • Some databases (MySQL with MyISAM) don't support transactional DDL at all
  • Very large tables may need online migration tools like gh-ost instead

When NOT to use this:

  • If you can afford to restore from backup and replay safely
  • If the migration involves tenant data requiring per-record validation
  • If you're unsure what state the database is in—get a DBA

Tested with Rails 7.1, PostgreSQL 16, MySQL 8.0 on production-scale databases