A production migration failed halfway through, and now your database is in an inconsistent state. Here's how to safely recover and complete it.
You'll learn:
- How to identify where a migration stopped
- Safe rollback strategies without data loss
- Resuming from the exact failure point
Time: 15 min | Level: Intermediate
Problem: Migration Crashed Mid-Execution
Your migration started, then failed with a connection timeout or constraint error. Now you can't roll forward or back—half your tables are updated, half aren't.
Common symptoms:
- Migration tool shows status as "running" but isn't
- New code expects schema changes that don't exist
- Database logs show partial transaction commits
- Application throwing constraint or column-missing errors
Why This Happens
Most migration tools don't wrap DDL statements in transactions (especially in MySQL/PostgreSQL for certain operations). When a migration fails midway, previous statements are already committed, leaving you in a partial state.
Root causes:
- Long-running migrations hitting timeouts
- Foreign key constraint violations on live data
- Disk space exhaustion during index creation
- Network interruptions to database server
Solution
Step 1: Lock the Database
Prevent further damage while diagnosing:
-- PostgreSQL
SELECT pg_advisory_lock(123456);
-- MySQL
LOCK TABLES schema_migrations WRITE;
Expected: Query succeeds with no errors. This prevents other processes from attempting migrations.
If it fails:
- Error: "Lock timeout": Another migration is stuck. Find and kill it:
SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE query LIKE '%migration%';
Step 2: Find the Failure Point
Check your migration tool's tracking table:
-- Rails/ActiveRecord
SELECT * FROM schema_migrations ORDER BY version DESC LIMIT 5;
-- Django
SELECT app, name, applied FROM django_migrations ORDER BY applied DESC LIMIT 5;
-- Flyway
SELECT installed_rank, version, description, success
FROM flyway_schema_history
ORDER BY installed_rank DESC LIMIT 5;
Look for: The last successfully applied version. The failed migration will either be missing or marked success = false.
Step 3: Inspect What Actually Changed
Compare expected vs actual schema state:
-- Check if column exists (PostgreSQL)
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'users' AND column_name = 'new_field';
-- Check indexes
SELECT indexname FROM pg_indexes WHERE tablename = 'users';
-- Check constraints
SELECT conname, contype FROM pg_constraint
WHERE conrelid = 'users'::regclass;
Document what you find: "Migration 20260215_add_user_roles created the roles table but failed before adding the foreign key."
Step 4: Create a Recovery Migration
Don't edit the failed migration—create a new one that continues where it stopped:
# db/migrate/20260216_resume_user_roles.rb
class ResumeUserRoles < ActiveRecord::Migration[7.1]
def up
# Skip already-applied parts
unless table_exists?(:roles)
create_table :roles do |t|
t.string :name
t.timestamps
end
end
# Complete the failed steps
unless foreign_key_exists?(:users, :roles)
add_foreign_key :users, :roles
add_index :users, :role_id # This was never run
end
end
def down
remove_foreign_key :users, :roles if foreign_key_exists?(:users, :roles)
remove_index :users, :role_id if index_exists?(:users, :role_id)
drop_table :roles if table_exists?(:roles)
end
end
Why conditional checks matter: This migration can run safely even if previous attempts partially succeeded. It won't fail on "table already exists" errors.
Step 5: Mark the Failed Migration as Done
Manually update the tracking table so the tool skips it:
-- Rails
INSERT INTO schema_migrations (version) VALUES ('20260215123456');
-- Flyway (mark as successful retroactively)
UPDATE flyway_schema_history
SET success = true
WHERE version = '20260215' AND success = false;
-- Django
INSERT INTO django_migrations (app, name, applied)
VALUES ('myapp', '0042_add_user_roles', NOW());
Critical: Only do this after confirming Step 4's recovery migration handles everything.
Step 6: Run the Recovery Migration
# Rails
rails db:migrate
# Django
python manage.py migrate
# Flyway
flyway migrate
Expected: Your recovery migration runs successfully, completing all partial changes.
If it fails:
- Error: "Duplicate key": The failed migration created some records. Add
ON CONFLICT DO NOTHINGto inserts. - Error: "Column already exists": Add more conditional checks in Step 4.
Step 7: Release the Lock
-- PostgreSQL
SELECT pg_advisory_unlock(123456);
-- MySQL
UNLOCK TABLES;
Verification
Test the complete schema:
# Run your test suite against the database
npm test
# Check application boots without errors
rails console
# or
python manage.py check
You should see: All tests pass, no schema-related errors in logs.
Using AI to Speed This Up
Modern AI coding assistants can analyze migration files and generate recovery scripts:
# Using Claude with your migration file
cat db/migrate/20260215_broken_migration.rb | \
claude "This migration failed. Generate a recovery migration that safely completes it."
What AI can help with:
- Parsing complex migration files to identify discrete steps
- Generating conditional checks for each operation
- Suggesting rollback strategies based on failure type
- Creating test queries to verify schema state
What AI can't do:
- Detect non-obvious data corruption
- Decide which data to preserve during conflicts
- Access your actual database to check current state
Best practice: Use AI to generate the recovery template, then verify each conditional against your actual database state from Step 3.
What You Learned
- Migration tools don't always provide atomic DDL operations
- Manual tracking table updates are sometimes necessary
- Idempotent recovery migrations prevent re-failure
- Always lock during recovery to prevent race conditions
Limitations:
- This works for schema changes, not data migrations with complex logic
- Some databases (MySQL with MyISAM) don't support transactional DDL at all
- Very large tables may need online migration tools like
gh-ostinstead
When NOT to use this:
- If you can afford to restore from backup and replay safely
- If the migration involves tenant data requiring per-record validation
- If you're unsure what state the database is in—get a DBA
Tested with Rails 7.1, PostgreSQL 16, MySQL 8.0 on production-scale databases