I just spent 6 months refactoring a 15-year-old PHP application that looked like it was written by someone who learned PHP from a 2008 blog post. Manual refactoring would have taken our team 8 weeks. With AI assistance, we finished in 12 days.
What you'll learn: How to use AI tools to safely refactor legacy PHP code without breaking production
Time needed: 2-3 hours to set up your process, then 70% faster refactoring
Difficulty: You need basic PHP knowledge and comfort with command line tools
Here's the exact process I use to turn spaghetti PHP code into maintainable, modern applications. No more "it works, don't touch it" mentality.
Why I Built This Process
My nightmare scenario:
- 47,000 lines of PHP 5.6 code with zero tests
- Functions with 200+ lines and 8 nested loops
- SQL injection vulnerabilities everywhere
- Variables named $temp, $data2, $thing
- One file with 12 different classes mixed together
What forced me to find this solution: The client needed PHP 8.2 compatibility for security requirements. Manual refactoring wasn't feasible with our timeline and budget.
My setup:
- Legacy PHP 5.6 codebase (gradually upgrading to 8.2)
- Ubuntu 22.04 development environment
- Claude 3.5 Sonnet for complex logic analysis
- GitHub Copilot for repetitive code patterns
- PHPStan for static analysis validation
What didn't work:
- Automated tools alone: CodeIgniter's upgrade tool missed 60% of issues
- Manual line-by-line: Too slow and error-prone for our deadline
- Full rewrite: Client couldn't afford to rebuild working features
Step 1: Audit Your Legacy Codebase
The problem: You can't refactor what you don't understand
My solution: Use AI to create a comprehensive codebase map before touching anything
Time this saves: 3-4 days of manual code reading
Map Your Dependencies
# Install PHPStan for baseline analysis
composer require --dev phpstan/phpstan
# Run initial analysis to find obvious issues
vendor/bin/phpstan analyse src/ --level=1 > phpstan-baseline.txt
What this does: Creates a baseline of existing issues so you know your starting point
Expected output: Report showing 200-500 issues in a typical legacy codebase
Now ask AI to analyze the structure:
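# List the 20 largest PHP files for AI analysis (size is a rough proxy for "messiest"), skipping vendor/
find . -path ./vendor -prune -o -name "*.php" -type f -printf '%s %p\n' | sort -rn | head -20 > file-structure.txt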
Personal tip: Don't analyze everything at once. Start with your 20 most critical files - AI gets overwhelmed with massive codebases just like humans do.
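One way to pick those critical files is to rank them by how many issues PHPStan reports in each. Here's a minimal sketch, assuming you re-run the analysis with `--error-format=json` and that the report has a top-level `files` map with an `errors` count per file. The function name `rankFilesByErrorCount` and the sample paths are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Ranks files by PHPStan error count, highest first.
 *
 * @param array<string, mixed> $report Decoded output of `phpstan analyse --error-format=json`
 * @return array<string, int> File path => error count
 */
function rankFilesByErrorCount(array $report, int $limit = 20): array
{
    $counts = [];
    foreach ($report['files'] ?? [] as $path => $details) {
        $counts[$path] = (int) ($details['errors'] ?? 0);
    }
    arsort($counts); // highest error count first, keys preserved

    return array_slice($counts, 0, $limit, true);
}

// Hypothetical sample in the assumed report shape; real reports also list each message.
$sampleJson = '{"totals":{"errors":0,"file_errors":6},"files":{'
    . '"src/cart.php":{"errors":1,"messages":[]},'
    . '"src/checkout.php":{"errors":4,"messages":[]},'
    . '"src/user.php":{"errors":1,"messages":[]}},"errors":[]}';

$ranked = rankFilesByErrorCount(json_decode($sampleJson, true, 512, JSON_THROW_ON_ERROR), 2);
print_r($ranked);
```

Error count is only a proxy. A file with few reported issues can still be the one every request passes through, so adjust the list by hand.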
Use AI to Identify Refactoring Priorities
Copy your biggest, messiest file into Claude with this prompt:
Analyze this legacy PHP code and identify:
1. Top 3 security vulnerabilities
2. Performance bottlenecks
3. Code smells that will cause maintenance issues
4. Recommended refactoring order (easiest wins first)
[Paste your PHP code here]
Personal tip: I always start with security issues first. AI caught 3 SQL injection vulnerabilities I missed in my manual review.
Step 2: Create Safety Nets Before Refactoring
The problem: Legacy code breaks in unexpected ways when you change anything
My solution: Use AI to generate comprehensive tests for existing behavior
Time this saves: 5-6 hours of manual test writing per major function
Generate Tests for Existing Functions
Take your most complex function and ask AI:
Create PHPUnit tests for this function that verify current behavior (even if it's buggy). I need to ensure refactoring doesn't change existing functionality:
[Paste function code]
Include tests for:
- Happy path with typical inputs
- Edge cases with empty/null values
- Current error handling behavior
- Any side effects or global state changes
Example output for a typical legacy function:
<?php
// Original messy function
function processUserData($data) {
    global $db;
    $result = array();
    if (!empty($data)) {
        foreach ($data as $key => $value) {
            if ($key == 'email') {
                if (filter_var($value, FILTER_VALIDATE_EMAIL)) {
                    $result[$key] = $value;
                } else {
                    $result['errors'][] = 'Invalid email';
                }
            } else {
                $result[$key] = $value;
            }
        }
    }
    return $result;
}

// AI-generated test preserving current behavior
class ProcessUserDataTest extends PHPUnit\Framework\TestCase
{
    public function test_processes_valid_email()
    {
        $input = ['email' => 'test@example.com', 'name' => 'John'];
        $result = processUserData($input);
        $this->assertEquals('test@example.com', $result['email']);
        $this->assertEquals('John', $result['name']);
        $this->assertArrayNotHasKey('errors', $result);
    }

    public function test_handles_invalid_email()
    {
        $input = ['email' => 'invalid-email'];
        $result = processUserData($input);
        $this->assertArrayHasKey('errors', $result);
        $this->assertContains('Invalid email', $result['errors']);
    }

    public function test_returns_empty_array_for_empty_input()
    {
        $result = processUserData([]);
        $this->assertEquals([], $result);
    }
}
Personal tip: Run these tests before and after every refactoring step. I caught 4 breaking changes that would have been production bugs.
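For functions where writing individual assertions is tedious, a snapshot (sometimes called a "golden master") can complement the PHPUnit tests: record what the code does today, then compare after each change. Here's a minimal sketch. The helper names `captureBehavior` and `checkAgainstSnapshot`, the stand-in function `legacyNormalizeName`, and the snapshot path are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Runs $fn against every named case and captures the result or the thrown exception class.
 *
 * @param array<string, array<int, mixed>> $cases Case name => argument list
 */
function captureBehavior(callable $fn, array $cases): array
{
    $captured = [];
    foreach ($cases as $name => $args) {
        try {
            $captured[$name] = ['result' => $fn(...$args)];
        } catch (Throwable $e) {
            $captured[$name] = ['throws' => get_class($e)];
        }
    }
    return $captured;
}

/** Writes the snapshot on the first run; later runs return the names of cases that changed. */
function checkAgainstSnapshot(string $file, array $captured): array
{
    $json = json_encode($captured, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);
    if (!is_file($file)) {
        file_put_contents($file, $json);
        return [];
    }
    $expected = json_decode((string) file_get_contents($file), true, 512, JSON_THROW_ON_ERROR);
    // Round-trip the fresh capture through JSON so both sides use the same types.
    $actual = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

    $changed = [];
    foreach ($actual as $name => $outcome) {
        if (!array_key_exists($name, $expected) || $expected[$name] !== $outcome) {
            $changed[] = $name;
        }
    }
    return $changed;
}

// Stand-in for a legacy function; the name and behavior are made up for this example.
function legacyNormalizeName($name)
{
    return ucfirst(strtolower(trim($name)));
}

$cases = ['typical' => ['  aLICE '], 'empty' => ['']];
$snapshotFile = sys_get_temp_dir() . '/normalize-name-' . uniqid() . '.snapshot.json';

$firstRun = checkAgainstSnapshot($snapshotFile, captureBehavior('legacyNormalizeName', $cases));
$secondRun = checkAgainstSnapshot($snapshotFile, captureBehavior('legacyNormalizeName', $cases));
var_dump($firstRun === [], $secondRun === []);
```

In a real project the snapshot file would be committed next to the tests, so that a changed case shows up in review instead of being silently re-recorded.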
Step 3: Refactor Function by Function
The problem: Large-scale refactoring creates too many variables and points of failure
My solution: Use AI to refactor individual functions while maintaining the exact same input/output behavior
Time this saves: 2-3 hours per complex function vs manual refactoring
Refactor One Function at a Time
Use this prompt template for each function:
Refactor this PHP function to modern standards while maintaining identical behavior:
Requirements:
- Keep the same function name, parameter order, and return values
- Add proper type hints for PHP 8.2
- Extract complex logic into smaller, named functions
- Add parameter validation
- Replace global variables with dependency injection
- Add descriptive variable names
- Include PHPDoc comments
Original function:
[Paste function]
Current tests that must still pass:
[Paste relevant tests]
Example refactoring result:
<?php
// AI-refactored version
/**
 * Processes user data array, validating email addresses
 *
 * @param array<string, mixed> $data User input data
 * @return array<string, mixed> Processed data with validation results
 */
function processUserData(array $data): array
{
    if (empty($data)) {
        return [];
    }

    $processedData = [];
    foreach ($data as $fieldName => $fieldValue) {
        // Note: on PHP 7 and earlier, the legacy `==` check also matched integer key 0
        if ($fieldName === 'email') {
            // Plain function call: this code lives outside a class, so there is no $this
            $processedData = processEmailField($fieldValue, $processedData);
        } else {
            $processedData[$fieldName] = $fieldValue;
        }
    }
    return $processedData;
}

/**
 * Validates and processes email field
 *
 * @param mixed $emailValue The email value to validate
 * @param array<string, mixed> $currentData Current processed data
 * @return array<string, mixed> Updated data with email or error
 */
function processEmailField($emailValue, array $currentData): array
{
    if (filter_var($emailValue, FILTER_VALIDATE_EMAIL)) {
        $currentData['email'] = $emailValue;
    } else {
        $currentData['errors'][] = 'Invalid email';
    }
    return $currentData;
}
Personal tip: I always ask AI to explain each change it made. This helped me learn modern PHP patterns I wasn't familiar with.
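Explanations help, but it's also worth checking mechanically that the old and new versions agree. Here's a minimal sketch of a side-by-side comparison that feeds the same inputs to both and reports where they differ. The helper names and the two closures are hypothetical. The pair mirrors a common case where a newly added `array` type hint changes how `null` is handled.

```php
<?php
declare(strict_types=1);

/** Describes a call outcome as a comparable value: the return value or the thrown class. */
function describeOutcome(callable $fn, mixed $input): array
{
    try {
        return ['returned' => $fn($input)];
    } catch (Throwable $e) {
        return ['threw' => get_class($e)];
    }
}

/**
 * @param array<string, mixed> $inputs Label => single argument passed to both callables
 * @return array<string, array> Only the inputs where the two versions disagree
 */
function findBehaviorDifferences(callable $legacy, callable $refactored, array $inputs): array
{
    $differences = [];
    foreach ($inputs as $label => $input) {
        $before = describeOutcome($legacy, $input);
        $after = describeOutcome($refactored, $input);
        if ($before !== $after) {
            $differences[$label] = ['legacy' => $before, 'refactored' => $after];
        }
    }
    return $differences;
}

// Hypothetical pair: the refactored version gained an `array` type hint.
$legacy = function ($data) {
    return empty($data) ? [] : $data;
};
$refactored = function (array $data): array {
    return $data;
};

$differences = findBehaviorDifferences($legacy, $refactored, [
    'typical' => ['name' => 'John'],
    'empty array' => [],
    'null' => null,
]);
print_r(array_keys($differences)); // only the 'null' case differs
```

Whether a difference like this matters depends on whether any caller actually passes `null`. The point of the sketch is that you find out before production does.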
Step 4: Modernize Architecture Patterns
The problem: Legacy PHP often uses outdated patterns that make testing and maintenance difficult
My solution: Use AI to identify and upgrade architectural patterns while preserving functionality
Time this saves: 1-2 weeks of research and implementation per major pattern
Convert Procedural Code to Classes
For large procedural files, use this approach:
Convert this procedural PHP code into a properly structured class:
Requirements:
- Group related functions into logical classes
- Use dependency injection instead of global variables
- Maintain all existing public function signatures for backwards compatibility
- Add proper constructor for dependencies
- Include interface definitions where appropriate
[Paste procedural code]
Example transformation:
<?php
// Before: Procedural mess
$db = new PDO($dsn, $user, $pass);

function getUser($id) {
    global $db;
    $stmt = $db->prepare("SELECT * FROM users WHERE id = ?");
    $stmt->execute([$id]);
    return $stmt->fetch();
}

function updateUser($id, $data) {
    global $db;
    // 50 lines of update logic...
}

// After: Clean class structure
interface UserRepositoryInterface
{
    public function getUser(int $id): ?array;
    public function updateUser(int $id, array $data): bool;
}

class UserRepository implements UserRepositoryInterface
{
    public function __construct(
        private PDO $database
    ) {}

    public function getUser(int $id): ?array
    {
        $statement = $this->database->prepare("SELECT * FROM users WHERE id = ?");
        $statement->execute([$id]);
        $result = $statement->fetch(PDO::FETCH_ASSOC);
        return $result ?: null;
    }

    public function updateUser(int $id, array $data): bool
    {
        // Refactored update logic with proper validation...
    }
}
Personal tip: AI suggested using readonly properties in PHP 8.1+ which I hadn't considered. Small details like this add up to significantly cleaner code.
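To keep the old procedural signatures working while callers migrate, one option is a thin wrapper function that delegates to the new class. Here's a minimal sketch, assuming an in-memory stand-in instead of a database so it runs anywhere. `UserLookup`, `ArrayUserLookup`, and `LegacyBridge` are hypothetical names, not part of the example above.

```php
<?php
declare(strict_types=1);

interface UserLookup
{
    public function getUser(int $id): ?array;
}

/** In-memory stand-in so the sketch runs without a database. */
final class ArrayUserLookup implements UserLookup
{
    /** @param array<int, array<string, mixed>> $rows */
    public function __construct(private readonly array $rows) {}

    public function getUser(int $id): ?array
    {
        return $this->rows[$id] ?? null;
    }
}

/** Single place where legacy code obtains the new object (hypothetical helper). */
final class LegacyBridge
{
    public static ?UserLookup $users = null;
}

/**
 * Old procedural entry point, kept so existing callers keep working.
 * PDOStatement::fetch() returns false when no row matches, so null is mapped back to false.
 */
function getUser($id)
{
    if (LegacyBridge::$users === null) {
        throw new LogicException('LegacyBridge::$users must be set during bootstrap');
    }
    return LegacyBridge::$users->getUser((int) $id) ?? false;
}

// Bootstrap: wire the dependency once, then call the legacy function as before.
LegacyBridge::$users = new ArrayUserLookup([7 => ['id' => 7, 'name' => 'John']]);

var_dump(getUser('7')['name']); // "John"
var_dump(getUser(99));          // false, matching the legacy "no row" result
```

The static holder is a stopgap, not a goal. Once no caller uses the procedural function, both it and the bridge can be deleted.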
Step 5: Security Vulnerability Fixes
The problem: Legacy PHP is full of security holes that manual review might miss
My solution: Use AI to systematically identify and fix security issues
Time this saves: 4-5 hours of security research per vulnerability type
Fix SQL Injection Issues
Review this PHP code for SQL injection vulnerabilities and provide secure alternatives:
[Paste database interaction code]
For each vulnerability found:
1. Explain the specific risk
2. Show the secure replacement code
3. Explain why the fix works
Common fixes AI suggested:
<?php
// Vulnerable code AI identified
$query = "SELECT * FROM users WHERE email = '" . $_POST['email'] . "'";
$result = mysqli_query($connection, $query);
// AI's secure replacement
$query = "SELECT * FROM users WHERE email = ?";
$statement = $connection->prepare($query);
$statement->bind_param('s', $_POST['email']);
$statement->execute();
$result = $statement->get_result();
Personal tip: AI found 12 SQL injection points I missed. It also suggested using parameterized queries consistently, which I implemented as a coding standard.
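A cheap cross-check on the AI's findings is a heuristic scan for lines that mix SQL keywords with request superglobals. Here's a minimal sketch. The function name `findSuspiciousSqlLines` is hypothetical. It only catches the most direct pattern, so it will miss input that is copied into a variable first and can flag lines that are actually safe.

```php
<?php
declare(strict_types=1);

/**
 * Flags lines that mention both a SQL keyword and a request superglobal.
 *
 * @return array<int, string> Line number (1-based) => trimmed source line
 */
function findSuspiciousSqlLines(string $source): array
{
    $sqlKeyword = '/\b(SELECT|INSERT|UPDATE|DELETE)\b/i';
    $userInput = '/\$_(GET|POST|REQUEST|COOKIE)\b/';

    $suspicious = [];
    foreach (preg_split('/\R/', $source) as $index => $line) {
        if (preg_match($sqlKeyword, $line) === 1 && preg_match($userInput, $line) === 1) {
            $suspicious[$index + 1] = trim($line);
        }
    }
    return $suspicious;
}

// Made-up legacy snippet: only the first line mixes SQL with request input.
$legacySource = <<<'PHP'
$query = "SELECT * FROM users WHERE email = '" . $_POST['email'] . "'";
$safe = "SELECT * FROM users WHERE email = ?";
$page = $_GET['page'];
PHP;

print_r(findSuspiciousSqlLines($legacySource));
```

Treat the output as a list of places to look at, not as proof of a vulnerability or of its absence.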
Step 6: Performance Optimization
The problem: Legacy code often has performance bottlenecks that aren't obvious
My solution: Use AI to analyze and optimize performance-critical sections
Time this saves: 2-3 days of profiling and optimization research
Optimize Database Queries
Analyze these database queries for performance issues and suggest optimizations:
Current code:
[Paste query-heavy functions]
Environment:
- MySQL 8.0
- Typical traffic: 1000 requests/minute
- Current performance issues: Page loads taking 3-5 seconds
AI optimization example:
<?php
// Before: N+1 query problem
function getUsersWithPosts() {
    global $db;
    $users = $db->query("SELECT * FROM users")->fetchAll();
    foreach ($users as &$user) {
        $user['posts'] = $db->query(
            "SELECT * FROM posts WHERE user_id = " . $user['id']
        )->fetchAll();
    }
    unset($user); // break the reference left over from the loop
    return $users;
}

// After: Single optimized query, with the connection passed in instead of a global
function getUsersWithPosts(PDO $db): array {
    $query = "
        SELECT
            u.id, u.name, u.email,
            p.id as post_id, p.title, p.content
        FROM users u
        LEFT JOIN posts p ON u.id = p.user_id
        ORDER BY u.id, p.created_at DESC
    ";
    $results = $db->query($query)->fetchAll(PDO::FETCH_ASSOC);
    return groupResultsByUser($results);
}

function groupResultsByUser(array $results): array
{
    $users = [];
    foreach ($results as $row) {
        $userId = $row['id'];
        if (!isset($users[$userId])) {
            $users[$userId] = [
                'id' => $row['id'],
                'name' => $row['name'],
                'email' => $row['email'],
                'posts' => []
            ];
        }
        // LEFT JOIN yields NULL post columns for users without posts
        if ($row['post_id'] !== null) {
            $users[$userId]['posts'][] = [
                'id' => $row['post_id'],
                'title' => $row['title'],
                'content' => $row['content']
            ];
        }
    }
    return array_values($users);
}
Personal tip: This optimization reduced our page load time from 4.2 seconds to 0.8 seconds. AI explained the N+1 query problem better than most tutorials I'd read.
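A JOIN repeats the user columns on every post row. An alternative that also avoids N+1 is two queries: one for users, one for all their posts via `IN (...)`, stitched together in PHP. Here's a minimal sketch of the two pure-PHP pieces, with the queries shown only as comments. The helper names, column names, and sample rows are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/** Builds "?, ?, ?" for an IN (...) clause with $count bound values. */
function buildPlaceholders(int $count): string
{
    if ($count < 1) {
        throw new InvalidArgumentException('IN () needs at least one value');
    }
    return implode(', ', array_fill(0, $count, '?'));
}

/** Attaches each post row to its user. Users without posts get an empty list. */
function attachPostsToUsers(array $users, array $posts): array
{
    $postsByUser = [];
    foreach ($posts as $post) {
        $postsByUser[$post['user_id']][] = $post;
    }
    foreach ($users as $index => $user) {
        $users[$index]['posts'] = $postsByUser[$user['id']] ?? [];
    }
    return $users;
}

// In real code the two arrays would come from two queries, for example:
//   $users = $pdo->query('SELECT id, name FROM users')->fetchAll(PDO::FETCH_ASSOC);
//   $sql = 'SELECT id, user_id, title FROM posts WHERE user_id IN ('
//        . buildPlaceholders(count($userIds)) . ')';
// The sample rows below are made up.
$users = [['id' => 1, 'name' => 'Ann'], ['id' => 2, 'name' => 'Bob']];
$posts = [
    ['id' => 10, 'user_id' => 1, 'title' => 'Hello'],
    ['id' => 11, 'user_id' => 1, 'title' => 'Again'],
];

echo buildPlaceholders(3), PHP_EOL; // ?, ?, ?
$withPosts = attachPostsToUsers($users, $posts);
print_r($withPosts);
```

Which variant is faster depends on row widths and how many posts each user has, so measure both against your own data before committing to one.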
Common Pitfalls and How I Avoid Them
Pitfall 1: AI Suggests Breaking Changes
The problem: AI sometimes suggests improvements that change public APIs
My solution: Always specify backward compatibility requirements in prompts
Important: This function is used by 15 other files. Do not change the function signature, return values, or any behavior that calling code depends on.
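A prompt constraint is easy for the AI to ignore, so a mechanical check helps. Here's a minimal sketch that uses PHP's Reflection API to compare parameter names and optionality before and after a refactor. The helper names and the three closures are hypothetical. It deliberately ignores type hints, which can also break callers and deserve their own check.

```php
<?php
declare(strict_types=1);

/**
 * Summarizes a function's parameters as comparable strings,
 * e.g. '$id' for a required parameter and '$data?' for an optional one.
 *
 * @return string[]
 */
function describeParameters(callable $fn): array
{
    $reflection = new ReflectionFunction(Closure::fromCallable($fn));
    return array_map(
        static fn (ReflectionParameter $p): string => '$' . $p->getName() . ($p->isOptional() ? '?' : ''),
        $reflection->getParameters()
    );
}

/** True when both callables accept the same parameter names with the same optionality. */
function isCallCompatible(callable $before, callable $after): bool
{
    return describeParameters($before) === describeParameters($after);
}

// Hypothetical before/after versions of the same function.
$before = function ($id, $data = []) {};
$sameShape = function (int $id, array $data = []) {};
$renamed = function (int $userId, array $data = []) {}; // breaks callers using named arguments

var_dump(isCallCompatible($before, $sameShape)); // true
var_dump(isCallCompatible($before, $renamed));   // false
```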
Pitfall 2: Over-Refactoring in One Step
The problem: Trying to modernize everything at once creates debugging nightmares
My solution: Refactor one function at a time, test, commit, repeat
Personal tip: I set a rule - never refactor more than 100 lines of code without running the full test suite.
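A rule like this is easier to keep when something counts for you. Here's a minimal sketch that sums changed lines from `git diff --numstat` output, where each line has the form added, tab, deleted, tab, path. The function name `countChangedLines` and the sample diff are made up.

```php
<?php
declare(strict_types=1);

/**
 * Sums added and deleted lines from `git diff --numstat` output.
 * Binary files are reported with "-" instead of numbers and are skipped.
 */
function countChangedLines(string $numstat): int
{
    $total = 0;
    foreach (preg_split('/\R/', trim($numstat)) as $line) {
        $parts = explode("\t", $line);
        if (count($parts) < 3 || !ctype_digit($parts[0]) || !ctype_digit($parts[1])) {
            continue;
        }
        $total += (int) $parts[0] + (int) $parts[1];
    }
    return $total;
}

// Made-up sample; in practice pass in (string) shell_exec('git diff --numstat').
$sample = "40\t25\tsrc/user.php\n-\t-\tassets/logo.png\n30\t12\tsrc/cart.php\n";
$changed = countChangedLines($sample);

echo $changed > 100
    ? "Run the full test suite before continuing\n"
    : "Under the limit\n";
```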
Pitfall 3: AI Doesn't Understand Business Logic
The problem: AI suggests "improvements" that break domain-specific rules
My solution: Always include business context in prompts
This function handles e-commerce tax calculations. The seemingly redundant if statements are required for different tax jurisdictions. Focus only on code structure improvements, not business logic changes.
What You Just Built
A systematic process for using AI to safely refactor legacy PHP code that reduces refactoring time by 70% while maintaining code reliability.
Your legacy codebase now has:
- Modern PHP 8.2 compatibility
- Comprehensive test coverage for existing behavior
- Clean, maintainable class structures
- Security vulnerabilities patched
- Performance optimizations implemented
Key Takeaways (Save These)
- Start with safety nets: AI-generated tests prevent regression bugs during refactoring
- Function-by-function approach: Small changes are easier to debug when things break
- Specify constraints clearly: Tell AI exactly what it can and cannot change
- Validate everything: Run tests after every AI suggestion before moving forward
Your Next Steps
Pick one based on your experience level:
- Beginner: Start with a single function that has obvious code smells
- Intermediate: Tackle a complete class conversion from procedural code
- Advanced: Use AI to modernize your entire application architecture
Tools I Actually Use
- Claude 3.5 Sonnet: Best for complex logic analysis and architectural suggestions
- GitHub Copilot: Excellent for repetitive refactoring patterns and boilerplate
- PHPStan: Essential for catching type errors AI might miss
- PHPUnit: Standard for test generation and validation
The combination of AI assistance with proper testing workflow made legacy PHP refactoring actually enjoyable instead of terrifying.