Isolate Vocals and Stems from Any Audio File Using AI

Use free and open-source AI tools like Demucs and Spleeter to extract clean vocals, drums, bass, and instruments from any audio file.

Problem: You Need Stems but Don't Have the Originals

You want the isolated vocals, drums, or bass from a track — but you only have the mixed audio file. Maybe you're remixing, sampling, practicing an instrument, or transcribing lyrics. Getting clean stems without the original multitrack session seems impossible.

AI-powered source separation makes it doable in minutes on your own machine.

You'll learn:

  • How modern AI stem separation works (and its limits)
  • How to install and run Demucs — the current state-of-the-art open-source model
  • How to use Spleeter as a faster, lighter alternative
  • How to get the cleanest possible results from either tool

Time: 20 min | Level: Intermediate


Why This Works

Stem separation models are trained on thousands of mixed tracks paired with their original stems. They learn to recognize the frequency patterns, timing signatures, and harmonic profiles of each instrument type, then apply those learned patterns to audio they have never heard.

The two tools worth knowing:

Demucs (by Meta Research) uses a hybrid transformer architecture trained on large datasets. It produces some of the cleanest separations available from open-source software, especially for vocals and bass. It is slower than Spleeter, but the quality difference usually justifies the wait.

Spleeter (by Deezer) is a faster, older model based on spectrogram masking. Lower quality than Demucs but runs quickly on modest hardware — useful for batch processing or quick drafts.

Common limitations to know upfront:

  • Bleed is unavoidable — no model produces perfect separation
  • Reverb and heavily compressed mixes are harder to separate cleanly
  • Results vary significantly by genre; acoustic music separates better than dense electronic mixes
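
To make the idea of spectrogram masking (and why bleed happens) more concrete, here is a toy sketch in plain Python. It is not how Demucs or Spleeter are implemented; the magnitudes are made-up numbers standing in for what a trained model would predict for one spectrogram frame, and the function names are hypothetical.

```python
# Toy illustration of ratio ("soft") masking on a single spectrogram frame.
# The numbers are invented; a real model predicts the per-source estimates.

def ratio_masks(estimates, eps=1e-9):
    """Turn per-source magnitude estimates into masks that sum to ~1 per bin."""
    n_bins = len(next(iter(estimates.values())))
    totals = [sum(est[i] for est in estimates.values()) + eps for i in range(n_bins)]
    return {
        name: [est[i] / totals[i] for i in range(n_bins)]
        for name, est in estimates.items()
    }

def apply_masks(mixture, masks):
    """Split each mixture bin between the sources according to the masks."""
    return {
        name: [m * x for m, x in zip(mask, mixture)]
        for name, mask in masks.items()
    }

# One frame with four frequency bins (hypothetical magnitudes).
mixture = [1.0, 0.8, 0.5, 0.2]
estimates = {
    "vocals": [0.2, 0.6, 0.3, 0.0],
    "drums": [0.6, 0.2, 0.1, 0.1],
}

masks = ratio_masks(estimates)
stems = apply_masks(mixture, masks)

for name, values in stems.items():
    print(name, [round(v, 3) for v in values])
```

The first bin is dominated by drums, yet the vocal stem still receives a share of it. That leftover share is what you hear as bleed.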

Step 1: Install Demucs

You need Python 3.8+ and pip. A GPU speeds things up significantly, but CPU works fine for most tracks.

# Install Demucs
pip install demucs

# Verify installation
demucs --help

Expected: You should see the Demucs help output listing model options and flags.

If it fails:

  • CUDA error on install: Install the CPU-only PyTorch build first with pip install torch torchaudio --index-url https://download.pytorch.org/whl/cpu, then run pip install demucs so the remaining dependencies still get installed
  • Permission error: Use pip install --user demucs or create a virtual environment first
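
If you are unsure whether your install will use a GPU, a quick check along these lines can help. This is a minimal sketch that assumes Demucs picks CUDA when PyTorch reports it as available and falls back to the CPU otherwise; detect_device is a hypothetical helper, not part of Demucs.

```python
# Report which device PyTorch would use; works even if torch is missing.

def detect_device():
    """Return "cuda", "cpu", or "torch-missing" without raising ImportError."""
    try:
        import torch
    except ImportError:
        return "torch-missing"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = detect_device()
print(f"PyTorch device for separation: {device}")
```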

Step 2: Run Your First Separation

# Basic separation — outputs 4 stems: vocals, drums, bass, other
demucs path/to/your/song.mp3

# Output goes to: ./separated/htdemucs/song-name/

By default, Demucs uses the htdemucs model (hybrid transformer), which is the best general-purpose option. Output files are named vocals.wav, drums.wav, bass.wav, and other.wav.
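
The output layout described above can be sketched as a small helper, which is handy when a script needs to pick up the stems afterwards. The function name is hypothetical, and it assumes the default ./separated output folder and WAV output.

```python
from pathlib import Path

DEFAULT_SOURCES = ("vocals", "drums", "bass", "other")

def expected_stem_paths(track, model="htdemucs", out_dir="separated",
                        sources=DEFAULT_SOURCES):
    """Map each source name to the WAV path Demucs is expected to write."""
    track_name = Path(track).stem  # "song.mp3" -> "song"
    folder = Path(out_dir) / model / track_name
    return {source: folder / f"{source}.wav" for source in sources}

paths = expected_stem_paths("path/to/your/song.mp3")
for source, path in paths.items():
    print(f"{source:>6}: {path}")
```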

If you want a specific stem only:

# Extract just vocals (all stems are still computed; the rest are summed into no_vocals.wav)
demucs --two-stems=vocals path/to/song.mp3
# Outputs: vocals.wav and no_vocals.wav (the instrumental)

Expected output:

Separated tracks will be stored in /home/user/separated/htdemucs
Separating track song.mp3

A progress bar runs for 1–5 minutes depending on track length and hardware.


Step 3: Choose the Right Model

The default htdemucs handles most cases well. But Demucs ships with several models:

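# Fine-tuned variant: about 4x slower, can be slightly cleaner
demucs -n htdemucs_ft path/to/song.mp3

# 6-stem separation: adds piano and guitar tracks
demucs -n htdemucs_6s path/to/song.mp3

# Earlier Hybrid Demucs model trained with extra data; not faster than htdemucs
demucs -n mdx_extra path/to/song.mp3

When to use each:

htdemucs is your default, and as a single model it is also the quickest of the four. Use htdemucs_ft (fine-tuned) when you need the cleanest possible vocal for a vocal edit or pitch correction workflow; expect roughly four times the processing time. Use htdemucs_6s when you specifically need guitar or piano isolated, keeping in mind that the Demucs README describes the piano stem as not working great yet. Use mdx_extra, the earlier Hybrid Demucs model, as a second opinion when htdemucs struggles with a particular track. It is a bag of several models, so it will not speed up batch jobs; when speed matters more than quality, stay on htdemucs or switch to Spleeter.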
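
If you drive Demucs from a script, the model and stem choices above can be collected in one place. The sketch below only builds the argument list and does not run anything; build_demucs_command is a hypothetical helper, and the model list is limited to the names used in this guide.

```python
import shlex

# Model names taken from this guide; other releases may ship different ones.
KNOWN_MODELS = ("htdemucs", "htdemucs_ft", "htdemucs_6s", "mdx_extra")

def build_demucs_command(track, model="htdemucs", two_stems=None, segment=None):
    """Return the demucs argument list for one track (nothing is executed)."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    cmd = ["demucs", "-n", model]
    if two_stems is not None:
        cmd.append(f"--two-stems={two_stems}")
    if segment is not None:
        cmd += ["--segment", str(segment)]
    cmd.append(str(track))
    return cmd

cmd = build_demucs_command("my song.mp3", model="htdemucs_ft", two_stems="vocals")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids quoting problems with file names that contain spaces.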


Step 4: Process a Whole Folder

# Separate every audio file in a directory
for f in ~/music/*.mp3; do demucs "$f"; done

# Or with find for nested folders
find ~/music -name "*.mp3" -exec demucs {} \;

If it fails:

  • Out of memory on long tracks: Add --segment 7 to process in 7-second chunks: demucs --segment 7 song.mp3. If the GPU still runs out of memory, fall back to the CPU with -d cpu
  • Unsupported format: Convert to WAV first with ffmpeg -i song.flac song.wav
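
For larger libraries, a resumable version of the loops above can save a lot of time after an interruption. This is a sketch under a few assumptions: the suffix list is illustrative, a track counts as done once its vocals.wav exists, and the output folder sits outside the music folder. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}  # assumption: extend as needed

def pending_tracks(music_dir, out_dir="separated", model="htdemucs"):
    """List audio files under music_dir that have no vocals.wav output yet."""
    pending = []
    for path in sorted(Path(music_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        done_marker = Path(out_dir) / model / path.stem / "vocals.wav"
        if not done_marker.exists():
            pending.append(path)
    return pending

def separate_all(music_dir, out_dir="separated", model="htdemucs"):
    """Run demucs once per pending track; stop on the first failure."""
    for track in pending_tracks(music_dir, out_dir, model):
        subprocess.run(
            ["demucs", "-n", model, "-o", str(out_dir), str(track)],
            check=True,
        )

# Example call (requires demucs on PATH):
# separate_all(Path.home() / "music")
```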

Alternative: Spleeter (Fast and Lightweight)

Use Spleeter when you need speed over quality — batch jobs, quick previews, or when running on a CPU-only machine with limited RAM.

# Install
pip install spleeter

# 2-stem separation: vocals + accompaniment
spleeter separate -p spleeter:2stems -o output/ song.mp3

# 4-stem: vocals, drums, bass, other
spleeter separate -p spleeter:4stems -o output/ song.mp3

# 5-stem: adds piano
spleeter separate -p spleeter:5stems -o output/ song.mp3

Spleeter downloads its pretrained models on first run (~100–300MB depending on configuration). Output is WAV by default; add -c mp3 (or another supported codec such as flac or ogg) if you want a different format.
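
Spleeter can also be called from Python instead of the command line. The sketch below assumes the spleeter package is installed and uses its Separator class with default settings; the two helper functions are hypothetical, and the import is deferred so the path helper works even without Spleeter present.

```python
from pathlib import Path

# Stem names produced by each pretrained configuration.
SPLEETER_STEMS = {
    "spleeter:2stems": ("vocals", "accompaniment"),
    "spleeter:4stems": ("vocals", "drums", "bass", "other"),
    "spleeter:5stems": ("vocals", "drums", "bass", "piano", "other"),
}

def expected_spleeter_files(track, config="spleeter:2stems",
                            out_dir="output", codec="wav"):
    """List the files Spleeter is expected to write for one track."""
    folder = Path(out_dir) / Path(track).stem
    return [folder / f"{stem}.{codec}" for stem in SPLEETER_STEMS[config]]

def separate_with_spleeter(track, config="spleeter:2stems", out_dir="output"):
    """Run Spleeter through its Python API (imports lazily; needs spleeter)."""
    from spleeter.separator import Separator
    separator = Separator(config)
    separator.separate_to_file(str(track), str(out_dir))
    return expected_spleeter_files(track, config, out_dir)

for path in expected_spleeter_files("song.mp3", "spleeter:4stems"):
    print(path)
```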


Verification

After running Demucs, check your output directory (Spleeter writes to the folder you passed with -o instead):

ls ./separated/htdemucs/your-song-name/
# bass.wav  drums.wav  other.wav  vocals.wav

Load the stems into your DAW or listen directly:

# Quick playback check (requires ffplay from ffmpeg)
ffplay ./separated/htdemucs/song-name/vocals.wav

What good separation sounds like: Vocals should be clear with minimal drum bleed. Some reverb tail from the original mix will bleed through — that's normal. If you hear significant drum hits in the vocal stem on a modern pop track, try htdemucs_ft for better results.
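
Listening is the real test, but a scripted sanity check catches missing or truncated files before you open a DAW. This sketch uses Python's standard wave module, so it assumes the stems are plain PCM WAV files (the 16-bit default, not float32 output); the function names and the 0.05-second tolerance are illustrative choices.

```python
import wave
from pathlib import Path

def stem_durations(stem_dir, sources=("vocals", "drums", "bass", "other")):
    """Return {source: seconds} for each stem; raise if a file is missing."""
    durations = {}
    for source in sources:
        path = Path(stem_dir) / f"{source}.wav"
        if not path.is_file():
            raise FileNotFoundError(f"missing stem: {path}")
        with wave.open(str(path), "rb") as wav:
            durations[source] = wav.getnframes() / wav.getframerate()
    return durations

def check_stems(stem_dir, tolerance=0.05):
    """True when every stem exists, is non-empty, and lengths agree."""
    durations = stem_durations(stem_dir)
    longest, shortest = max(durations.values()), min(durations.values())
    return shortest > 0 and (longest - shortest) <= tolerance

# Example call:
# check_stems("separated/htdemucs/song-name")
```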


What You Learned

  • Demucs (htdemucs) is the best open-source option for quality separations; Spleeter is faster but lower quality
  • The --two-stems=vocals flag is the most practical option for most use cases — you get vocals and an instrumental in one pass
  • Dense, heavily processed mixes (EDM, metal) produce more bleed than acoustic or lightly mixed recordings
  • Use --segment 7 on Demucs if you run into memory issues on long tracks or limited hardware

When NOT to use this approach: If you need stems for commercial release or sample-accurate stem mixing, AI separation introduces artifacts that will be audible. For anything beyond personal use, remixing practice, or transcription, contact the original rights holder for the actual multitrack session.


Tested on Demucs 4.0.1, Spleeter 2.3.2, Python 3.11, macOS Sequoia & Ubuntu 24.04