Troubleshoot SynthID Watermark Detection in Audio/Video Streams

Fix SynthID watermark detection failures in audio/video pipelines—covering signal loss, false negatives, and codec interference in 20 minutes.

Problem: SynthID Watermark Detection Returns False Negatives

You've embedded SynthID watermarks into generated audio or video, but detection keeps returning WATERMARK_NOT_DETECTED or unreliable scores — even on files you just watermarked yourself.

You'll learn:

  • Why detection fails after encoding/transcoding pipelines
  • How to configure detector thresholds correctly for audio vs. video
  • How to isolate whether the watermark survived the codec

Time: 20 min | Level: Intermediate


Why This Happens

SynthID embeds imperceptible signals into the spectrogram (audio) or pixel latents (video). These signals are robust by design — but several pipeline stages destroy them silently:

  • Lossy compression at low bitrates strips the frequency bands where the watermark lives
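  • Sample rate conversion (e.g., 48kHz → 16kHz) discards everything above the new Nyquist frequency (8kHz for a 16kHz file), along with any watermark energy in that range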
  • Re-encoding through intermediate formats (WAV → MP3 → WAV) degrades watermark SNR below detection threshold
  • Video frame drops or re-timestamping misaligns the temporal watermark pattern
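
The sample rate point can be checked with simple arithmetic: a signal at sample rate R cannot carry content above R/2. The sketch below computes how much of a frequency band remains representable after resampling. The 6kHz to 12kHz band is a made-up example; this guide does not document which bands SynthID actually uses.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal at this sample rate can represent."""
    return sample_rate_hz / 2.0


def surviving_fraction(band_lo_hz: float, band_hi_hz: float, target_rate_hz: float) -> float:
    """Fraction of [band_lo_hz, band_hi_hz] that lies below the target rate's
    Nyquist frequency (0.0 = band lost entirely, 1.0 = band intact)."""
    if band_hi_hz <= band_lo_hz:
        raise ValueError("band_hi_hz must be greater than band_lo_hz")
    limit = nyquist_hz(target_rate_hz)
    kept = max(0.0, min(band_hi_hz, limit) - band_lo_hz)
    return kept / (band_hi_hz - band_lo_hz)


# Hypothetical band, resampled to 16 kHz: only 6-8 kHz of the 6-12 kHz band remains.
print(nyquist_hz(16_000))                          # 8000.0
print(surviving_fraction(6_000, 12_000, 16_000))   # about 0.33
```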

Common symptoms:

  • Detection score hovers near 0.5 (chance-level) instead of >0.9
  • Works on raw model output, fails after your CDN or transcoder touches the file
  • Inconsistent results — same file sometimes detected, sometimes not

Solution

Step 1: Verify the Watermark Before It Enters Your Pipeline

Test on the raw model output first. If detection fails here, the issue is in your embedding config, not your pipeline.

from synthid_tf import watermarking

# Load the detector with your key config
detector = watermarking.create_detector(
    detector_config=your_config,  # Must match the embedder config exactly
)

# Test on raw output BEFORE any encoding
with open("raw_output.wav", "rb") as f:
    audio_bytes = f.read()

result = detector.detect(audio_bytes)
print(f"Score: {result.score:.4f}")  # Expect >0.9 for a freshly watermarked file
print(f"Decision: {result.decision}")

Expected: Score above 0.9, decision WATERMARK_DETECTED.

If it fails here:

  • Wrong key config: The detector config must use the exact same private_key and watermark_depth as your embedder. Mismatched depth is the most common cause.
  • Audio too short: SynthID audio requires at least ~1 second of signal. Clips under 0.8s give unreliable scores.
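
Both causes can be ruled out before calling the detector. The following is a minimal preflight sketch using only the standard library. It assumes the embedder and detector configs can be compared as plain dicts with `private_key` and `watermark_depth` entries, which may not match how your config objects are structured. The helper names are hypothetical, and `wave` only reads PCM WAV files.

```python
import io
import wave

MIN_SECONDS = 1.0  # minimum signal length suggested in this guide


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV file held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def config_mismatches(embedder_cfg: dict, detector_cfg: dict,
                      keys=("private_key", "watermark_depth")) -> list[str]:
    """Names of the fields that differ between the two configs (dict form is assumed)."""
    return [k for k in keys if embedder_cfg.get(k) != detector_cfg.get(k)]


def make_silent_wav(seconds: float, rate: int = 16_000) -> bytes:
    """Build a mono 16-bit silent WAV, used here only to demonstrate the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


clip = make_silent_wav(0.5)
if wav_duration_seconds(clip) < MIN_SECONDS:
    print("Clip is shorter than 1 second; expect unreliable scores")
print(config_mismatches({"private_key": 1, "watermark_depth": 16},
                        {"private_key": 1, "watermark_depth": 8}))
```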

Step 2: Check Bitrate and Sample Rate Through Your Codec

Run your file through your actual pipeline, then test detection at each stage.

import subprocess
import tempfile
import os

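def test_detection_after_transcode(
    input_path: str, codec_args: list[str], suffix: str = ".mp3"
) -> float:
    """Transcode audio with ffmpeg and return the watermark score of the result."""
    # The suffix selects the output container, so it should match the codec under test.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        output_path = tmp.name

    try:
        subprocess.run(
            # -y lets ffmpeg overwrite the empty temp file created above.
            ["ffmpeg", "-y", "-i", input_path, *codec_args, output_path],
            check=True,
            capture_output=True,
        )
        with open(output_path, "rb") as f:
            audio_bytes = f.read()
    finally:
        os.unlink(output_path)  # Clean up even if ffmpeg fails

    # Uses the module-level `detector` created in Step 1.
    result = detector.detect(audio_bytes)
    return result.score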

# Test at different MP3 bitrates — SynthID degrades significantly below 128kbps
for bitrate in ["320k", "192k", "128k", "96k", "64k"]:
    score = test_detection_after_transcode(
        "raw_output.wav",
        ["-codec:a", "libmp3lame", "-b:a", bitrate]
    )
    print(f"MP3 {bitrate}: score={score:.4f}")

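Example output (scores are illustrative and vary by clip; the trailing comments are annotations, not program output):

MP3 320k: score=0.9312
MP3 192k: score=0.9101
MP3 128k: score=0.8743
MP3 96k: score=0.6221    # <-- degrading
MP3 64k: score=0.5108    # <-- effectively destroyed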

If scores drop below 0.75: Your pipeline is lossy enough to destroy the watermark. You have two options:

  1. Increase the minimum MP3 bitrate to 128kbps or higher (or switch to AAC at 96kbps, which preserves the watermark better)
  2. Re-watermark after final encoding (see Step 3)

Step 3: Re-Watermark After Final Encoding (If You Can't Change Bitrate)

If your delivery format requires aggressive compression, embed the watermark into the already-encoded file instead of the raw model output.

from synthid_tf import watermarking
import soundfile as sf
import numpy as np

# Load the post-encoded file
audio_data, sample_rate = sf.read("encoded_output.wav")

# Embedder must operate at the file's actual sample rate
embedder = watermarking.create_embedder(
    embedder_config=your_config,
    sample_rate=sample_rate,  # Match the file, don't assume 24kHz
)

# Embed and save
watermarked = embedder.embed(audio_data)
sf.write("final_watermarked.wav", watermarked, sample_rate)

# Immediately verify
with open("final_watermarked.wav", "rb") as f:
    result = detector.detect(f.read())

assert result.score > 0.85, f"Watermark weak after re-embedding: {result.score}"

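Why this works: The watermark is embedded into audio that has already been through lossy compression, so it only has to live in the frequency content the codec kept. This holds only if final_watermarked.wav is delivered as written; another lossy encode after this step can strip the watermark again.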

If it fails:

  • AssertionError with score ~0.5: The embedder's sample rate is probably wrong. Print sample_rate and confirm that the value sf.read() returned is the one passed to create_embedder(); don't hardcode 24000.
  • ValueError: audio too short: SynthID needs at least ~1 second. Pad short clips with silence before embedding, then trim after.
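
The pad-then-trim workaround can be sketched as follows for mono audio held as a list of floats. `embed_fn` is a stand-in for whatever embedding call you use, and the helper name is hypothetical. With NumPy arrays the same idea applies using `np.pad` and slicing. Note that the trimmed result is again shorter than one second, so the short-clip caveat from Step 1 may still apply at detection time.

```python
import math
from typing import Callable, Sequence


def embed_with_padding(
    samples: Sequence[float],
    sample_rate: int,
    embed_fn: Callable[[list[float]], Sequence[float]],
    min_seconds: float = 1.0,
) -> list[float]:
    """Pad with trailing silence up to min_seconds, embed, then trim back
    to the original length."""
    original_len = len(samples)
    min_len = math.ceil(min_seconds * sample_rate)
    pad = max(0, min_len - original_len)
    padded = list(samples) + [0.0] * pad   # silence = zero-valued samples
    watermarked = embed_fn(padded)
    return list(watermarked[:original_len])


# Demo with an identity function standing in for the real embedder.
short_clip = [0.1] * 12_000                      # 0.5 s at 24 kHz
out = embed_with_padding(short_clip, 24_000, lambda x: x)
print(len(out))                                  # 12000
```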

Step 4: Video-Specific — Check Frame Rate and Resolution Changes

For video streams, watermark detection is sensitive to resolution rescaling and frame rate changes. The temporal pattern in the video watermark assumes a consistent frame sequence.

import cv2
from synthid_tf import video_watermarking

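def check_video_watermark(video_path: str) -> dict:
    """Read the clip's actual FPS and resolution, then run detection."""
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise ValueError(f"Could not open video: {video_path}")
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    cap.release()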

    video_detector = video_watermarking.create_detector(
        detector_config=your_video_config,
        expected_fps=fps,  # Must pass actual FPS, not assumed value
    )

    with open(video_path, "rb") as f:
        result = video_detector.detect(f.read())

    return {
        "score": result.score,
        "fps": fps,
        "resolution": f"{width}x{height}",
        "decision": result.decision,
    }

# Test before and after your transcoder
print("Original:", check_video_watermark("original.mp4"))
print("Transcoded:", check_video_watermark("transcoded.mp4"))

If score drops after transcoding:

  • Resolution change: Downscaling from 1080p to 720p often survives, but 1080p → 360p typically destroys the watermark.
  • FPS mismatch: If your transcoder drops frames (e.g., 30fps → 24fps by dropping every 5th frame), pass expected_fps=24 to the detector to match. Don't use the original clip's FPS.
  • Rotation or flip: Rotating or mirroring the frame destroys the spatial watermark. Run detection before applying these transforms.
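
The frame rate and resolution figures above are easy to compute for your own transcoder settings. A short worked sketch follows; the function names are hypothetical, and the heights are the ones mentioned in this step.

```python
def fps_after_dropping(source_fps: float, drop_every: int) -> float:
    """Effective frame rate when every `drop_every`-th frame is removed."""
    if drop_every < 2:
        raise ValueError("drop_every must be at least 2")
    return source_fps * (drop_every - 1) / drop_every


def downscale_factor(source_height: int, target_height: int) -> float:
    """How many times smaller the output is along one axis."""
    return source_height / target_height


print(fps_after_dropping(30, 5))     # 24.0
print(downscale_factor(1080, 720))   # 1.5
print(downscale_factor(1080, 360))   # 3.0
```

The first result is the value you would pass as expected_fps for the transcoded clip.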

Verification

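Run a full end-to-end test across your pipeline. The command below assumes you have wrapped the per-stage checks in a script of your own; verify_watermark_pipeline.py is not defined in this guide: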

python verify_watermark_pipeline.py \
  --input raw_output.wav \
  --pipeline-config pipeline.yaml \
  --threshold 0.85

Or manually:

stages = {
    "raw": "raw_output.wav",
    "encoded": "encoded_output.mp3",
    "cdn_cached": "cdn_output.mp3",
}

for stage, path in stages.items():
    with open(path, "rb") as f:
        score = detector.detect(f.read()).score
    status = "✓" if score > 0.85 else "✗ FAIL"
    print(f"{status} {stage}: {score:.4f}")

You should see: All stages passing at >0.85. Anything below 0.75 is unreliable for production use.


What You Learned

  • SynthID watermarks are robust but not indestructible: MP3 encoding below 128kbps degrades them sharply, and by 64kbps they are effectively gone
  • Always test detection on raw model output first to isolate whether the problem is embedding or pipeline
  • Re-watermarking after final encoding is a valid strategy when you can't control delivery bitrate
  • Video watermarks are sensitive to frame drops and resolution changes — always pass actual FPS to the detector

Limitation: SynthID detection is probabilistic. A score of 0.85 is not a guarantee — it's a confidence level. For legal or compliance use cases, store the original pre-encode hash alongside the watermark score.
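
One way to keep that record is a small JSON entry per file, written at the moment you verify the watermark. This is a sketch using the standard library only; the field names are hypothetical, so adapt them to whatever your compliance process expects.

```python
import hashlib
import json


def audit_record(raw_bytes: bytes, score: float, stage: str) -> str:
    """Serialize the SHA-256 of the pre-encode file together with the score."""
    record = {
        "sha256_pre_encode": hashlib.sha256(raw_bytes).hexdigest(),
        "stage": stage,
        "score": round(score, 4),
    }
    return json.dumps(record, sort_keys=True)


# Demo with placeholder bytes; in practice pass the contents of raw_output.wav.
print(audit_record(b"placeholder audio bytes", 0.9312, "raw"))
```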

When NOT to use this approach: If your pipeline requires >4x downscaling or converts through a format that carries no frequency content above 8kHz (e.g., telephone-quality audio), tuning bitrates and thresholds won't help. The watermark will not survive, and re-embedding after the conversion is the only option.


Tested with SynthID TF 0.1.x, FFmpeg 6.x, Python 3.12, Ubuntu 24.04