Train a Custom SD 3.5 LoRA for Brand Assets in 45 Minutes

Step-by-step guide to training a Stable Diffusion 3.5 LoRA on your brand assets for consistent, on-brand AI image generation.

Problem: Your AI Images Don't Look Like Your Brand

You're generating product images with Stable Diffusion 3.5, but every output looks generic. Your brand has a specific color palette, logo style, and visual language — and stock prompts can't capture it.

A LoRA (Low-Rank Adaptation) lets you fine-tune SD 3.5 on your own assets without retraining the full model. After training, a single trigger word makes the model generate images that actually match your brand.

You'll learn:

  • How to prepare a dataset from your existing brand assets
  • How to train a LoRA on SD 3.5 using sd-scripts
  • How to use your trained LoRA in ComfyUI or A1111

Time: 45 min | Level: Intermediate


Why This Happens

SD 3.5 is trained on billions of generic internet images. It has no concept of your specific logo, color grading style, or product aesthetic. LoRA training adds a small adapter layer (3–30MB) that shifts the model's outputs toward your visual style without overwriting its existing knowledge.
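The arithmetic behind that small file size is easy to see. A rank-r LoRA stores two thin matrices instead of a full weight update; a conceptual numpy sketch (illustrative dimensions, not sd-scripts' actual implementation):

```python
import numpy as np

d_out, d_in, rank, alpha = 1536, 1536, 16, 8

W = np.random.randn(d_out, d_in).astype(np.float32)  # frozen base weight
A = np.random.randn(rank, d_in).astype(np.float32)   # trained down-projection
B = np.zeros((d_out, rank), dtype=np.float32)        # trained up-projection, starts at zero

# Effective weight at inference: base plus scaled low-rank update
W_adapted = W + (alpha / rank) * (B @ A)

full_params = d_out * d_in            # what a full fine-tune would touch
lora_params = rank * (d_in + d_out)   # what the adapter actually stores
print(f"full: {full_params:,} params, LoRA: {lora_params:,} "
      f"({100 * lora_params / full_params:.1f}%)")
```

At rank 16 the adapter holds about 2% of the parameters of each layer it touches, which is why the whole file stays in the megabyte range.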

Common symptoms without a LoRA:

  • Generated images ignore your brand colors even when specified in prompts
  • Logo elements appear corrupted or blended with other styles
  • Product shots look like stock photography, not your brand

Solution

Step 1: Prepare Your Dataset

You need 15–30 high-quality images. Quality beats quantity here.

Requirements:

  • Resolution: Minimum 1024×1024px (SD 3.5's native resolution)
  • Format: PNG or JPG
  • Content: Consistent subject across images (your product, logo lockups, brand scenes)
  • Variety: Different angles, lighting, contexts — but always your brand aesthetic

Create your dataset folder:

mkdir -p ~/lora-training/dataset/brand_v1
# Copy your images here
cp /path/to/brand/assets/*.png ~/lora-training/dataset/brand_v1/

Generate captions for each image. This is what teaches the model what it's learning:

pip install Pillow --break-system-packages

python3 << 'EOF'
from pathlib import Path

dataset_dir = Path("~/lora-training/dataset/brand_v1").expanduser()
trigger_word = "MYBRAND"  # Replace with your unique trigger

image_paths = [p for p in sorted(dataset_dir.iterdir())
               if p.suffix.lower() in (".png", ".jpg", ".jpeg")]
for img_path in image_paths:
    caption_path = img_path.with_suffix(".txt")
    # Template caption: edit each file to describe what that image actually shows
    caption = f"{trigger_word} brand photography, [describe: product/logo/scene], clean background, professional lighting"
    caption_path.write_text(caption)
    print(f"Created: {caption_path.name}")
EOF

Expected: One .txt file per image in your dataset folder. Open each caption and replace the [describe: product/logo/scene] placeholder with a short description of that specific image.
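Before moving on, it's worth confirming every image has a caption and meets the 1024px floor. A quick sanity check using the Pillow installed above (the `check_dataset` helper is ours, not part of any training tool):

```python
from pathlib import Path
from PIL import Image

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}

def check_dataset(dataset_dir: Path, min_size: int = 1024) -> list[str]:
    """Return a list of problems: missing captions or undersized images."""
    problems = []
    for img_path in sorted(dataset_dir.iterdir()):
        if img_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if not img_path.with_suffix(".txt").exists():
            problems.append(f"missing caption: {img_path.name}")
        with Image.open(img_path) as im:
            if min(im.size) < min_size:
                problems.append(f"too small {im.size}: {img_path.name}")
    return problems

dataset = Path("~/lora-training/dataset/brand_v1").expanduser()
if dataset.exists():
    for problem in check_dataset(dataset):
        print(problem)
```

An empty report means the dataset is ready for training.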

[Image: Dataset folder structure showing paired image and caption files. Each image needs a matching .txt caption file — the model learns from both together.]

If it fails:

  • Permission error: Run chmod -R 755 ~/lora-training/
  • Images skipped: Ensure filenames have no spaces — rename with rename 's/ /_/g' *.png

Step 2: Install sd-scripts

sd-scripts by kohya-ss is the standard tool for LoRA training on SD models.

cd ~
git clone https://github.com/kohya-ss/sd-scripts
cd sd-scripts
pip install -r requirements.txt --break-system-packages
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121 --break-system-packages

Download the SD 3.5 base model checkpoint (you need the full model, not just the VAE):

mkdir -p ~/lora-training/models
# Download from Hugging Face — requires account and license acceptance
huggingface-cli download stabilityai/stable-diffusion-3.5-large \
  --local-dir ~/lora-training/models/sd35 \
  --include "*.safetensors"

Expected: ~/lora-training/models/sd35/sd3.5_large.safetensors (~16GB)

If it fails:

  • Auth error: Run huggingface-cli login first and accept the model license at huggingface.co/stabilityai/stable-diffusion-3.5-large
  • Disk space: You need 50GB+ free, because the *.safetensors filter also downloads the text encoders and diffusers-format shards in the repo's subfolders, not just the ~16GB checkpoint
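Both failure modes can be caught before starting the long download. A small stdlib preflight script (`preflight` is a hypothetical helper; adjust the disk budget to your setup, and note that huggingface-cli stores its token at ~/.cache/huggingface/token on current versions):

```python
import os
import shutil
from pathlib import Path

def preflight(target_dir: Path, required_gb: float = 60.0) -> list[str]:
    """Warn about the two common download blockers: disk space and auth."""
    issues = []
    probe = target_dir if target_dir.exists() else Path.home()
    free_gb = shutil.disk_usage(probe).free / 1e9
    if free_gb < required_gb:
        issues.append(f"only {free_gb:.0f}GB free, want ~{required_gb:.0f}GB")
    # huggingface-cli login writes its token here (or set the HF_TOKEN env var)
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    if not os.environ.get("HF_TOKEN") and not token_file.exists():
        issues.append("no Hugging Face token; run `huggingface-cli login` first")
    return issues

for issue in preflight(Path("~/lora-training/models").expanduser()):
    print(issue)
```

No output means you're clear to download.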

Step 3: Configure and Run Training

Create the training config:

cat > ~/lora-training/train_brand_lora.toml << EOF
[general]
enable_bucket = true
shuffle_caption = true
caption_extension = ".txt"

[datasets]
  [[datasets.subsets]]
  image_dir = "${HOME}/lora-training/dataset/brand_v1"  # absolute path: "~" is not expanded in config files
  num_repeats = 10  # Repeat dataset 10x — for small datasets (<20 images)
EOF

Run training:

cd ~/sd-scripts

accelerate launch sd3_train_network.py \
  --pretrained_model_name_or_path="$HOME/lora-training/models/sd35/sd3.5_large.safetensors" \
  --dataset_config="$HOME/lora-training/train_brand_lora.toml" \
  --output_dir="$HOME/lora-training/output" \
  --output_name="brand_v1" \
  --save_model_as="safetensors" \
  --network_module="networks.lora_sd3" \
  --network_dim=16 \
  --network_alpha=8 \
  --learning_rate=1e-4 \
  --max_train_epochs=20 \
  --train_batch_size=1 \
  --mixed_precision="bf16" \
  --gradient_checkpointing \
  --optimizer_type="AdamW8bit"

Note: the paths use $HOME rather than a quoted ~, because the shell does not expand tildes inside double quotes.

What these settings do:

  • network_dim=16 — LoRA rank. Higher = more expressive but larger file. 16 is solid for brand styles
  • network_alpha=8 — Keep this at half of network_dim for stable training
  • max_train_epochs=20 — For 20 images × 10 repeats, this is ~4000 training steps
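The step count in that last bullet follows from a simple formula; a tiny helper (ours, not part of sd-scripts) makes the arithmetic explicit for other dataset sizes:

```python
def total_steps(num_images: int, num_repeats: int, epochs: int,
                batch_size: int = 1) -> int:
    """Optimizer steps for one run: each epoch is one pass over images * repeats."""
    steps_per_epoch = (num_images * num_repeats) // batch_size
    return steps_per_epoch * epochs

print(total_steps(20, 10, 20))  # 4000, matching the estimate above
print(total_steps(30, 10, 20))  # for larger datasets, scale num_repeats down instead
```

Aim for roughly 2000–4000 total steps by trading off num_repeats against epochs as your dataset grows.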

[Image: Training progress showing the loss curve in the terminal. Loss should decrease steadily — if it spikes and stays high after epoch 5, your learning rate is too aggressive.]

Training takes 20–40 minutes on an RTX 3090/4080. On an A100, expect ~10 minutes.

If it fails:

  • CUDA OOM: The command already uses --gradient_checkpointing with --train_batch_size=1, the minimum. If you still hit OOM, your GPU lacks the VRAM for SD 3.5 Large; train on a cloud GPU instead
  • ModuleNotFoundError: networks.lora_sd3: You need sd-scripts from after November 2024 — run git pull in the sd-scripts directory

Step 4: Load Your LoRA in ComfyUI

Copy the output to your ComfyUI LoRA directory:

cp ~/lora-training/output/brand_v1.safetensors \
  ~/ComfyUI/models/loras/brand_v1.safetensors

In ComfyUI, add a Load LoRA node between your checkpoint loader and the KSampler; it patches both the MODEL and CLIP outputs. Set strength to 0.7–0.85 to start — lower values blend with the base model, higher values push harder toward your brand style.

[Image: ComfyUI workflow with the LoRA node connected between the model loader and the sampler. Always use your trigger word in the prompt.]

Test prompt structure:

MYBRAND brand photography, white ceramic coffee mug on marble surface, 
soft natural lighting, product shot, high quality

If outputs look off:

  • Too generic: Increase LoRA strength to 0.9
  • Distorted or over-stylized: Drop strength to 0.6 or retrain with fewer epochs (try 10–12)
  • Trigger word not working: Check your caption files — search for a typo in the trigger word
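For that last bullet, a short audit script (the `audit_captions` helper is ours) finds caption files where the trigger word is missing or misspelled:

```python
from pathlib import Path

def audit_captions(dataset_dir: Path, trigger: str) -> list[str]:
    """Return caption filenames that lack the exact trigger word."""
    return [p.name for p in sorted(dataset_dir.glob("*.txt"))
            if trigger not in p.read_text()]

dataset = Path("~/lora-training/dataset/brand_v1").expanduser()
if dataset.exists():
    missing = audit_captions(dataset, "MYBRAND")
    print("Captions missing trigger:", missing or "none")
```

If any files turn up, fix the captions and retrain; the trigger word only works if it appeared consistently during training.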

Verification

Run a batch of test generations with and without your LoRA active:

# Quick CLI test with diffusers (optional)
python3 << 'EOF'
from pathlib import Path

import torch
from diffusers import StableDiffusion3Pipeline

# from_pretrained needs the full diffusers layout (model_index.json, configs).
# If the earlier --include filter skipped those JSON files, re-run the
# download without it. Note the explicit expanduser(): neither diffusers
# nor the filesystem expands a literal "~" in these paths.
model_dir = Path("~/lora-training/models/sd35").expanduser()
lora_path = Path("~/lora-training/output/brand_v1.safetensors").expanduser()

pipe = StableDiffusion3Pipeline.from_pretrained(
    model_dir,
    torch_dtype=torch.bfloat16
)
pipe.load_lora_weights(str(lora_path))
pipe = pipe.to("cuda")

image = pipe(
    "MYBRAND brand photography, product on white surface, professional",
    num_inference_steps=28,
    guidance_scale=4.5
).images[0]

image.save("test_output.png")
print("Saved test_output.png")
EOF

You should see: Visual consistency with your brand assets — recognizable color grading, style, and aesthetic.

[Image: Side-by-side comparison of outputs before and after LoRA training. Left: base SD 3.5 output. Right: LoRA-guided output matching brand color and style.]


What You Learned

  • LoRA adapters are small (~10–30MB) but meaningfully shift model outputs toward your training data
  • Caption quality matters more than image count — vague captions produce vague results
  • network_dim=16 with 15–30 images is a reliable starting point; scale num_repeats down as your dataset grows
  • LoRA strength is a dial, not a switch — 0.7–0.85 balances brand fidelity with prompt flexibility

Limitations:

  • SD 3.5 LoRAs are not cross-compatible with SD 1.5 or SDXL — retrain for each base model
  • Training on fewer than 10 images causes overfitting; outputs become nearly identical regardless of prompt
  • This approach captures style, not exact logo reproduction — for precise logo placement, use ControlNet in addition to your LoRA

Tested on Stable Diffusion 3.5 Large, sd-scripts (January 2026), Python 3.12, CUDA 12.1, RTX 4080 16GB