Problem: Your AI Images Don't Look Like Your Brand
You're generating product images with Stable Diffusion 3.5, but every output looks generic. Your brand has a specific color palette, logo style, and visual language — and stock prompts can't capture it.
A LoRA (Low-Rank Adaptation) lets you fine-tune SD 3.5 on your own assets without retraining the full model. After training, a single trigger word makes the model generate images that actually match your brand.
You'll learn:
- How to prepare a dataset from your existing brand assets
- How to train a LoRA on SD 3.5 using sd-scripts
- How to use your trained LoRA in ComfyUI or A1111
Time: 45 min | Level: Intermediate
Why This Happens
SD 3.5 is trained on billions of generic internet images. It has no concept of your specific logo, color grading style, or product aesthetic. LoRA training adds a small adapter layer (3–30MB) that shifts the model's outputs toward your visual style without overwriting its existing knowledge.
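The size claim above follows from the low-rank factorization: instead of updating a full d_out × d_in weight matrix, LoRA trains two small factors of shape d_out × r and r × d_in. A quick sketch of the parameter arithmetic (the 2048×2048 projection size is illustrative, not SD 3.5's actual layer shape):

```python
def lora_params(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Parameters in a full weight update vs. its rank-r LoRA factors."""
    full = d_out * d_in           # updating W directly
    lora = rank * (d_out + d_in)  # B (d_out x r) plus A (r x d_in)
    return full, lora

# Illustrative attention projection: 2048x2048, rank 16
full, lora = lora_params(2048, 2048, 16)
print(f"full update: {full:,} params, LoRA: {lora:,} params "
      f"({100 * lora / full:.2f}% of full)")
```

At rank 16 the adapter carries under 2% of the parameters of the matrices it modifies, which is why the saved file stays in the megabyte range.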
Common symptoms without a LoRA:
- Generated images ignore your brand colors even when specified in prompts
- Logo elements appear corrupted or blended with other styles
- Product shots look like stock photography, not your brand
Solution
Step 1: Prepare Your Dataset
You need 15–30 high-quality images. Quality beats quantity here.
Requirements:
- Resolution: Minimum 1024×1024px (SD 3.5's native resolution)
- Format: PNG or JPG
- Content: Consistent subject across images (your product, logo lockups, brand scenes)
- Variety: Different angles, lighting, contexts — but always your brand aesthetic
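To catch undersized images before training, you can read each PNG's dimensions straight from its IHDR header with the stdlib. This is a quick sketch assuming standard PNG files; swap in Pillow if your assets include JPGs:

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from a PNG's IHDR chunk (bytes 16-24)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    width = int.from_bytes(data[16:20], "big")
    height = int.from_bytes(data[20:24], "big")
    return width, height

def undersized(dataset_dir: str, min_side: int = 1024) -> list[str]:
    """Return names of PNGs smaller than min_side on either axis."""
    bad = []
    for p in sorted(Path(dataset_dir).expanduser().glob("*.png")):
        w, h = png_dimensions(p.read_bytes()[:24])
        if min(w, h) < min_side:
            bad.append(p.name)
    return bad
```

Run `undersized("~/lora-training/dataset/brand_v1")` and upscale or drop anything it reports before training.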
Create your dataset folder:
mkdir -p ~/lora-training/dataset/brand_v1
# Copy your images here
cp /path/to/brand/assets/*.png ~/lora-training/dataset/brand_v1/
Generate captions for each image. This is what teaches the model what it's learning:
pip install torch transformers Pillow --break-system-packages
python3 << 'EOF'
from pathlib import Path

dataset_dir = Path("~/lora-training/dataset/brand_v1").expanduser()
trigger_word = "MYBRAND"  # Replace with your unique trigger

for img_path in sorted(dataset_dir.glob("*.png")):
    caption_path = img_path.with_suffix(".txt")
    # Write a consistent caption — describe what's in the image + trigger word
    caption = f"{trigger_word} brand photography, [describe: product/logo/scene], clean background, professional lighting"
    caption_path.write_text(caption)
    print(f"Created: {caption_path.name}")
EOF
Expected: One .txt file per image in your dataset folder.
Each image needs a matching .txt caption file — the model learns from both together
If it fails:
- Permission error: Run chmod -R 755 ~/lora-training/
- Images skipped: Ensure filenames have no spaces — rename with rename 's/ /_/g' *.png
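It also helps to verify the pairing rule before moving on: every image needs a matching .txt file, and every caption needs the trigger word. A small stdlib sketch, using the directory layout from this guide:

```python
from pathlib import Path

def audit_dataset(dataset_dir: str, trigger_word: str) -> dict[str, list[str]]:
    """Report images missing captions, and captions missing the trigger word."""
    root = Path(dataset_dir).expanduser()
    issues = {"missing_caption": [], "missing_trigger": []}
    for img in sorted(root.glob("*.png")):
        cap = img.with_suffix(".txt")
        if not cap.exists():
            issues["missing_caption"].append(img.name)
        elif trigger_word not in cap.read_text():
            issues["missing_trigger"].append(img.name)
    return issues

if __name__ == "__main__":
    print(audit_dataset("~/lora-training/dataset/brand_v1", "MYBRAND"))
```

Both lists should come back empty; anything else means the model will train on an incomplete or inconsistent signal.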
Step 2: Install sd-scripts
sd-scripts by kohya-ss is the standard tool for LoRA training on SD models.
cd ~
git clone https://github.com/kohya-ss/sd-scripts
cd sd-scripts
pip install -r requirements.txt --break-system-packages
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121 --break-system-packages
Download the SD 3.5 base model checkpoint (you need the full model, not just the VAE):
mkdir -p ~/lora-training/models
# Download from Hugging Face — requires account and license acceptance
huggingface-cli download stabilityai/stable-diffusion-3.5-large \
--local-dir ~/lora-training/models/sd35 \
--include "*.safetensors"
Expected: ~/lora-training/models/sd35/sd3.5_large.safetensors (~8GB)
If it fails:
- Auth error: Run huggingface-cli login first and accept the model license at huggingface.co/stabilityai/stable-diffusion-3.5-large
- Disk space: You need ~12GB free (model + training cache)
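The download is large, so a stdlib preflight check for free space can save you from a half-finished snapshot (the 12GB figure is this guide's estimate):

```python
import shutil

def enough_space(path: str, needed_gb: float) -> bool:
    """True if the filesystem holding `path` has at least needed_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= needed_gb * 2**30

if not enough_space("/", 12):
    print("Warning: less than 12 GB free; the model download may fail.")
```

Point the check at whatever filesystem holds ~/lora-training if your home directory is on a separate mount.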
Step 3: Configure and Run Training
Create the training config:
cat > ~/lora-training/train_brand_lora.toml << 'EOF'
[general]
enable_bucket = true
shuffle_caption = true
caption_extension = ".txt"
[[datasets]]
resolution = 1024

  [[datasets.subsets]]
  image_dir = "/home/YOUR_USER/lora-training/dataset/brand_v1"  # use an absolute path; "~" is not expanded here
  num_repeats = 10  # Repeat dataset 10x — for small datasets (<20 images)
EOF
Run training:
cd ~/sd-scripts
accelerate launch sd3_train_network.py \
--pretrained_model_name_or_path="$HOME/lora-training/models/sd35/sd3.5_large.safetensors" \
--dataset_config="$HOME/lora-training/train_brand_lora.toml" \
--output_dir="$HOME/lora-training/output" \
--output_name="brand_v1" \
--save_model_as="safetensors" \
--network_module="networks.lora_sd3" \
--network_dim=16 \
--network_alpha=8 \
--learning_rate=1e-4 \
--max_train_epochs=20 \
--train_batch_size=1 \
--mixed_precision="bf16" \
--gradient_checkpointing \
--optimizer_type="AdamW8bit"
What these settings do:
- network_dim=16 — LoRA rank. Higher = more expressive but larger file; 16 is solid for brand styles
- network_alpha=8 — Keep this at half of network_dim for stable training
- max_train_epochs=20 — For 20 images × 10 repeats, this is ~4000 training steps
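The epoch-to-step relationship is easy to compute ahead of time. A quick sketch of the arithmetic (steps per epoch = images × repeats ÷ batch size):

```python
def total_steps(num_images: int, num_repeats: int, epochs: int,
                batch_size: int = 1) -> int:
    """Total optimizer steps for a dataset-repeat training run."""
    steps_per_epoch = (num_images * num_repeats) // batch_size
    return steps_per_epoch * epochs

# This guide's setup: 20 images x 10 repeats x 20 epochs, batch size 1
print(total_steps(20, 10, 20))  # 4000
```

If you grow the dataset, reduce num_repeats or epochs to keep total steps in the same ballpark rather than letting them multiply.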
Loss should decrease steadily — if it spikes and stays high after epoch 5, your learning rate is too aggressive
Training takes 20–40 minutes on an RTX 3090/4080. On an A100, expect ~10 minutes.
If it fails:
- CUDA OOM: --gradient_checkpointing and --train_batch_size=1 are already set in the command above — if you still hit OOM, the model is too large for your GPU VRAM; use a cloud GPU
- ModuleNotFoundError: networks.lora_sd3: You need sd-scripts from after November 2024 — run git pull in the sd-scripts directory
Step 4: Load Your LoRA in ComfyUI
Copy the output to your ComfyUI LoRA directory:
cp ~/lora-training/output/brand_v1.safetensors \
~/ComfyUI/models/loras/brand_v1.safetensors
In ComfyUI, add a Load LoRA node between your checkpoint loader and the KSampler — it patches both the MODEL and CLIP outputs. Set strength to 0.7–0.85 to start — lower values blend with the base model, higher values push harder toward your brand style.
The LoRA node sits between your model loader and sampler — always use your trigger word in the prompt
Test prompt structure:
MYBRAND brand photography, white ceramic coffee mug on marble surface,
soft natural lighting, product shot, high quality
If outputs look off:
- Too generic: Increase LoRA strength to 0.9
- Distorted or over-stylized: Drop strength to 0.6 or retrain with fewer epochs (try 10–12)
- Trigger word not working: Check your caption files — search for a typo in the trigger word
Verification
Run a batch of test generations with and without your LoRA active:
# Quick CLI test with diffusers (optional)
python3 << 'EOF'
from diffusers import StableDiffusion3Pipeline
from pathlib import Path
import torch

# from_pretrained does not expand "~", so resolve paths first.
# Note: this expects the full diffusers folder layout (JSON configs included),
# not just the single sd3.5_large.safetensors file.
model_dir = Path("~/lora-training/models/sd35").expanduser()
lora_path = Path("~/lora-training/output/brand_v1.safetensors").expanduser()

pipe = StableDiffusion3Pipeline.from_pretrained(
    str(model_dir),
    torch_dtype=torch.bfloat16
)
pipe.load_lora_weights(str(lora_path))
pipe = pipe.to("cuda")

image = pipe(
    "MYBRAND brand photography, product on white surface, professional",
    num_inference_steps=28,
    guidance_scale=4.5
).images[0]
image.save("test_output.png")
print("Saved test_output.png")
EOF
You should see: Visual consistency with your brand assets — recognizable color grading, style, and aesthetic.
Left: base SD 3.5 output. Right: LoRA-guided output matching brand color and style
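For a scriptable check, one crude proxy for "matches the brand palette" is comparing mean RGB color between a reference asset and a generated output. A minimal sketch on raw (R, G, B) tuples; in practice you would obtain them via Pillow's Image.getdata(), and the example pixels below are made up:

```python
def mean_rgb(pixels: list[tuple[int, int, int]]) -> tuple[float, ...]:
    """Average color of a list of (R, G, B) pixels."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def palette_distance(pixels_a, pixels_b) -> float:
    """Euclidean distance between the two images' mean colors (0 = identical)."""
    a, b = mean_rgb(pixels_a), mean_rgb(pixels_b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Hypothetical example: a warm brand reference vs. a cool, off-brand output
brand = [(200, 120, 60)] * 4
output = [(60, 120, 200)] * 4
print(palette_distance(brand, output))  # large distance suggests off-brand color
```

Mean color ignores composition and texture, so treat a low distance as a sanity signal, not proof of brand fidelity.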
What You Learned
- LoRA adapters are small (~10–30MB) but meaningfully shift model outputs toward your training data
- Caption quality matters more than image count — vague captions produce vague results
- network_dim=16 with 15–30 images is a reliable starting point; scale num_repeats down as your dataset grows
- LoRA strength is a dial, not a switch — 0.7–0.85 balances brand fidelity with prompt flexibility
Limitations:
- SD 3.5 LoRAs are not cross-compatible with SD 1.5 or SDXL — retrain for each base model
- Training on fewer than 10 images causes overfitting; outputs become nearly identical regardless of prompt
- This approach captures style, not exact logo reproduction — for precise logo placement, use ControlNet in addition to your LoRA
Tested on Stable Diffusion 3.5 Large, sd-scripts (January 2026), Python 3.12, CUDA 12.1, RTX 4080 16GB