Hybrid Search (BM25 + Dense Vectors) for eCommerce in 20 Minutes

Build a production-ready hybrid search combining BM25 keyword matching and dense vector embeddings to boost eCommerce product discovery.

Problem: Your eCommerce Search Misses Obvious Products

A customer searches "running shoes for bad knees." BM25 returns nothing — no product has that exact phrase. Dense vector search alone returns sandals and yoga mats. Neither alone is enough.

You'll learn:

  • Why BM25 and dense vectors fail independently
  • How to combine them with Reciprocal Rank Fusion (RRF)
  • How to deploy this with Elasticsearch 8.x (the same pattern ports to Qdrant or any vector store that returns ranked hits)

Time: 20 min | Level: Intermediate


Why This Happens

BM25 is a keyword frequency algorithm. It's great at exact matches ("Nike Air Max 270") but blind to meaning. Dense vector search encodes semantic similarity — it knows "bad knees" relates to "joint support" — but it can hallucinate relevance for unrelated terms.

Common symptoms:

  • Zero results for descriptive queries ("cozy winter jacket for toddlers")
  • Brand searches return semantic lookalikes instead of the exact product
  • Recall is high but precision collapses at the top 5 results

Neither model alone handles the full spectrum of shopper intent. Hybrid search solves this by running both in parallel and merging their ranked results.
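The merge step can be previewed in a few lines. This toy sketch (hypothetical doc IDs, no search engine involved) shows why a product ranked by both legs beats the single-leg winners:

```python
# Toy RRF merge over hypothetical doc IDs: score(d) = sum of 1/(k + rank)
# across every list that contains d, with 1-based ranks.
def rrf_merge(bm25_ids, vector_ids, k=60):
    scores = {}
    for ranked in (bm25_ids, vector_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# "p2" appears in both lists, so it outranks each leg's own #1.
print(rrf_merge(["p1", "p2", "p3"], ["p2", "p4"]))
# → ['p2', 'p1', 'p4', 'p3']
```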


Solution

Step 1: Set Up Your Index

This example uses Elasticsearch 8.x with knn dense field support. You'll need Python 3.11+ and elasticsearch-py.

pip install elasticsearch sentence-transformers

Create the index with both a keyword field and a dense vector field:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

index_config = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # BM25 lives here
            "description": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine"           # Cosine works best for sentence embeddings
            }
        }
    }
}

es.options(ignore_status=400).indices.create(index="products", mappings=index_config["mappings"])

Expected: {'acknowledged': True, 'index': 'products'}

If it fails:

  • 400 error: Index already exists — safe to suppress via es.options(ignore_status=400) (the 8.x client removed the old per-request ignore= kwarg)
  • ConnectionError: Elasticsearch isn't running — docker run -p 9200:9200 -e discovery.type=single-node -e xpack.security.enabled=false docker.elastic.co/elasticsearch/elasticsearch:8.12.0

Step 2: Index Products with Embeddings

Use a lightweight sentence transformer to generate embeddings at index time. all-MiniLM-L6-v2 is 80MB and fast enough for batch indexing.

from sentence_transformers import SentenceTransformer
from elasticsearch.helpers import bulk

model = SentenceTransformer("all-MiniLM-L6-v2")

def index_products(products: list[dict]):
    # Encode all descriptions in one batch — much faster than one-by-one
    texts = [f"{p['title']} {p['description']}" for p in products]
    embeddings = model.encode(texts, batch_size=64, show_progress_bar=True)

    actions = [
        {
            "_index": "products",
            "_id": p["id"],
            "_source": {
                "title": p["title"],
                "description": p["description"],
                "embedding": emb.tolist()        # ES requires plain list, not numpy array
            }
        }
        for p, emb in zip(products, embeddings)
    ]

    success, errors = bulk(es, actions)
    return success, errors

Expected: Returns (num_indexed, []) with no errors.
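The products argument is assumed to be a list of dicts with id, title, and description keys. A minimal sketch with two hypothetical SKUs, showing the same title-plus-description batching that index_products() performs:

```python
# Hypothetical catalog rows — the shape index_products() expects.
products = [
    {"id": "sku-1001", "title": "Brooks Adrenaline GTS 24",
     "description": "Stability running shoe with GuideRails knee support"},
    {"id": "sku-1002", "title": "Nike Air Max 270",
     "description": "Lifestyle sneaker with Max Air cushioning"},
]

# One string per product, exactly what gets passed to model.encode().
texts = [f"{p['title']} {p['description']}" for p in products]
print(texts[0])
# → Brooks Adrenaline GTS 24 Stability running shoe with GuideRails knee support
```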


Step 3: Run Hybrid Search with RRF

Reciprocal Rank Fusion merges two ranked lists without needing score normalization. Each result's final score is sum(1 / (k + rank)) across both lists, where rank is the result's 1-based position in each list and k=60 is a smoothing constant.
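The arithmetic is worth seeing once. A product ranked 1st by BM25 and 3rd by the vector leg roughly doubles the score of a product ranked 1st in only one leg:

```python
k = 60
both_legs = 1 / (k + 1) + 1 / (k + 3)   # rank 1 in BM25, rank 3 in vector leg
one_leg   = 1 / (k + 1)                 # rank 1 in a single leg only

print(f"{both_legs:.5f} vs {one_leg:.5f}")
assert both_legs > one_leg  # cross-leg agreement always wins a tie on rank 1
```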

def hybrid_search(query: str, top_k: int = 10) -> list[dict]:
    query_embedding = model.encode(query).tolist()

    # BM25 leg: keyword match across title + description
    bm25_query = {
        "query": {
            "multi_match": {
                "query": query,
                "fields": ["title^2", "description"],  # Boost title matches
                "type": "best_fields"
            }
        },
        "size": top_k * 2   # Fetch more candidates than needed — RRF will trim
    }

    # KNN leg: semantic nearest neighbors
    knn_query = {
        "knn": {
            "field": "embedding",
            "query_vector": query_embedding,
            "k": top_k * 2,
            "num_candidates": 100   # Higher = better recall, slower speed
        },
        "size": top_k * 2
    }

    bm25_results = es.search(index="products", body=bm25_query)["hits"]["hits"]
    knn_results  = es.search(index="products", body=knn_query)["hits"]["hits"]

    return reciprocal_rank_fusion(bm25_results, knn_results, top_k)


def reciprocal_rank_fusion(
    list_a: list, list_b: list, top_k: int, k: int = 60
) -> list[dict]:
    scores = {}
    sources = {}

    for rank, hit in enumerate(list_a):
        doc_id = hit["_id"]
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
        sources[doc_id] = hit["_source"]

    for rank, hit in enumerate(list_b):
        doc_id = hit["_id"]
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank + 1)
        sources[doc_id] = hit["_source"]

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [{"id": doc_id, "score": score, **sources[doc_id]}
            for doc_id, score in ranked[:top_k]]

Why k=60? The original RRF paper recommends 60 as a safe default. Lower values amplify top-rank differences; higher values flatten them. Tune it if you have labeled relevance data.
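You can quantify that flattening effect directly. The score gap between rank 1 and rank 2 shrinks as k grows:

```python
def gap(k):
    # RRF score difference between rank 1 and rank 2 for a given k.
    return 1 / (k + 1) - 1 / (k + 2)

# Lower k amplifies top-rank differences; higher k flattens them.
print(f"k=10: {gap(10):.5f}   k=60: {gap(60):.5f}")
```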


Step 4: Tune the Weight Balance (Optional)

RRF treats both lists equally. If your store is brand-heavy (customers often search exact product names), give BM25 more influence:

def weighted_rrf(list_a, list_b, weight_a=1.5, weight_b=1.0, top_k=10, k=60):
    scores = {}
    sources = {}

    for rank, hit in enumerate(list_a):
        doc_id = hit["_id"]
        scores[doc_id] = scores.get(doc_id, 0) + weight_a / (k + rank + 1)
        sources[doc_id] = hit["_source"]

    for rank, hit in enumerate(list_b):
        doc_id = hit["_id"]
        scores[doc_id] = scores.get(doc_id, 0) + weight_b / (k + rank + 1)
        sources[doc_id] = hit["_source"]

    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [{"id": doc_id, "score": score, **sources[doc_id]}
            for doc_id, score in ranked[:top_k]]

Rule of thumb: Start at weight_a=1.0, weight_b=1.0. Increase BM25 weight if customers mostly type SKUs or brand names. Increase vector weight if your catalog uses clinical or technical language customers don't know.
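A toy run makes the tie-breaking concrete. The hits below are hypothetical stand-ins for real Elasticsearch responses, and weighted_rrf is repeated so the snippet runs standalone:

```python
def weighted_rrf(list_a, list_b, weight_a=1.5, weight_b=1.0, top_k=10, k=60):
    # Same function as above, repeated here so this sketch is self-contained.
    scores, sources = {}, {}
    for weight, hits in ((weight_a, list_a), (weight_b, list_b)):
        for rank, hit in enumerate(hits):
            doc_id = hit["_id"]
            scores[doc_id] = scores.get(doc_id, 0) + weight / (k + rank + 1)
            sources[doc_id] = hit["_source"]
    ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [{"id": d, "score": s, **sources[d]} for d, s in ranked[:top_k]]

# Hypothetical hits: each leg puts a different product first — a dead tie
# under equal weights. weight_a=1.5 breaks it toward the BM25 winner.
bm25_hits = [{"_id": "sku-1", "_source": {"title": "Nike Air Max 270"}}]
knn_hits  = [{"_id": "sku-2", "_source": {"title": "Stability runner"}}]

print(weighted_rrf(bm25_hits, knn_hits, weight_a=1.5)[0]["id"])
# → sku-1
```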


Verification

Run a quick benchmark against your existing search:

python -c "
from your_module import hybrid_search
results = hybrid_search('running shoes for bad knees', top_k=5)
for r in results:
    print(f'{r[\"score\"]:.4f} {r[\"title\"]}')
"

You should see: Products with joint-support or stability features appearing in top 3, even if they don't contain the phrase "bad knees." Pure BM25 on the same query should return 0 results.

[Screenshot: terminal comparison — hybrid search surfaces "ASICS Gel-Kayano 31" and "Brooks Adrenaline GTS 24" where BM25 returns nothing]


What You Learned

  • BM25 handles exact-match intent; dense vectors handle semantic intent — neither alone is sufficient
  • RRF is calibration-free: it merges ranked lists without normalizing incompatible scores
  • k=60 is a safe default; tune weights only when you have relevance judgment data
  • Embedding at query time adds ~5ms latency — use a GPU or ONNX-quantized model in production

Limitation: This approach doubles your search latency in the worst case (two sequential queries). Use Elasticsearch's native hybrid query in 8.14+ to run both legs in a single request.
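If you're on 8.14+, the single-request version replaces the two es.search calls entirely. A sketch of the request body using the RRF retriever — verify the exact syntax against your Elasticsearch version's docs before relying on it:

```python
# Single-request hybrid query via the RRF retriever (Elasticsearch 8.14+).
# Field names match the index built above; the zero vector is a stand-in —
# use model.encode(query).tolist() in real code.
query_embedding = [0.0] * 384

hybrid_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"multi_match": {
                    "query": "running shoes for bad knees",
                    "fields": ["title^2", "description"]}}}},
                {"knn": {"field": "embedding",
                         "query_vector": query_embedding,
                         "k": 20, "num_candidates": 100}},
            ],
            "rank_constant": 60,      # same role as k in the Python RRF above
            "rank_window_size": 50,   # candidates considered per leg
        }
    },
    "size": 10,
}
# Sent as: es.search(index="products", body=hybrid_body)
```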

When NOT to use this: Purely faceted searches (filtering by color, size, price) don't benefit from semantic similarity. Hybrid search helps discovery, not filtering.


Tested on Elasticsearch 8.12, sentence-transformers 3.0, Python 3.12, Ubuntu 22.04