Building an Ecommerce Chatbot Prototype for Jem&fix

2026-05-12 · Tommy Jepsen

I built a working ecommerce chatbot prototype for Jem&fix, a Danish hardware store chain, to explore how good a product-finding assistant could actually feel when grounded in real product data, real prices, and real store-level inventory.

The short version: a customer can describe a project in plain Danish, like "I want to hang a baby swing from a beam", and the bot figures out what they actually need to do it. It recommends the right hook, tells them the price, how many are in stock at their store, and exactly which aisle and shelf to find it on.

Here is how it works under the hood.

The Stack

  • Data source: Jem&fix's own public product API (scraped per category)
  • Database: Postgres with the pgvector extension
  • Embeddings: Local open-source embedding model via @huggingface/transformers (1024 dimensions)
  • Query understanding & answer generation: OpenAI GPT
  • API layer: Express + Drizzle ORM
  • Conversation memory: A small thread/messages table for multi-turn context

No external vector DB, no LangChain, no agent framework. Just Postgres, a local embedding model, and a few OpenAI calls glued together.

1. Pulling the Catalogue

The first job was getting the products. Jem&fix has a JSON API behind their category pages, so I walked their sitemap to collect category numbers and then paged through ProductsByCategoryAndFilters to dump every product into Postgres.
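
Roughly, the paging loop looks like this. Only the ProductsByCategoryAndFilters endpoint name comes from their site; the URL shape and parameter names below are my stand-ins, not the real API:

// Hypothetical paging loop: the URL and query params are assumptions,
// only the ProductsByCategoryAndFilters endpoint name is real
async function scrapeCategory(categoryNumber: string) {
  const products: unknown[] = [];
  for (let page = 1; ; page++) {
    const res = await fetch(
      `https://www.jemogfix.dk/api/ProductsByCategoryAndFilters?category=${categoryNumber}&page=${page}`
    );
    const data = await res.json();
    if (!data.products?.length) break; // past the last page
    products.push(...data.products);
  }
  return products;
}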

For each product I store:

  • Title, description, URL, image
  • Price (including campaign/discount fields)
  • Store placement: section, bookcase, shelf and stock count per store

Store placement is the part that makes this feel less like a chatbot and more like a real assistant. When the bot tells you "Aisle 7, shelf 2, 14 in stock", that comes straight from the same data the staff use.
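
For reference, here is a minimal sketch of how those tables could look in Drizzle. The column names are my guesses, not the actual schema:

import { pgTable, serial, integer, text, numeric, vector } from "drizzle-orm/pg-core";

// Guessed schema: one product row, plus one placement row per product per store
export const product = pgTable("product", {
  id: serial("id").primaryKey(),
  title: text("title").notNull(),
  description: text("description"),
  url: text("url"),
  image: text("image"),
  price: numeric("price"),
  campaignPrice: numeric("campaign_price"),
  embedding: vector("embedding", { dimensions: 1024 }),
});

export const placement = pgTable("placement", {
  productId: integer("product_id").notNull(),
  storeId: integer("store_id").notNull(),
  section: text("section"),
  bookcase: text("bookcase"),
  shelf: text("shelf"),
  stockCount: integer("stock_count"),
});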

2. Embedding the Products Locally

For each product I generate a 1024-dim embedding from a concatenation of title, description and price, using a Mistral embedding model running locally through @huggingface/transformers:

import { pipeline } from "@huggingface/transformers";
import pgvector from "pgvector";

// Load the local model once and reuse it for every product
const extractor = await pipeline("feature-extraction", MODEL_PATH);

// Mean-pool the token embeddings into a single 1024-dim vector
const output = await extractor(text, { pooling: "mean" });
const embedding = output.tolist()[0];

await pgPool.query(
  'UPDATE "product" SET embedding = $1 WHERE id = $2',
  [pgvector.toSql(embedding), product.id]
);

Running the embedding model locally means zero per-product cost and no rate limits while indexing a few thousand SKUs. I batched it 50 products at a time with a connection pool, and the whole catalogue embeds in one pass.
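
The batching loop itself is nothing fancy. Here, chunk and embedProduct are illustrative helpers, not the actual code:

const BATCH_SIZE = 50;

// Process the catalogue in chunks of 50 so the pool is never overwhelmed
for (const batch of chunk(allProducts, BATCH_SIZE)) {
  await Promise.all(batch.map((p) => embedProduct(p)));
}

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}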

3. Understanding the Customer's Question

The naïve approach (embed the user's question, do a vector search, done) falls apart fast. People don't ask for "armeringsmørtel" (reinforcement mortar). They ask "what do I use to fill cracks in concrete?" or "I want to hang a baby swing".

So before any retrieval happens, I ask OpenAI GPT to extract candidate product names from the message:

Customer: "I need some screws for a sandbox"
→ "Galvanised wood screws, Wood screws, Screws"

Customer: "I need a screw to hang a baby swing"
→ "Screw hook, Swing hook, Baby swing hook"

Customer: "What's the weather like?"
→ "ingen"   (none)

ingen ("none") is the escape hatch. If the message isn't shopping intent, I skip retrieval entirely.

4. Hybrid Search: Vectors + Trigrams

With candidate product names in hand, retrieval runs in two layers:

Vector search for semantic matches:

SELECT id, title, price, url,
       1 - (embedding <=> $1) AS similarity
FROM product
ORDER BY embedding <=> $1
LIMIT $2;

I filter to similarity > 0.8 to avoid garbage matches.

Trigram search (Postgres's pg_trgm extension) as a fallback for short or brand-name-y queries. Vector search is great at concepts but terrible at "M8 bolt" or a specific SKU prefix:

SELECT *, similarity(title, $1) AS similarity
FROM product
WHERE title % $1
ORDER BY similarity DESC
LIMIT $2;

I split the LLM-suggested product names on commas and run each one through both searches. Then I dedupe by product id and join in the store placement row.
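
Sketched, with embed, vectorSearch and trigramSearch standing in for the local model and the two queries above:

type Candidate = { id: number; title: string };

// Illustrative merge step: run every candidate name through both layers,
// keep the first hit per product id
async function retrieve(productNames: string[]): Promise<Candidate[]> {
  const byId = new Map<number, Candidate>();
  for (const name of productNames) {
    const queryEmbedding = await embed(name);
    const hits = [
      ...(await vectorSearch(queryEmbedding)),
      ...(await trigramSearch(name)),
    ];
    for (const hit of hits) {
      if (!byId.has(hit.id)) byId.set(hit.id, hit); // dedupe by product id
    }
  }
  return [...byId.values()];
}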

5. Letting the LLM Filter

Even with both layers, retrieval is noisy. So I do an evaluation pass: send the candidate titles back to OpenAI GPT along with the original question and ask it which ones are actually relevant. Anything it doesn't list gets dropped.

This is the cheapest reliable way I have found to clean up retrieval noise. The model sees titles only (not embeddings, not scores) and decides what a human shopper would consider on-topic.
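
In code it is one more chat call; the prompt below is a paraphrase:

// Illustrative relevance filter: keep only titles the model echoes back
async function filterRelevant(question: string, titles: string[]): Promise<string[]> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    temperature: 0,
    messages: [
      {
        role: "system",
        content:
          "You get a customer question and a list of product titles. " +
          "Reply with only the titles that are actually relevant, one per line.",
      },
      { role: "user", content: `${question}\n\n${titles.join("\n")}` },
    ],
  });

  const kept = new Set(
    (completion.choices[0].message.content ?? "")
      .split("\n")
      .map((t) => t.trim())
      .filter(Boolean)
  );
  return titles.filter((t) => kept.has(t));
}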

6. Writing the Answer

The surviving products get formatted into a structured block with price, stock, aisle and shelf, then handed to OpenAI GPT with a Danish system prompt:

Du er en ekspert butiksassistent for jem&fix byggemarked. Hjælp kunder med at finde de rigtige produkter. Nævn specifikt pris, lager og placering når relevant.

("You are an expert store assistant for the jem&fix hardware store. Help customers find the right products. Specifically mention price, stock and placement when relevant.")

Temperature is set low (0.3) because customers don't want a creative assistant, they want a correct one.
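
Put together, the final call looks roughly like this; formatProductBlock is an illustrative helper that renders one product's price, stock, aisle and shelf:

// Illustrative final call: recentMessages comes from the thread store (next section)
const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  temperature: 0.3, // low: customers want correct, not creative
  messages: [
    { role: "system", content: DANISH_SYSTEM_PROMPT },
    ...recentMessages,
    {
      role: "user",
      content: `${question}\n\nProdukter:\n${relevantProducts
        .map(formatProductBlock)
        .join("\n")}`,
    },
  ],
});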

7. Threads & Conversation Memory

A ThreadManager class persists every user message and assistant response in Postgres. On each request I pull the last 5 messages and prepend them to the OpenAI call. That gives multi-turn behaviour without any frontend state. You can ask "do you have it in green?" as a follow-up and it works.
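
A minimal sketch of the two operations that matter, with guessed table and column names:

import { Pool } from "pg";

// Guessed schema: thread_id, role, content, created_at on a message table
class ThreadManager {
  constructor(private pool: Pool) {}

  async append(threadId: string, role: string, content: string) {
    await this.pool.query(
      "INSERT INTO message (thread_id, role, content) VALUES ($1, $2, $3)",
      [threadId, role, content]
    );
  }

  async lastMessages(threadId: string, limit = 5) {
    const { rows } = await this.pool.query(
      "SELECT role, content FROM message WHERE thread_id = $1 ORDER BY created_at DESC LIMIT $2",
      [threadId, limit]
    );
    return rows.reverse(); // oldest of the five first, ready to prepend
  }
}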

What the Chatbot Can Actually Do

  • Help customers figure out how to build or fix something, then recommend the exact products they need to do it. "How do I patch a hole in drywall?" becomes a short how-to plus the filler, sanding block and primer to buy
  • Answer natural-language product questions in Danish, semantically grounded in the real catalogue
  • Surface live prices and discount/campaign info
  • Tell the customer exactly where the product sits in the physical store (section, shelf, stock count)
  • Maintain context across a multi-turn conversation
  • Recognise non-shopping messages and respond conversationally without forcing retrieval
  • Serve a separate /bestoffers endpoint that ranks the top campaign savings across the catalogue

The biggest takeaway for me, though, is how far you can get with Postgres + pgvector + a small local embedding model + a few well-placed LLM calls. No vector DB, no orchestration framework, no agent loop. Just a pipeline you can read top to bottom in a single file.

Follow me on LinkedIn for more updates and stories