ecommerce
June 2026
Gree Distributor

Why an AI Chatbot Is a Marathon, Not a Magic Switch

A five-month field case study of building, tuning, and governing a retail AI assistant for an HVAC e-commerce store — the post-launch grind that cut hallucinations from 10% to 1%, added order, callback and quote tools, and grew the knowledge base by 300%.

Duration: ~5 months (then continuous tuning)

Team Size: 1

My Role: End-to-end (architecture, backend, ML/RAG pipeline, governance and tuning)

Executive Summary

The pitch for an AI sales assistant is intoxicating: drop a widget on your store, point it at your catalog, and a tireless agent answers every customer 24/7. The demo always works. The first week in production rarely does. We learned this the honest way — by building one for an air-conditioning e-commerce store, shipping it, watching it stumble, and rebuilding the parts that mattered over and over.

The single most important lesson up front: a chatbot's launch-day quality is the floor, not the ceiling. If you judge the project by the first impression, you'll kill it right before it gets good. The conversational API itself was bootstrapped in about a day. Making it genuinely useful took five months of systematic work on top of the platform, including an indexer-first knowledge pipeline and six weeks of near-daily tuning.

This case study is the unvarnished version: the timeline, the predictable launch-week failures and how each became a fix, the governance discipline that turned a mediocre first impression into a trusted tool, and the integrations — quote, order, callback — that finally made it move the business. An AI chatbot is not a switch you flip; it is an instrument you tune over months and then keep tuning forever.

Key Metrics

Hallucination rate (first 2 months): -90%
Before: 10%
After: 1%

AI tools / actions (order · callback · quote): +3
Before: 0
After: 3

Knowledge base coverage: +300%
Before: Products only
After: + categories, articles, delivery cities, PDF catalog

Conversation memory: Context-aware
Before: None (stateless)
After: Multi-day Redis memory

Error handling: No broken screens
Before: Raw stack traces
After: Graceful handoff

Out-of-stock recommendations: Eliminated
Before: Surfaced to customers
After: Hard-excluded, in-stock weighted

The Challenges

Key obstacles that needed to be addressed

1

It answered confidently with the wrong product

Early recommendations surfaced catalog entries that were preorder-only or out of stock. Steering a customer toward something they cannot buy is worse than giving no answer at all.

Business Impact

Confident-but-wrong recommendations erode trust and actively push customers toward dead-end purchases, costing conversions.

2

It tripped over real human language

Customers don't type clean queries. They use slang, transliterations, typos, abbreviated brand names, and interchangeable spellings. Out of the box the retriever missed all of it.

Business Impact

A search that fails on the way people actually write returns "no results" on products the store genuinely sells — a silent, invisible loss of sales.

3

It leaked raw exceptions to customers

When the retrieval pipeline errored, early versions surfaced raw, stack-trace-flavored failures directly into the chat instead of failing gracefully.

Business Impact

When customers see the machinery break, the whole store feels unreliable at the exact moment they are deciding whether to buy.

4

It tried to do things that were not its job

People asked it to write complaint letters, draft formal requests, and otherwise act as a free general-purpose AI, while genuine service requests (repairs, maintenance) had nowhere proper to go.

Business Impact

Off-mission usage burns tokens, dilutes the assistant's purpose, and routes service questions nowhere useful.

5

It forgot the conversation between messages

The JivoChat widget keeps no conversation history, so without external state every message reached the model as an isolated, disconnected line. But customers talk to an AI the way they talk to a human: "Hi!" first, then question after question, with follow-ups like "what about the white one?" that only make sense with memory.

Business Impact

A stateless assistant cannot resolve follow-ups or hold a real dialogue — the single most common way a promising conversation falls apart.

Our Solutions

How we tackled the challenges and delivered results

An indexer-first, read-only knowledge layer

The first real AI component was not the chatbot — it was the indexer. Separating the data source from the AI brain meant we could rebuild the AI fearlessly without ever risking the store.

Implementation

A non-intrusive worker polls the catalog database every 30 seconds, detects changes via content hashes, and syncs them into the vector database. It never touches the e-commerce core or admin: it connects to the database read-only and reconciles orphaned records hourly.

Node.js · Pinecone · MySQL · content-hash diffing
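The change-detection step can be sketched roughly as follows. This is a minimal illustration, not the production indexer: the `CatalogRow` shape, the `planSync` name, and the small djb2 hash are all assumptions made so the example runs without dependencies. A real worker would more plausibly use a cryptographic digest and write the resulting plan to the vector database.

```typescript
// Hypothetical row shape; the real catalog schema is not described here.
interface CatalogRow {
  id: string;
  name: string;
  price: number;
  inStock: boolean;
}

interface SyncPlan {
  upsert: CatalogRow[]; // new or changed rows that need re-embedding
  remove: string[];     // ids still in the index but gone from the catalog
}

// djb2 string hash, chosen only to keep the sketch dependency-free.
function contentHash(row: CatalogRow): string {
  // Serialize fields in a fixed order so the hash is stable between polls.
  const text = [row.id, row.name, String(row.price), String(row.inStock)].join("|");
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash << 5) + hash + text.charCodeAt(i)) | 0; // keep within 32 bits
  }
  return (hash >>> 0).toString(16);
}

// Compare the current catalog snapshot with the hashes saved at the last sync.
function planSync(rows: CatalogRow[], indexedHashes: Record<string, string>): SyncPlan {
  const upsert: CatalogRow[] = [];
  const seen: Record<string, boolean> = {};
  for (const row of rows) {
    seen[row.id] = true;
    if (indexedHashes[row.id] !== contentHash(row)) {
      upsert.push(row); // missing from the index, or content changed
    }
  }
  // Anything indexed but no longer in the catalog is an orphan.
  const remove = Object.keys(indexedHashes).filter((id) => seen[id] !== true);
  return { upsert, remove };
}
```

In this sketch a scheduler would call `planSync` on each poll and apply only the `upsert` list, leaving the `remove` list for the slower hourly reconciliation pass.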

The long tuning grind: retrieval quality from real logs

Bootstrapping the conversational API took about a day; making it good took six weeks of near-daily commits driven by reading actual customer conversations, not imagined ones.

Implementation

We built canonical matching for slang, transliteration, and typos; a function-and-feature glossary converted into searchable chunks; and a regression test set of messy real-world queries. We hard-excluded unavailable series and weighted in-stock items higher. We also added an answer cache and fast paths that bypass the heavy retrieval pipeline for greetings and small talk.

OpenAI · Cohere · Pinecone · Redis (answer cache)
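Two of these ideas, canonical matching and stock-aware ranking, can be illustrated with a short sketch. Everything named here is an assumption for illustration: the alias entries, the `IN_STOCK_BOOST` value, and the `Candidate` shape are invented, and the real pipeline applies these steps around the retriever and reranker rather than in isolation.

```typescript
// Hypothetical alias table: each entry maps a messy variant to a canonical term.
const CANONICAL_ALIASES: Record<string, string> = {
  gri: "gree",
  gre: "gree",
  invertor: "inverter",
  condey: "conditioner",
};

// Lowercase, strip common punctuation, and replace known variants token by token.
function canonicalize(query: string): string {
  return query
    .toLowerCase()
    .replace(/[.,!?;:()"']/g, " ")
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const hit = CANONICAL_ALIASES[token];
      return typeof hit === "string" ? hit : token;
    })
    .join(" ");
}

interface Candidate {
  sku: string;
  series: string;
  score: number;    // relevance score from the retriever or reranker
  inStock: boolean;
}

const IN_STOCK_BOOST = 1.2; // assumed multiplier, not a value from the project

// Drop candidates from unavailable series, then favor items that can ship now.
function rankForCustomer(candidates: Candidate[], unavailableSeries: string[]): Candidate[] {
  return candidates
    .filter((c) => unavailableSeries.indexOf(c.series) === -1)
    .map((c) => ({ ...c, score: c.inStock ? c.score * IN_STOCK_BOOST : c.score }))
    .sort((a, b) => b.score - a.score);
}
```

The exclusion is a hard filter rather than a penalty on purpose: a low score can still surface when few results match, whereas a filtered item can never be recommended.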

Stateful conversational memory (the JivoChat gap)

Because the JivoChat widget stores no history, we gave the assistant its own memory so it can hold a human-style dialogue that opens with "hi" and unfolds one question at a time.

Implementation

Conversation history and a TTL'd answer cache live in Redis with a multi-day memory window; the query rewriter resolves context-dependent follow-ups ("what about the white one?"). Critically, context-dependent one-word confirmations ("yes") are excluded from the cache, because caching them poisons the next customer's session.

Redis · OpenAI · Node.js
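A rough sketch of the memory window and the cache-exclusion rule is shown below. It uses an in-memory object as a stand-in for Redis so it runs on its own, and the three-day TTL, the 20-turn cap, and the list of confirmation words are assumptions; the case study only states that the window is multi-day and that one-word confirmations such as "yes" are never cached.

```typescript
interface Turn {
  role: "customer" | "assistant";
  text: string;
}

interface Session {
  turns: Turn[];
  expiresAt: number; // epoch milliseconds
}

// Assumed values for illustration only.
const MEMORY_TTL_MS = 3 * 24 * 60 * 60 * 1000;
const MAX_TURNS = 20;

// In-memory stand-in for Redis; a real deployment would store this per session key.
const sessions: Record<string, Session | undefined> = {};

// Return the stored history, or an empty one if the session is missing or expired.
function loadHistory(sessionId: string, now: number): Turn[] {
  const session = sessions[sessionId];
  if (session === undefined || session.expiresAt <= now) {
    return [];
  }
  return session.turns;
}

// Append a turn, keep only the most recent turns, and refresh the expiry.
function appendTurn(sessionId: string, turn: Turn, now: number): void {
  const turns = loadHistory(sessionId, now).concat([turn]).slice(-MAX_TURNS);
  sessions[sessionId] = { turns, expiresAt: now + MEMORY_TTL_MS };
}

// Short confirmations only make sense relative to the previous turn.
const CONTEXT_DEPENDENT = ["yes", "no", "ok", "okay", "sure"];

// Decide whether an answer to this message may be stored in the shared answer cache.
function isCacheable(message: string): boolean {
  const normalized = message.trim().toLowerCase().replace(/[.!?]+$/, "");
  if (normalized.length === 0) {
    return false;
  }
  return CONTEXT_DEPENDENT.indexOf(normalized) === -1;
}
```

The point of `isCacheable` is that the answer cache is shared across customers while the meaning of "yes" is not: it depends entirely on what the assistant asked just before.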

Governance: the discipline that turns a bad first impression into a good product

A chatbot is not a feature you ship; it is a system you operate. Five governance practices did more for quality than any model upgrade.

Implementation

Religiously store and review real chat history to drive fixes, while treating logs as sensitive data (a lesson from a near-miss PII leak); disclose to customers that the assistant is AI; always keep a human escape hatch; maintain a keyword scope guard that refuses off-mission tasks and redirects service requests; and pin vetted, authoritative answers for high-stakes facts (authorized-dealer status, certifications, warranty, delivery, payment) instead of trusting freeform generation.

The reason for pinning was memorable. Asked whether the store was an official Gree dealer, an early version found no exact match on the site and cheerfully sided with the customer: "You're right, they don't actually state they're an official Gree dealer; shall I help you draft a complaint about it?" It hallucinated a high-stakes fact and then offered to file a grievance against its own owner.

Keyword scope guard · Pinned answers · Human handoff · Graceful degradation
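The scope guard and pinned answers can be pictured as a routing step that runs before any retrieval or generation. The sketch below is an assumption-heavy illustration: the keyword lists, the `routeMessage` name, and the reply texts are invented, and the pinned answers are deliberately left as placeholders because the vetted wording belongs to the store.

```typescript
type Route =
  | { kind: "refuse"; reply: string }
  | { kind: "service"; reply: string }
  | { kind: "pinned"; reply: string }
  | { kind: "rag" };

// Hypothetical keyword lists; a real guard would be built from reviewed chat logs.
const OFF_MISSION = ["complaint letter", "write a letter", "formal request", "essay"];
const SERVICE = ["repair", "maintenance"];

// Placeholders only: the real texts are vetted by the business, not generated.
const PINNED: { keywords: string[]; answer: string }[] = [
  { keywords: ["official dealer", "authorized dealer"], answer: "<vetted dealer-status statement>" },
  { keywords: ["warranty"], answer: "<vetted warranty statement>" },
];

function containsAny(text: string, phrases: string[]): boolean {
  return phrases.some((phrase) => text.indexOf(phrase) !== -1);
}

// Decide how a message is handled before the model ever sees it.
function routeMessage(message: string): Route {
  const text = message.toLowerCase();
  if (containsAny(text, OFF_MISSION)) {
    return { kind: "refuse", reply: "I can only help with products, orders, and delivery." };
  }
  if (containsAny(text, SERVICE)) {
    return { kind: "service", reply: "Service requests are handled by our team. I can arrange a callback." };
  }
  for (const entry of PINNED) {
    if (containsAny(text, entry.keywords)) {
      return { kind: "pinned", reply: entry.answer };
    }
  }
  return { kind: "rag" }; // everything else goes to retrieval and generation
}
```

Checking the off-mission list first means a request like "write a complaint letter about the warranty" is refused rather than answered with the pinned warranty text.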

Integrations that produce outcomes

A chatbot that only talks is a cost center. The moment it can do things, the economics flip — three integrations moved ours from "helpful FAQ" to part of the sales funnel.

Implementation

Automated quote (commercial-proposal) generation assembles a branded PDF from the conversation using live pricing and the current exchange rate, then drops a download link into the chat. In-chat order placement creates real orders via a machine-to-machine endpoint, tagged by source so revenue from the assistant is measurable. Callback requests capture the phone number, attach the full chat history for context, and create the request.

Node.js · PHP / OpenCart · PDF generation · JivoChat
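The arithmetic behind a quote, and the source tag on an order, can be sketched as follows. The assumptions are significant: prices are taken to be stored in a base currency and converted with a single exchange rate, the tag value `ai-assistant` is invented, and the types and function names are hypothetical. PDF rendering and the machine-to-machine endpoint are left out.

```typescript
interface QuoteItem {
  sku: string;
  unitPriceBase: number; // live price in the catalog's base currency (assumed)
  quantity: number;
}

interface QuoteLine {
  sku: string;
  quantity: number;
  lineTotal: number; // in the customer's currency
}

interface Quote {
  lines: QuoteLine[];
  exchangeRate: number;
  total: number;
}

interface OrderRequest {
  items: { sku: string; quantity: number }[];
  source: string;
}

const ORDER_SOURCE: string = "ai-assistant"; // hypothetical tag value

// Round to two decimals so line totals add up to the displayed total.
function roundMoney(value: number): number {
  return Math.round(value * 100) / 100;
}

// Price each line at the current exchange rate and sum the rounded lines.
function buildQuote(items: QuoteItem[], exchangeRate: number): Quote {
  if (!(exchangeRate > 0)) {
    throw new Error("exchange rate must be positive");
  }
  const lines = items.map((item) => ({
    sku: item.sku,
    quantity: item.quantity,
    lineTotal: roundMoney(item.unitPriceBase * item.quantity * exchangeRate),
  }));
  const total = roundMoney(lines.reduce((sum, line) => sum + line.lineTotal, 0));
  return { lines, exchangeRate, total };
}

// Turn an accepted quote into an order payload tagged with its origin.
function buildOrder(quote: Quote): OrderRequest {
  return {
    items: quote.lines.map((line) => ({ sku: line.sku, quantity: line.quantity })),
    source: ORDER_SOURCE,
  };
}
```

Storing the exchange rate on the quote itself keeps the document reproducible later, and the `source` field is what makes assistant-originated revenue separable in reporting.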

Results & Impact

Measurable outcomes achieved through our solutions

Quality

Hallucinations cut from 10% to 1% over two months

A disciplined, log-driven tuning loop turned a mediocre launch-day floor into a genuinely trustworthy assistant — proving the project should be judged on month three, not week one.

Conversion

From "helpful FAQ" to part of the sales funnel

Quote, order, and callback integrations remove friction from the transaction itself; orders are tagged by source so the revenue the assistant originates is directly measurable.

Capability

Knowledge base grown 300%, tools from 0 to 3

Starting from products and expanding to categories, articles, delivery cities, and the PDF catalog — paired with order, callback, and quote tools — turned a talker into a doer.

Reliability

No dead ends, no broken screens

Graceful degradation routes any internal failure to a calm human handoff instead of a loop of apologies or a raw error — the bot handles the 80% it is good at and knows the boundary of the other 20%.
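One way to picture graceful degradation is a wrapper around the whole answer pipeline. This is a sketch under assumptions: `answerSafely`, the `ChatReply` shape, and the handoff wording are all hypothetical, and the pipeline and logger are passed in as plain functions so the example stays self-contained.

```typescript
interface ChatReply {
  text: string;
  handoff: boolean; // true when a human operator should take over
}

// Hypothetical wording; the real message is whatever the store has approved.
const HANDOFF_TEXT =
  "Sorry, I can't answer that right now. I'm passing you to a colleague who can help.";

// Run the pipeline, and convert any failure into a calm handoff reply.
async function answerSafely(
  question: string,
  pipeline: (q: string) => Promise<string>,
  logError: (err: unknown) => void,
): Promise<ChatReply> {
  try {
    const text = await pipeline(question);
    return { text, handoff: false };
  } catch (err) {
    // Error details go to internal logs only, never into the chat.
    logError(err);
    return { text: HANDOFF_TEXT, handoff: true };
  }
}
```

Because the customer-facing text is a constant, nothing from the exception can leak into the conversation, whatever the failure was.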

Governance

Trust and data discipline built in

AI disclosure, a keyword scope guard, pinned authoritative answers for high-stakes facts, and hardened handling of chat logs as sensitive data protect both the customer and the business as regulation catches up.

Technology Stack

Tools and technologies used to build this solution

Backend: Node.js · Fastify

AI / Retrieval: Pinecone (hybrid dense + sparse) · Cohere reranking · OpenAI LLMs & embeddings

Database: Redis (memory & answer cache) · MySQL

Infrastructure: Dedicated Linux server · systemd · HTTPS

DevOps: GitLab CI/CD · Regression test set · Automated health checks

Tools: PHP / OpenCart · JivoChat widget · PDF generation

rag · ai · llm · llm-ops · conversational-ai · governance · vector-search · pinecone · cohere · openai · redis · nodejs · ecommerce · opencart · hvac