Why an AI Chatbot Is a Marathon, Not a Magic Switch
A five-month field case study of building, tuning, and governing a retail AI assistant for an HVAC e-commerce store: the post-launch grind that cut hallucinations from 10% to 1%, added order, callback, and quote tools, and grew the knowledge base by 300%.
Duration: ~5 months (then continuous tuning)
Team size: 1
My role: end-to-end (architecture, backend, ML/RAG pipeline, governance & tuning)

Executive Summary
The pitch for an AI sales assistant is intoxicating: drop a widget on your store, point it at your catalog, and a tireless agent answers every customer 24/7. The demo always works. The first week in production rarely does. We learned this the honest way — by building one for an air-conditioning e-commerce store, shipping it, watching it stumble, and rebuilding the parts that mattered over and over.
The single most important lesson up front: a chatbot's launch-day quality is the floor, not the ceiling. If you judge the project by its first impression, you'll kill it right before it gets good. The conversational API itself was bootstrapped in about a day; making it genuinely useful took five months of systematic work on top of the platform, including an indexer-first knowledge pipeline and six weeks of near-daily tuning.
This case study is the unvarnished version: the timeline, the predictable launch-week failures and how each became a fix, the governance discipline that turned a mediocre first impression into a trusted tool, and the integrations — quote, order, callback — that finally made it move the business. An AI chatbot is not a switch you flip; it is an instrument you tune over months and then keep tuning forever.
Key Metrics (before → after)
Hallucination rate (first 2 months): 10% → 1%
AI tools / actions (order · callback · quote): 0 → 3
Knowledge base coverage: products only → products + categories, articles, delivery cities, PDF catalog
Conversation memory: none (stateless) → multi-day Redis memory
Error handling: raw stack traces → graceful handoff
Out-of-stock recommendations: surfaced to customers → hard-excluded, in-stock weighted
The Challenges
Key obstacles that needed to be addressed
It answered confidently with the wrong product
Early recommendations surfaced catalog entries that were preorder-only or out of stock. A customer being steered toward something they cannot buy is worse than no answer at all.
Business Impact
Confident-but-wrong recommendations erode trust and actively push customers toward dead-end purchases, costing conversions.
It tripped over real human language
Customers don't type clean queries. They use slang, transliterations, typos, abbreviated brand names, and interchangeable spellings. Out of the box, the retriever missed all of these variants.
Business Impact
A search that fails on the way people actually write returns "no results" on products the store genuinely sells — a silent, invisible loss of sales.
It leaked raw exceptions to customers
When the retrieval pipeline errored, early versions dumped raw, stack-trace-style errors directly into the chat instead of failing gracefully.
Business Impact
Watching the machinery break makes the whole store feel unreliable at the exact moment a customer is deciding whether to buy.
It tried to do things that were not its job
People asked it to write complaint letters, draft formal requests, and otherwise treat it as a free general-purpose AI, while service requests (repairs, maintenance) had nowhere appropriate to go.
Business Impact
Off-mission usage burns tokens, dilutes the assistant's purpose, and routes service questions nowhere useful.
It forgot the conversation between messages
The JivoChat widget keeps no conversation history, so without external state every message reached the model as an isolated, disconnected line. But customers talk to an AI the way they talk to a human: "Hi!" first, then question after question, with follow-ups like "what about the white one?" that only make sense with memory.
Business Impact
A stateless assistant cannot resolve follow-ups or hold a real dialogue — the single most common way a promising conversation falls apart.
Our Solutions
How we tackled the challenges and delivered results
An indexer-first, read-only knowledge layer
The first real AI component was not the chatbot — it was the indexer. Separating the data source from the AI brain meant we could rebuild the AI fearlessly without ever risking the store.
Implementation
A non-intrusive worker polls the catalog database every 30 seconds, detects changes via content hashes, and syncs them into the vector database. It never writes to the e-commerce core or the admin: it connects with read-only access and reconciles orphaned records hourly.
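The indexer's change detection can be sketched as below. This is a minimal illustration, not the project's code: it assumes each catalog row arrives as a dict keyed by product ID, and `VectorIndex`, `sync_changes`, `reconcile_orphans`, and `run_worker` are hypothetical names, with an in-memory dict standing in for the real vector database and the embedding step omitted.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


def content_hash(row: dict) -> str:
    """SHA-256 over a canonical JSON dump, so key order never triggers a resync."""
    payload = json.dumps(row, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class VectorIndex:
    """Hypothetical stand-in for the vector database: doc_id -> (hash, document)."""
    entries: Dict[str, Tuple[str, dict]] = field(default_factory=dict)

    def upsert(self, doc_id: str, digest: str, doc: dict) -> None:
        # A real index would re-embed `doc` here; the sketch stores a copy.
        self.entries[doc_id] = (digest, dict(doc))

    def delete(self, doc_id: str) -> None:
        self.entries.pop(doc_id, None)


def sync_changes(rows: Dict[str, dict], index: VectorIndex) -> List[str]:
    """One polling pass: re-index only rows whose content hash changed."""
    changed = []
    for doc_id, row in rows.items():
        digest = content_hash(row)
        stored = index.entries.get(doc_id)
        if stored is None or stored[0] != digest:
            index.upsert(doc_id, digest, row)
            changed.append(doc_id)
    return changed


def reconcile_orphans(rows: Dict[str, dict], index: VectorIndex) -> List[str]:
    """Periodic pass: drop indexed docs whose source row no longer exists."""
    orphans = [doc_id for doc_id in index.entries if doc_id not in rows]
    for doc_id in orphans:
        index.delete(doc_id)
    return orphans


def run_worker(fetch_rows: Callable[[], Dict[str, dict]], index: VectorIndex,
               poll_s: float = 30, reconcile_s: float = 3600,
               cycles: Optional[int] = None) -> None:
    """Poll loop; `fetch_rows` is assumed to run a read-only catalog query."""
    last_reconcile = time.monotonic()
    done = 0
    while cycles is None or done < cycles:
        rows = fetch_rows()
        sync_changes(rows, index)
        if time.monotonic() - last_reconcile >= reconcile_s:
            reconcile_orphans(rows, index)
            last_reconcile = time.monotonic()
        done += 1
        if cycles is None or done < cycles:
            time.sleep(poll_s)
```

Hashing a canonical dump (sorted keys) means only genuine content changes trigger re-embedding, which keeps a 30-second poll cheap even on a large catalog.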
The long tuning grind: retrieval quality from real logs
Bootstrapping the conversational API took about a day; making it good took six weeks of near-daily commits driven by reading actual customer conversations, not imagined ones.
Implementation
Built canonical matching for slang, transliterations, and typos; a glossary of functions and features converted into searchable chunks; and a regression test set of messy real-world queries. Hard-excluded unavailable series and weighted in-stock items higher. Added an answer cache and fast paths that bypass the heavy retrieval pipeline for greetings and small talk.
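A rough sketch of two of these fixes, canonical query matching and availability-aware ranking, follows. The alias table, vocabulary, series name, and the 1.25 in-stock boost are invented for illustration; typo correction here uses Python's standard `difflib.get_close_matches`, which may differ from whatever matcher the project actually used, and the ranking assumes the retriever already returns a similarity score per candidate.

```python
import difflib
import re
from dataclasses import dataclass
from typing import List

# Hypothetical alias table: slang, transliterations, and shortened brand
# names mapped onto one canonical term.
ALIASES = {
    "кондей": "air conditioner",
    "кондер": "air conditioner",
    "konder": "air conditioner",
    "гри": "gree",
    "gri": "gree",
    "mitsu": "mitsubishi",
}
# Hypothetical canonical vocabulary used for typo correction.
VOCABULARY = ["gree", "mitsubishi", "daikin", "inverter", "split", "cassette"]


def canonicalize(query: str) -> str:
    """Lower-case, map aliases, then snap near-miss spellings onto the vocabulary."""
    terms = []
    for token in re.findall(r"\w+", query.lower()):
        if token in ALIASES:
            terms.append(ALIASES[token])
        elif token in VOCABULARY or token.isdigit():
            terms.append(token)
        else:
            close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
            terms.append(close[0] if close else token)
    return " ".join(terms)


@dataclass
class Candidate:
    sku: str
    series: str
    score: float          # similarity score from the retriever
    in_stock: bool
    preorder_only: bool = False


UNAVAILABLE_SERIES = {"discontinued-series"}  # hypothetical
IN_STOCK_BOOST = 1.25                         # hypothetical weight


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Hard-exclude what cannot be bought, then boost in-stock items."""
    eligible = [
        c for c in candidates
        if c.series not in UNAVAILABLE_SERIES and not c.preorder_only
    ]
    return sorted(
        eligible,
        key=lambda c: c.score * (IN_STOCK_BOOST if c.in_stock else 1.0),
        reverse=True,
    )
```

For example, `canonicalize("Кондей daikn inverte")` yields `"air conditioner daikin inverter"`. Filtering unavailable items out before sorting, rather than merely down-weighting them, guarantees they never reach the customer however high their similarity score.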
Stateful conversational memory (the JivoChat gap)
Because the JivoChat widget stores no history, we gave the assistant its own memory so it can hold a human-style dialogue that opens with "hi" and unfolds one question at a time.
Implementation
Conversation history and a TTL'd answer cache live in Redis with a multi-day memory window, and the query rewriter resolves context-dependent follow-ups ("what about the white one?"). Critically, context-dependent one-word confirmations ("yes") are excluded from the cache: an answer to "yes" only makes sense in the conversation that produced it, so caching it poisons the next customer's session.
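The memory and cache rules might look roughly like this. To keep the sketch runnable without a server, `TTLStore` is an in-memory stand-in for Redis keys with expiry; the three-day history window, one-hour answer TTL, confirmation word list, and key names are all assumptions, and the answer cache is assumed to be keyed on the query rewriter's standalone output.

```python
import time
from typing import Callable, List, Optional, Tuple

HISTORY_TTL_S = 3 * 24 * 3600   # hypothetical "multi-day" memory window
ANSWER_TTL_S = 3600             # hypothetical answer-cache lifetime
CONFIRMATIONS = {"yes", "no", "ok", "sure", "да", "нет", "ага"}  # hypothetical


class TTLStore:
    """In-memory stand-in for Redis keys with expiry (SET ... EX in Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key: str, value, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]      # lazily expire, as Redis does on access
            return None
        return value


def append_turn(store: TTLStore, chat_id: str, role: str, text: str) -> List[Tuple[str, str]]:
    """Append a turn and refresh the whole history's TTL."""
    key = f"history:{chat_id}"
    history = store.get(key) or []
    history.append((role, text))
    store.set(key, history, HISTORY_TTL_S)
    return history


def is_cacheable(message: str) -> bool:
    """Context-dependent confirmations must never touch the shared answer cache."""
    return message.strip().lower().strip("!?.,") not in CONFIRMATIONS


def lookup_answer(store: TTLStore, rewritten: str, message: str) -> Optional[str]:
    if not is_cacheable(message):
        return None
    return store.get(f"answer:{rewritten}")


def remember_answer(store: TTLStore, rewritten: str, message: str, answer: str) -> None:
    if is_cacheable(message):
        store.set(f"answer:{rewritten}", answer, ANSWER_TTL_S)
```

Keeping the history TTL far longer than the answer TTL lets a customer return days later to the same dialogue, while cached answers age out before prices or stock drift too far.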
Governance: the discipline that turns a bad first impression into a good product
A chatbot is not a feature you ship; it is a system you operate. Five governance practices did more for quality than any model upgrade.
Implementation
Religiously store and review real chat history to drive fixes (while treating the logs as sensitive data after a near-miss PII leak); disclose to customers that the assistant is an AI; always keep a human escape hatch; maintain a keyword scope guard that refuses off-mission tasks and redirects service requests; and pin vetted, authoritative answers for high-stakes facts (authorized-dealer status, certifications, warranty, delivery, payment) instead of trusting freeform generation.
The reason for pinning was memorable. Asked whether the store was an official Gree dealer, an early version found no exact match on the site and cheerfully sided with the customer: "you're right, they don't actually state they're an official Gree dealer; shall I help you draft a complaint about it?" It hallucinated a high-stakes fact and then offered to file a grievance against its own owner.
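The scope guard and pinned answers can run as a cheap routing step before any model call. The sketch below is illustrative only: the keyword lists, reply wording, and topic triggers are hypothetical, and the pinned texts are deliberately placeholders because the real vetted answers are the store owner's to write.

```python
from typing import Optional, Tuple

# Hypothetical keyword lists; in practice they grow out of chat-log review.
OFF_MISSION = ("complaint letter", "write a letter", "draft a request", "essay", "poem")
SERVICE = ("repair", "maintenance", "not cooling", "broken")

# Placeholders, not real store facts: vetted text is written by the owner.
PINNED_ANSWERS = {
    "dealer": "<vetted statement on authorized-dealer status>",
    "warranty": "<vetted warranty terms>",
    "delivery": "<vetted delivery terms>",
    "payment": "<vetted payment options>",
}
PINNED_TRIGGERS = {
    "dealer": ("official dealer", "authorized dealer", "dealer"),
    "warranty": ("warranty", "guarantee"),
    "delivery": ("delivery", "shipping"),
    "payment": ("payment", "pay by", "installment"),
}
REFUSAL = "I can only help with choosing and buying air conditioners in this store."
SERVICE_REDIRECT = "For repairs and maintenance, I can pass your request to a manager."


def route(message: str) -> Tuple[str, Optional[str]]:
    """Decide before any LLM call: refuse, redirect, pinned answer, or normal RAG."""
    text = message.lower()
    if any(k in text for k in OFF_MISSION):
        return "refuse", REFUSAL
    if any(k in text for k in SERVICE):
        return "service_redirect", SERVICE_REDIRECT
    for topic, triggers in PINNED_TRIGGERS.items():
        if any(t in text for t in triggers):
            return "pinned", PINNED_ANSWERS[topic]
    return "rag", None
```

Plain substring matching is crude and will misfire on some sentences, but for this guard a false positive falls back to refusal or vetted text rather than to freeform generation, which is the safer failure.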
Integrations that produce outcomes
A chatbot that only talks is a cost center. The moment it can do things, the economics flip — three integrations moved ours from "helpful FAQ" to part of the sales funnel.
Implementation
Automated quote (commercial-proposal) generation assembles a branded PDF from the conversation using live pricing and the current exchange rate, then drops a download link into the chat. In-chat order placement creates real orders via a machine-to-machine endpoint, tagged by source so revenue from the assistant is measurable. Callback requests capture the phone number, attach the full chat history for context, and create the request.
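Two details in these integrations carry the economics: every order carries a source tag, and every callback carries the transcript. The sketch below assumes catalog prices need converting at a current exchange rate and that the order and callback endpoints accept JSON-like dicts; the field names and the `ai_assistant` tag are hypothetical, and PDF rendering and HTTP calls are left out.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List, Tuple

ASSISTANT_SOURCE = "ai_assistant"  # hypothetical source tag value

Item = Tuple[str, int, Decimal]    # (sku, quantity, unit price in list currency)


def quote_total(items: List[Item], rate: Decimal) -> Decimal:
    """Sum qty * unit price, convert at the current rate, round to 0.01."""
    base = sum((qty * price for _, qty, price in items), Decimal("0"))
    return (base * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def build_order_payload(chat_id: str, items: List[Item], phone: str, rate: Decimal) -> dict:
    """Body for a hypothetical machine-to-machine order endpoint."""
    return {
        "chat_id": chat_id,
        "phone": phone,
        "items": [{"sku": sku, "qty": qty} for sku, qty, _ in items],
        "total": quote_total(items, rate),
        "source": ASSISTANT_SOURCE,  # lets reports attribute revenue to the bot
    }


def build_callback(phone: str, history: List[Tuple[str, str]]) -> dict:
    """Callback request carrying the full chat so the manager has context."""
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return {"phone": phone, "comment": transcript, "source": ASSISTANT_SOURCE}


def revenue_by_source(orders: List[dict]) -> Dict[str, Decimal]:
    """Aggregate order totals per source tag."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for order in orders:
        totals[order["source"]] += order["total"]
    return dict(totals)
```

Using `Decimal` with explicit rounding keeps quote totals reproducible, which matters when the same figure appears in a PDF and in the created order.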
Results & Impact
Measurable outcomes achieved through our solutions
Hallucinations cut from 10% to 1% over two months
A disciplined, log-driven tuning loop turned a mediocre launch-day floor into a genuinely trustworthy assistant — proving the project should be judged on month three, not week one.
From "helpful FAQ" to part of the sales funnel
Quote, order, and callback integrations remove friction from the transaction itself; orders are tagged by source so the revenue the assistant originates is directly measurable.
Knowledge base grown 300%, tools from 0 to 3
Starting from products and expanding to categories, articles, delivery cities, and the PDF catalog — paired with order, callback, and quote tools — turned a talker into a doer.
No dead ends, no broken screens
Graceful degradation routes any internal failure to a calm human handoff instead of a loop of apologies or a raw error — the bot handles the 80% it is good at and knows the boundary of the other 20%.
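The graceful-degradation rule amounts to a wrapper around the whole pipeline. In this minimal sketch, `pipeline` and `handoff` are hypothetical callables (the real handoff would notify a human operator in the chat platform), the customer-facing wording is invented, and an empty answer is assumed to deserve the same treatment as an exception.

```python
import logging
from typing import Callable

logger = logging.getLogger("assistant")

# Hypothetical customer-facing wording.
HANDOFF_MESSAGE = (
    "Sorry, I can't answer that right now. "
    "I'm passing your question to a manager, who will reply here shortly."
)


def answer_safely(pipeline: Callable[[str], str], message: str,
                  handoff: Callable[[str], None]) -> str:
    """Run the pipeline; on any failure, hand off once instead of leaking errors."""
    try:
        reply = pipeline(message)
    except Exception:
        # The traceback goes to the server log only, never into the chat.
        logger.exception("assistant pipeline failed")
        handoff(message)
        return HANDOFF_MESSAGE
    if not reply or not reply.strip():
        # An empty answer is treated like a failure rather than shown as-is.
        handoff(message)
        return HANDOFF_MESSAGE
    return reply
```

Handing off on the first failure, instead of retrying with another apology, is what keeps the customer out of the loop the text describes.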
Trust and data discipline built in
AI disclosure, a keyword scope guard, pinned authoritative answers for high-stakes facts, and hardened handling of chat logs as sensitive data protect both the customer and the business as regulation catches up.
Technology Stack
Tools and technologies used to build this solution
Backend
AI / Retrieval
Database
Infrastructure
DevOps
Tools