Conversational AI & RAG Engine for a Legacy E-Commerce Platform
Wrapped a legacy PHP/OpenCart HVAC store in a standalone, retrieval-grounded AI sales assistant: 1,000+ catalog entities indexed, multi-unit system configuration, grounded quotes and order creation — 24/7, with a confidence-gated human fallback.
Duration: Ongoing engagement (build + continuous tuning)
Team Size: 1
My Role: End-to-end (architecture, backend, ML/RAG pipeline, infrastructure, and data engineering)

Executive Summary
Our client operates a long-running, high-traffic online store built on a legacy PHP/OpenCart platform. The catalog is large and technical: hundreds of products with dozens of structured specifications each, multiple equipment families, accessories, and a body of static knowledge covering delivery, warranty, payment, and installation policies. The existing on-site chat was a manually staffed widget. Every catalog question required a human operator, customers dropped off outside working hours, and answers were inconsistent. The highest-value conversations, multi-unit system configurations, were exactly the ones most likely to stall before a quote was produced.
The hard constraint was that we could not rebuild the store. The legacy platform is the system of record for products, pricing, stock, and orders, and had to remain untouched and authoritative. The AI layer had to wrap around it, not replace it.
We designed and built a standalone AI microservice — provisioned on its own dedicated server with its own deployment pipeline, observability, and scaling — that turns the legacy catalog into a conversational, retrieval-grounded assistant. Every answer is grounded in real catalog data with a calibrated confidence gate that escalates rather than hallucinates. The result is a 24/7 automated pre-sales consultant that understands the full catalog, configures multi-unit systems, generates quotes, and creates orders, with a graceful human-operator fallback.
Key Metrics (before → after)
Average response time: 300 sec → 11 sec
Lead-to-quote (commercial proposal) time: 30 min → 2 min
Qualified leads / quotes issued: baseline → +27%
Order placement time: 5 min → 1–1.5 min
Off-target operator load (peak season): baseline → -48%
Queries resolved without escalation: 0 → 160+
Pre-sales availability: business hours, human-staffed → 24/7 automated
Multi-unit configuration: operator-only, manual → self-serve, end-to-end
The Challenges
Key obstacles that needed to be addressed
Cost & latency of a fully human-staffed chat
Operators answered the same catalog questions (whether a unit suits a given room size, stock status, delivery cost, multi-room configuration) thousands of times over. Outside working hours, customers simply waited or dropped off.
Business Impact
Expensive operator time spent on repetitive pre-sales, and lost conversions every hour the team was offline.
Inconsistent, error-prone answers
Responses varied by operator. Technical specifications and live availability were easy to get wrong when answered from memory rather than the source catalog.
Business Impact
Eroded customer trust and created quoting errors on a catalog where specs and stock change constantly.
High-value configurations stalled in the funnel
Complex multi-unit systems (one outdoor unit serving several rooms) were the most operator-dependent conversations — and the most likely to stall before a quote was ever produced.
Business Impact
The largest, most profitable orders were the ones most frequently lost to friction and slow turnaround.
A legacy core that could not be rebuilt
The PHP/OpenCart monolith is the authoritative system of record for products, pricing, stock, and orders. It had to remain untouched — the AI had to wrap around it, not replace it.
Business Impact
Any solution had to add intelligence with zero intrusion into a proven, revenue-critical platform.
Our Solutions
How we tackled the challenges and delivered results
A purpose-built inference server, provisioned from scratch
Rather than bolt an AI script onto the existing host, we provisioned a separate production server dedicated to inference, indexing, and the conversational API — isolating the AI workload from the storefront and keeping latency-sensitive operations off the legacy box.
Implementation
Fastify (Node.js) service running under systemd and served over HTTPS, with health/liveness probes that verify the vector store, cache, and model providers on every check. Every external call (LLM, vector DB, reranker) is wrapped in a timeout plus exponential-backoff retry with jitter, so a hung upstream fails fast instead of stalling a customer. GitLab CI/CD deploys are gated by a post-deploy health check, and an automated test suite (27+ tests) guards refactors.
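The timeout-plus-retry wrapper can be sketched in TypeScript, matching the Node.js service. This is a minimal illustration under stated assumptions, not the production code: the helper names `withTimeout` and `callWithRetry`, the option names, and the "full jitter" variant (a random delay between zero and the exponential cap) are hypothetical choices. Note that `Promise.race` abandons a hung call rather than cancelling it; a real client would also pass an `AbortSignal` where the SDK supports one.

```typescript
// Hypothetical helpers: names, defaults, and the "full jitter" policy are
// illustrative assumptions, not the production implementation.

/** Rejects if `task` does not settle within `ms` milliseconds. */
async function withTimeout<T>(task: () => Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([task(), timeout]);
  } finally {
    clearTimeout(timer); // don't keep the timer alive after the task settles
  }
}

interface RetryOptions {
  attempts: number;    // total tries, including the first
  baseDelayMs: number; // delay cap before the first retry
  maxDelayMs: number;  // upper bound on any single delay
  timeoutMs: number;   // per-attempt timeout
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Retries `task` with exponential backoff and full jitter: before retry n
 * (0-based) it waits a random time in [0, min(maxDelayMs, baseDelayMs * 2^n)).
 */
async function callWithRetry<T>(task: () => Promise<T>, opts: RetryOptions): Promise<T> {
  if (opts.attempts < 1) throw new Error("attempts must be >= 1");
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await withTimeout(task, opts.timeoutMs);
    } catch (err) {
      lastError = err;
      if (attempt === opts.attempts - 1) break; // no sleep after the final failure
      const cap = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
      await sleep(Math.random() * cap);
    }
  }
  throw lastError;
}
```

Full jitter spreads retries from many concurrent conversations across time, so a briefly overloaded upstream is not hit by synchronized retry waves.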
A multi-stage Retrieval-Augmented Generation pipeline
Each customer message flows through a multi-stage RAG pipeline engineered to ground every answer in real catalog data and to escalate rather than guess when confidence is low.
Implementation
Intent classification routes each message and short-circuits cheap cases. A fast model rewrites context-dependent messages ("and the 9k one?") into standalone catalog queries. Hybrid dense+sparse retrieval against Pinecone pulls the top 50 of 1,000+ entities, and Cohere neural reranking narrows those 50 to 10 and feeds a calibrated confidence gate that declines to answer when the evidence is too thin. A frontier LLM then composes the reply strictly from the retrieved context, with inline source attribution. Redis holds conversation memory and a TTL'd answer cache for fast follow-ups.
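One way to picture the confidence gate is a small pure function over the reranked candidates. The sketch below assumes relevance scores normalized to [0, 1] and uses two illustrative rules: the best chunk must clear a floor, and enough chunks must clear a support threshold. The type names, the rule shape, and the threshold values are hypothetical, since the calibrated values are not given here.

```typescript
// Hypothetical types and rules: the calibrated production thresholds are not
// published, so the values passed in GateConfig are placeholders.
interface RankedChunk {
  id: string;
  text: string;
  relevance: number; // reranker relevance score, assumed normalized to [0, 1]
}

type GateDecision =
  | { kind: "answer"; context: RankedChunk[] }
  | { kind: "escalate"; reason: string };

interface GateConfig {
  topK: number;          // chunks kept after reranking (10 in the pipeline above)
  minTopScore: number;   // the best chunk must reach at least this score
  supportScore: number;  // score a chunk needs to count as supporting evidence
  minSupporting: number; // how many supporting chunks are required
}

function confidenceGate(reranked: RankedChunk[], cfg: GateConfig): GateDecision {
  // Sort a copy by descending relevance and keep the top K.
  const top = reranked
    .slice()
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, cfg.topK);

  const best = top[0];
  if (best === undefined || best.relevance < cfg.minTopScore) {
    return { kind: "escalate", reason: "no sufficiently relevant catalog evidence" };
  }

  const supporting = top.filter((c) => c.relevance >= cfg.supportScore);
  if (supporting.length < cfg.minSupporting) {
    return { kind: "escalate", reason: "evidence too thin to ground an answer" };
  }

  // Only chunks that cleared the support bar become generation context.
  return { kind: "answer", context: supporting };
}
```

Keeping the gate deterministic and separate from the LLM call means its thresholds could be re-tuned without touching prompts.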
Bridging the legacy catalog into the vector store
A dedicated data-sync indexer continuously projects the legacy MySQL catalog — products, categories, and static knowledge pages — into the vector index, each enriched with structured retrieval metadata.
Implementation
Normalizes and cleans source content (stripping legacy CMS markup so the model reads clean prose), maps raw catalog fields into retrieval-ready metadata (price, stock, capacity, coverage area, inverter/refrigerant flags, model codes, archive status) used for filtering and ranking, and runs incrementally via content-hash change detection for near-real-time freshness, with full-rebuild capability for schema changes.
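A minimal sketch of content-hash change detection, in TypeScript using Node's built-in `crypto` module. The row shape, the `product:<id>` record ids, the regex-based markup stripping, and the rule that rows missing from the source are deleted from the index are all illustrative assumptions; the real mapping carries more metadata (capacity, coverage area, inverter/refrigerant flags, model codes).

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: the real OpenCart columns and mapping are not shown
// in the text, so these fields are illustrative only.
interface CatalogRow {
  productId: number;
  name: string;
  descriptionHtml: string;
  price: number;
  quantity: number;
  status: number; // 0 = archived in this sketch
}

interface IndexRecord {
  id: string;
  text: string;
  metadata: { price: number; inStock: boolean; archived: boolean };
  contentHash: string;
}

// Crude markup stripper for the sketch; production code would use an HTML parser.
function stripMarkup(html: string): string {
  return html
    .replace(/<[^>]*>/g, " ")
    .replace(/&nbsp;/g, " ")
    .replace(/&amp;/g, "&")
    .replace(/\s+/g, " ")
    .trim();
}

function toRecord(row: CatalogRow): IndexRecord {
  const text = `${row.name}. ${stripMarkup(row.descriptionHtml)}`;
  const metadata = { price: row.price, inStock: row.quantity > 0, archived: row.status === 0 };
  // Hash text AND metadata so a price or stock change also triggers a re-index.
  const contentHash = createHash("sha256").update(JSON.stringify({ text, metadata })).digest("hex");
  return { id: `product:${row.productId}`, text, metadata, contentHash };
}

/** Upserts records whose hash changed; deletes ids that no longer exist in the source. */
function planIncrementalSync(
  rows: CatalogRow[],
  indexedHashes: Map<string, string>,
): { upserts: IndexRecord[]; deletes: string[] } {
  const upserts: IndexRecord[] = [];
  const seen = new Set<string>();
  for (const row of rows) {
    const record = toRecord(row);
    seen.add(record.id);
    if (indexedHashes.get(record.id) !== record.contentHash) upserts.push(record);
  }
  const deletes = Array.from(indexedHashes.keys()).filter((id) => !seen.has(id));
  return { upserts, deletes };
}
```

Hashing the metadata together with the text means a price or stock change alone is enough to trigger a re-upsert, which keeps filterable fields fresh between full rebuilds.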
Multi-unit system configurator
The standout capability: assembling multi-split systems where one outdoor unit serves several rooms — historically the hardest, most operator-dependent conversation in the funnel.
Implementation
Ingested the full technical catalog — outdoor-unit capacities, allowed indoor-unit combinations, and compatibility tables — so the assistant can take "three rooms, one outdoor unit, no ceiling mount in two of them" and return a complete, validated configuration: the right outdoor unit and correct indoor units by mounting type (wall, cassette, ducted, console, floor/ceiling), each with a real price and availability.
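The validation step can be sketched as a greedy search in TypeScript. Everything specific here is an assumption for illustration: the type names, a per-outdoor-unit `compatibleIndoor` set standing in for the compatibility tables, and a combined-load rule (total indoor capacity divided by outdoor capacity must fall within a ratio band whose limits are placeholders). A configurator working from real compatibility tables may need to search combinations rather than pick greedily room by room.

```typescript
// Hypothetical catalog shapes; real compatibility data would come from the
// ingested technical tables, not these simplified fields.
type Mounting = "wall" | "cassette" | "ducted" | "console" | "floor_ceiling";

interface IndoorUnit { model: string; capacityKw: number; mounting: Mounting; price: number; inStock: boolean }

interface OutdoorUnit {
  model: string;
  capacityKw: number;
  maxIndoorUnits: number;
  minLoadRatio: number; // allowed sum(indoor kW) / outdoor kW, placeholder band
  maxLoadRatio: number;
  compatibleIndoor: Set<string>; // indoor models this outdoor unit accepts
  price: number;
  inStock: boolean;
}

interface Room {
  name: string;
  requiredKw: number;
  excludedMountings: Mounting[]; // e.g. "no ceiling mount" -> ["cassette", "floor_ceiling"]
}

interface Configuration { outdoor: OutdoorUnit; indoor: { room: string; unit: IndoorUnit }[]; total: number }

function configure(rooms: Room[], outdoors: OutdoorUnit[], indoors: IndoorUnit[]): Configuration | null {
  // Consider in-stock outdoor units with enough ports, cheapest first.
  const candidates = outdoors
    .filter((o) => o.inStock && o.maxIndoorUnits >= rooms.length)
    .sort((a, b) => a.price - b.price);

  for (const outdoor of candidates) {
    const picks: { room: string; unit: IndoorUnit }[] = [];
    for (const room of rooms) {
      // Smallest compatible, in-stock unit that covers the room's load and mounting constraints.
      const best = indoors
        .filter(
          (u) =>
            u.inStock &&
            outdoor.compatibleIndoor.has(u.model) &&
            !room.excludedMountings.includes(u.mounting) &&
            u.capacityKw >= room.requiredKw,
        )
        .sort((a, b) => a.capacityKw - b.capacityKw || a.price - b.price)[0];
      if (best === undefined) break;
      picks.push({ room: room.name, unit: best });
    }
    if (picks.length !== rooms.length) continue; // some room had no valid unit

    // Combined indoor capacity must fit the outdoor unit's allowed load band.
    const load = picks.reduce((sum, p) => sum + p.unit.capacityKw, 0) / outdoor.capacityKw;
    if (load < outdoor.minLoadRatio || load > outdoor.maxLoadRatio) continue;

    const total = outdoor.price + picks.reduce((sum, p) => sum + p.unit.price, 0);
    return { outdoor, indoor: picks, total };
  }
  return null; // no valid configuration: hand off to an operator
}
```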
Order creation & automated quote generation
The assistant turns conversations directly into transactions — assembling the cart, capturing customer details, generating formatted commercial proposals for complex multi-room projects, and logging the order without a manager ever touching it.
Implementation
For quotes, the bot builds an itemized commercial proposal via API (units, capacities, pricing, totals) and delivers it to the customer right in the chat, compressing a manager-dependent process into minutes. For orders, the customer answers a couple of questions and the bot pushes a structured request into the store's order system through controlled write-backs (never modifying the core), notifying both the managers and the customer. To avoid errors, it deliberately does not take payment: the order is recorded as a request, and payment is collected after confirmation.
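The quote-and-order hand-off can be sketched in TypeScript. The field names, the `pending_confirmation` status, and the plain-text rendering are hypothetical; the sketch only illustrates two choices the text describes: totals computed from itemized lines, and an order request that carries no payment step. Prices are kept as integers in minor currency units (e.g. cents) so totals do not accumulate floating-point rounding error.

```typescript
// Hypothetical shapes for illustration; the store's real order API is not shown here.
interface QuoteLine {
  model: string;
  capacityKw: number;
  unitPriceMinor: number; // integer minor units (e.g. cents) to avoid float drift
  qty: number;
}

interface Quote {
  lines: (QuoteLine & { lineTotalMinor: number })[];
  totalMinor: number;
  currency: string;
}

function buildQuote(lines: QuoteLine[], currency: string): Quote {
  const priced = lines.map((l) => {
    if (!Number.isInteger(l.unitPriceMinor) || !Number.isInteger(l.qty) || l.qty <= 0) {
      throw new Error(`invalid quote line: ${l.model}`);
    }
    return { ...l, lineTotalMinor: l.unitPriceMinor * l.qty };
  });
  const totalMinor = priced.reduce((sum, l) => sum + l.lineTotalMinor, 0);
  return { lines: priced, totalMinor, currency };
}

// Plain-text rendering for the chat window (assumes a 2-decimal currency).
function renderQuote(q: Quote): string {
  const money = (minor: number) => `${(minor / 100).toFixed(2)} ${q.currency}`;
  const rows = q.lines.map((l) => `${l.model} (${l.capacityKw} kW) x${l.qty}: ${money(l.lineTotalMinor)}`);
  return [...rows, `Total: ${money(q.totalMinor)}`].join("\n");
}

// The order is only a request: no payment fields; a manager confirms it later.
interface OrderRequest {
  customer: { name: string; phone: string };
  quote: Quote;
  status: "pending_confirmation";
}

function toOrderRequest(quote: Quote, customer: { name: string; phone: string }): OrderRequest {
  if (quote.lines.length === 0) throw new Error("cannot order an empty quote");
  return { customer, quote, status: "pending_confirmation" };
}
```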
All solutions successfully implemented and deployed
Results & Impact
Measurable outcomes achieved through our solutions
24/7 automated pre-sales across the full catalog
Fast (11 sec average response time), consistent, source-grounded answers, with no operator in the loop for routine inquiries, at any hour.
The hardest conversation, automated end-to-end
Multi-unit system configuration — historically the most operator-intensive, highest-value flow — is now self-serve from recommendation to quote.
From chat directly to order
Order creation and commercial-proposal generation turn conversations into transactions instead of handing off to manual entry.
Operator load reduced
Routine catalog, spec, availability, and logistics questions are deflected from the human team, freeing operators for genuinely complex cases.
Hallucination-resistant by design
A calibrated confidence gate plus strictly grounded generation mean the assistant escalates rather than fabricating a price or availability — and a total upstream outage degrades to a friendly handoff, never a 500.
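The "never a 500" behaviour amounts to catching failures at the outermost request boundary and answering with a handoff instead of an error. A minimal, framework-agnostic TypeScript sketch follows; the `ChatReply` shape, the handoff wording, and the injectable logger are hypothetical. In a Fastify service the same idea could live in the route handler or in a custom error handler registered with `setErrorHandler`.

```typescript
// Hypothetical reply shape and copy; the real handoff flow is not shown in the text.
interface ChatReply {
  httpStatus: 200; // the customer-facing endpoint always answers 200
  text: string;
  handoff: boolean; // true -> route the conversation to a human operator
}

const HANDOFF_TEXT =
  "I can't answer that reliably right now, so I'm passing you to a colleague who will reply shortly.";

async function replySafely(
  runPipeline: () => Promise<string>,
  logError: (err: unknown) => void = (err) => console.error(err),
): Promise<ChatReply> {
  try {
    return { httpStatus: 200, text: await runPipeline(), handoff: false };
  } catch (err) {
    // Upstream outage, exhausted retries, or a bug: log it, but never surface a 500.
    logError(err);
    return { httpStatus: 200, text: HANDOFF_TEXT, handoff: true };
  }
}
```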
Project delivered on time and exceeded expectations
Technology Stack
Tools and technologies used to build this solution
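Backend: Node.js, Fastify (wrapping the legacy PHP/OpenCart store)
AI / Retrieval: Pinecone (hybrid dense+sparse retrieval), Cohere neural reranking, frontier LLM
Database: MySQL (legacy catalog), Redis (conversation memory, answer cache)
Infrastructure: dedicated inference server, systemd, HTTPS
DevOps: GitLab CI/CD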
Tools
All technologies were carefully selected to ensure optimal performance, scalability, and maintainability