Built a production-grade LangChain agent with a multi-tier RAG pipeline for a financial publishing company with 50K+ subscribers. Replaced an expensive JotForm chatbot that repeatedly asked customers the same questions. Delivered $24K+ in annual cost savings, 70% ticket deflection, and zero conversation memory failures, all while handling real money movement through automated refunds.
The Problem
A financial publishing company with 50K+ subscribers and a predominantly 55+ demographic faced three critical support failures: the JotForm chatbot asked the same question multiple times per conversation (damaging brand credibility), support staff spent 1-2 hours nightly manually triaging inquiries (the very task the automation was supposed to eliminate), and the tool cost $500+ per month while customer satisfaction declined.
Root Cause: Rigid if/then decision trees without conversation memory. Every branch restart triggered the same data collection. No semantic understanding, no context retention, just expensive keyword matching disguised as AI.
What I Built
Production-Grade Agent with Multi-Tier Intelligence: Designed and implemented complete customer support automation replacing broken legacy system with intelligent, context-aware agent handling real transactions.
- Model-Agnostic Orchestration: Built provider abstraction supporting 5 backends (OpenAI, DeepSeek, Anthropic, Together AI, WebLLM). Hot-swappable routing enables cost optimization (swapped OpenAI → DeepSeek in 48 hours for 90% cost reduction) and compliance options (WebLLM for zero data exfiltration).
- Three-Tier RAG Pipeline: Cache layer (cosine similarity >0.95, instant response, $0 cost), knowledge base vector search (16 curated entries, >0.75 similarity, $0.0001/query), and LLM generation (full context, falls through only when needed). The system learns and gets cheaper: 70% of queries hit the free tier by month 2.
- Skill-Based Architecture: 20 discrete, independently testable skills including refund workflow (Recurly API integration, real money movement), cancellation processing (safety confirmations), password reset (WordPress integration), bonus report access (semantic search across 20+ products), and human escalation (full transcript handoff).
- Zero Memory Failures: Multi-tier context retention (conversation memory, entity extraction, intent history) ensures zero repeated questions. Adaptive data collection asks only for what it doesn't already know.
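The model-agnostic orchestration above can be sketched as a common provider interface with ordered failover. Names like `ModelProvider` and `completeWithFailover` are illustrative, not the actual implementation:

```typescript
// Minimal sketch of a model-agnostic provider abstraction. Every
// backend implements the same interface, so hot-swapping a provider
// (e.g., OpenAI -> DeepSeek) is a routing-order change, not a rewrite.
interface ModelProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Hypothetical registry; real entries would wrap the OpenAI,
// DeepSeek, Anthropic, Together AI, and WebLLM SDKs.
const providers = new Map<string, ModelProvider>();

function registerProvider(p: ModelProvider): void {
  providers.set(p.name, p);
}

// Try providers in preference order; fall through on failure so an
// outage or rate limit degrades gracefully instead of erroring out.
async function completeWithFailover(
  prompt: string,
  order: string[],
): Promise<string> {
  for (const name of order) {
    const provider = providers.get(name);
    if (!provider) continue;
    try {
      return await provider.complete(prompt);
    } catch {
      // Provider down or rate-limited: try the next one.
    }
  }
  throw new Error("All providers failed");
}
```

In this shape, the 48-hour DeepSeek swap described above amounts to registering a new provider and moving it to the front of the preference order.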
Technical Architecture
Platform Stack: Next.js 16, TypeScript (12,000+ lines, 70+ files), PostgreSQL with pgvector extension (8 tables, 1536-dimensional embeddings), Drizzle ORM with type-safe migrations, Vercel Edge deployment.
LangChain Integration: TypeScript SDK with structured output parsing (Zod schemas), multi-step reasoning chains, conversation memory management, RAG retrieval with similarity scoring, model provider abstraction (OpenAI, Anthropic, DeepSeek, Together AI, WebLLM).
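The structured output parsing can be illustrated with a dependency-free validator standing in for the production Zod schemas; the `IntentResult` shape and field names here are hypothetical:

```typescript
// Sketch of structured output parsing: the LLM returns raw JSON, and
// the string is validated into a typed object before any skill acts
// on it. Production uses Zod schemas; a hand-rolled guard is shown
// here to keep the sketch dependency-free.
interface IntentResult {
  intent: "refund" | "cancel" | "password_reset" | "escalate" | "other";
  confidence: number; // 0..1
}

function parseIntentResult(raw: string): IntentResult {
  const data = JSON.parse(raw);
  const intents = ["refund", "cancel", "password_reset", "escalate", "other"];
  if (!intents.includes(data.intent)) {
    throw new Error(`Unknown intent: ${data.intent}`);
  }
  if (
    typeof data.confidence !== "number" ||
    data.confidence < 0 ||
    data.confidence > 1
  ) {
    throw new Error("confidence must be a number in [0, 1]");
  }
  return { intent: data.intent, confidence: data.confidence };
}
```

Rejecting malformed output at this boundary is what lets downstream skills trust the parsed intent instead of re-validating free text.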
Recurly Integration: Official SDK (not REST wrapper), customer verification (email + name + last4 matching), subscription status checking, automated refund processing with eligibility rules, account note logging (audit trail), chargeback detection (blocks fraud attempts).
Security & Governance: SQL injection detection, prompt injection detection, credential harvesting prevention, PII sanitization (SSN, credit cards redacted from logs), rate limiting (abuse prevention), audit trail (every action logged for compliance), admin review queue (flags suspicious patterns).
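The PII sanitization step can be sketched with simple pattern redaction; the patterns below are illustrative, and production rules may be stricter (Luhn checks, more card formats):

```typescript
// Redact common PII patterns (SSNs, credit card numbers) before
// anything is written to logs or the audit trail.
function sanitizeForLogs(text: string): string {
  return text
    // SSN in the common 123-45-6789 form
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN REDACTED]")
    // Card number: 13-16 digits, optionally separated by spaces/dashes
    .replace(/\b\d(?:[ -]?\d){12,15}\b/g, "[CARD REDACTED]");
}
```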
Intelligent Refund Workflow
Automated Eligibility Detection: Account verification via Recurly API, purchase date within 90 days (enforced), purchase amount <$500 (auto-approved), no prior chargebacks (fraud prevention), active subscription verification.
Adaptive Collection: Asks only for what it doesn't already know (once the email is collected, it is never requested again). Safety confirmation required: "⚠️ You'll receive $299 back, subscription cancelled immediately, access revoked, action cannot be undone. Reply YES to confirm."
Post-Processing: Refund processed in Recurly (real money movement), subscription cancelled, email sent to customer (receipt), support team notified, sales director alerted (high-value), admin dashboard tagged, audit log created (compliance).
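The eligibility rules above can be expressed as one pure function. Field and function names are illustrative; in production these values come from the Recurly API:

```typescript
// Sketch of the automated refund eligibility rules. All four checks
// must pass before the agent proceeds to the safety confirmation.
interface RefundRequest {
  purchaseDate: Date;
  amountUsd: number;
  hasChargebacks: boolean;     // any prior chargeback blocks auto-refund
  subscriptionActive: boolean; // verified via Recurly
}

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

function isAutoRefundEligible(req: RefundRequest, now: Date = new Date()): boolean {
  const withinWindow =
    now.getTime() - req.purchaseDate.getTime() <= NINETY_DAYS_MS;
  return (
    withinWindow &&           // purchase within 90 days (enforced)
    req.amountUsd < 500 &&    // under $500 is auto-approved
    !req.hasChargebacks &&    // fraud prevention
    req.subscriptionActive    // active subscription required
  );
}
```

Keeping the rules in a pure function makes them independently testable, in line with the skill-based architecture.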
Three-Tier RAG Pipeline Design
Tier 1 - Semantic Cache (Free): PostgreSQL + pgvector, cosine similarity >0.95 = instant cached response, zero API cost, <100ms latency. Automatically populated from high-confidence interactions.
Tier 2 - Knowledge Base (Cheap): 16 curated Q&A entries with embeddings, similarity >0.75 triggers retrieval, ~$0.0001 per query (embedding-only cost), admin-managed through dashboard.
Tier 3 - LLM Generation (Moderate): Falls through only when tiers 1-2 fail, full conversation context + RAG context, logs successful resolutions for tier-1 promotion, system learns and gets cheaper.
Cost Impact: Month 1: 30% tier-1, 40% tier-2, 30% tier-3 ($8.50 total). Month 2: 70% tier-1, 20% tier-2, 10% tier-3 ($2.52 total). A system that learns and gets cheaper, not more expensive.
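The tier thresholds above reduce to a small routing function over embedding similarity. The pgvector lookups and LLM call are stubbed out here; only the threshold logic is shown, and the function names are illustrative:

```typescript
// Cosine similarity between two embedding vectors (pgvector computes
// this server-side in production; shown here for completeness).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type Tier = "cache" | "knowledge_base" | "llm";

// Route a query by its best similarity score against the semantic
// cache and the knowledge base, using the thresholds described above.
function routeQuery(bestCacheSim: number, bestKbSim: number): Tier {
  if (bestCacheSim > 0.95) return "cache";        // Tier 1: free, instant
  if (bestKbSim > 0.75) return "knowledge_base";  // Tier 2: embedding-only cost
  return "llm";                                   // Tier 3: full generation
}
```

Because tier-3 resolutions are promoted into the tier-1 cache, the share of queries answered by `routeQuery` returning `"cache"` grows over time, which is the mechanism behind the month-over-month cost drop.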
Key Product Decisions
Model-Agnostic vs. OpenAI-Only: Built provider abstraction from day one. Tradeoff: Additional complexity. Outcome: Swapped to DeepSeek in 48 hours for 90% cost reduction when it launched.
Three-Tier RAG vs. Naive LLM: Engineered caching layer before vector search. Tradeoff: More architectural complexity. Outcome: 70% queries free by month 2, system gets cheaper over time.
Skill-Based vs. Monolithic: Decomposed agent into 20 discrete skills. Tradeoff: More files to maintain. Outcome: Independent testing, versioning, deployment of capabilities.
Pre-Response Reasoning vs. Direct Reply: Added a reasoning step before every action. Tradeoff: 200-500ms latency increase. Outcome: Prevented catastrophic errors (e.g., cancelling a subscription when the customer only wanted help).
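The skill-based decomposition above can be sketched as a common interface that each of the 20 capabilities implements; the interface and skill names here are illustrative:

```typescript
// Each capability implements one interface, so skills can be tested,
// versioned, and deployed independently of the orchestrator.
interface AgentContext {
  message: string;
  collected: Record<string, string>; // entities already gathered
}

interface Skill {
  name: string;
  canHandle(ctx: AgentContext): boolean;
  handle(ctx: AgentContext): Promise<string>;
}

// Dispatch to the first skill that claims the message; anything
// unclaimed falls back to human escalation with the transcript.
async function dispatch(skills: Skill[], ctx: AgentContext): Promise<string> {
  const skill = skills.find((s) => s.canHandle(ctx));
  if (!skill) return "Escalating to a human agent with the full transcript.";
  return skill.handle(ctx);
}
```

The monolithic alternative would fold all branching into one handler; the tradeoff noted above (more files) buys the ability to unit-test each skill in isolation.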
Development & Delivery
Week 1-2: Discovery (Strong Start Kickoff, service blueprints, success metrics), architecture design, demo-ready prototype with password reset and intent recognition working.
Week 3-4: Core workflows (refund processing with Recurly, bonus report access, human escalation), second demo day with stakeholder feedback, priority adjustments based on usage.
Week 5-6: Production polish (comprehensive documentation, CI/CD pipeline, security implementation, rate limiting), admin dashboard (analytics, session viewer, knowledge base manager), deployment and monitoring.
Measurable Outcomes
- Financial Impact: $24,600 annual savings (JotForm elimination $500/mo, Tier-1 deflection $1,300/mo, manual triage elimination $250/mo) vs. costs of ~$200/month (API + infrastructure)
- Operational Efficiency: 70% ticket deflection rate, zero repeated questions, sub-10-second response time, 24/7 availability, 92%+ intent accuracy
- Technical Delivery: 12,000+ lines TypeScript across 70+ files, 8 database tables with vector support, 16 knowledge base entries, 20 specialized skills, 5 model provider integrations, 15+ technical documentation files
- Production Reliability: 99%+ uptime since deployment, graceful degradation (database fallback, provider failover), comprehensive audit trail, admin monitoring dashboard
- Development Velocity: 6-week delivery from kickoff to production, demo-driven development (stakeholder feedback every 2 weeks), 100+ commits with version control
Admin Dashboard & Observability
Analytics Dashboard: Total conversations, auto-resolution rate (70%+), escalation rate, cache hit rate (compute savings visualization), top 10 frequent questions, real-time intent distribution, time-range filters.
Session Viewer: Full conversation transcripts, two-array tagging system (context + outcome tags), admin review workflow (quality monitoring), refund status tracking, search and filter capabilities, CSV export for financial reporting.
Knowledge Base Manager: Add/edit/delete Q&A entries, usage tracking (which FAQs actually get used), category organization, embedding regeneration when content updated.
What This Demonstrates
Built production LangChain agent handling real financial transactions. Designed multi-tier RAG pipeline reducing costs 70% through intelligent caching. Implemented model-agnostic architecture enabling provider swaps in 48 hours. Created skill-based system with 20 independently testable capabilities. Shipped in 6 weeks with demo-driven development and stakeholder alignment. Achieved 70% ticket deflection with 92%+ intent accuracy and zero memory failures.
Demonstrates: LangChain production expertise, RAG pipeline design, cost optimization through intelligent tiering, platform engineering with extensible architecture, security and governance for financial services, cross-functional stakeholder management, and measurable business outcomes.