Success stories

Accelerated Customer Service for Seatfrog

GenAI

Industry: Travel
Company size: <50
Established: 2016
Value: >10 million
Location: London

Seatfrog partnered with Tasman to build a RAG support system handling complex UK rail ticket queries. In just six weeks, they achieved 50% fully automated resolution, sub-200ms response times, and full technical ownership through a production-grade RAG architecture.

What We Did

Seatfrog faced thousands of monthly support queries about UK rail's complex ticketing rules, where simple questions like "Can I get a refund?" depend on intricate combinations of ticket types, operators, and delay causes. Their engineering leadership recognised the competitive advantage of AI-powered support but, with a highly capable in-house team focused on revenue-driving priorities, chose strategic acceleration: partnering with Tasman to compress delivery to six weeks whilst maintaining complete technical ownership.

We implemented a three-phase RAG architecture with production-grade components:

  • Vector search: OpenAI text-embedding-3-small with a Qdrant database, using HNSW indexing for sub-200ms retrieval.
  • Fine-tuned model: GPT-4o-mini trained on existing support tickets, with markdown-based semantic chunking.
  • Production features: Automated knowledge updates, LangFuse observability, and feedback loops for continuous improvement.
  • Parallel development: Seatfrog engineers maintained decision authority throughout.

The result: production AI achieving 50% query resolution immediately, sub-200ms response times, and strong cost-effectiveness. Seatfrog's team brought strong expertise in RAG patterns and strategies, but chose to work with a trusted partner to accelerate delivery while keeping internal focus on revenue-driving priorities. It was a conscious trade-off in pace, not capability, ensuring they retained full control of their AI roadmap.

Where Seatfrog Was

  • Customer support drowning: Thousands of repetitive monthly queries about rail tickets, seat upgrades, and delay compensation.
  • Self-service failure: A comprehensive help centre exists, but users consistently choose human support. The root cause is the complexity of UK rail ticketing: "Can I get a refund?" depends on the ticket type (Advance, Off-Peak, Anytime), the operating company, and the delay cause (weather, signals, industrial action).
  • Business need: An intelligent solution that navigates these nuances whilst maintaining high service standards and an empathetic tone, without the sycophantic behaviour that risks overpromising what the business can deliver.

This Didn't Work

  • Generic chatbots fail at transport complexity. Simple FAQs are easily automated, but complex transport questions are stubbornly difficult and often highly context-dependent.
  • AI hallucination risks are real. Generic systems confidently give wrong answers; claims like "all tickets are refundable" or "15-minute delay compensation" would seriously damage Seatfrog's customer relationships. In reality, policies vary dramatically by operator and ticket type.
  • High stakes for errors. These are not edge cases but daily queries from people with real issues. Mistakes risk regulatory violations and customer compensation claims.
  • Enterprise solutions inadequate. Intercom and Salesforce carry £20,000-100,000 price tags yet still lack crucial rail domain knowledge. This is changing rapidly, but Seatfrog was keen to experiment with a self-built solution. Seatfrog's engineering leadership recognised the strategic opportunity early: not a capability gap, but a deliberate choice to partner for rapid deployment whilst maintaining full technical ownership.
  • Strategic priorities. The engineering team was focused on the core product roadmap, seeking expert acceleration of AI initiatives whilst building internal capability to own the solution long-term.

But This Is An Opportunity

  • Existing assets to leverage. Help centre content (though duplicated and incomplete), an archive of well-rated customer support interactions, and a clear standard for quality agent interactions.
  • Modern AI architecture potential. These assets could be transformed into an intelligent assistant that grounds responses in authoritative content and maintains a warm, knowledgeable tone.
  • Strategic approach. Seatfrog’s engineering team chose to accelerate development through expert collaboration – leveraging Tasman’s specialised RAG expertise to compress months of R&D whilst ensuring complete architectural understanding and handover readiness from day one.
  • Clear business value. Handle 80% of routine queries, free human agents for complex problems, enable 24/7 availability, transform support from bottleneck to competitive advantage, and capture direct cost savings on the >£10k monthly outsourced support spend.

Architecture

We designed a three-phase technical roadmap that balanced immediate value delivery with long-term scalability. The architecture (illustrated in Figure 1) centres on a retrieval-augmented generation (RAG) pattern that grounds all responses in Seatfrog’s authoritative help centre content. This approach mitigates hallucination risks whilst maintaining conversational quality. The system processes queries through multiple stages:

  • Data ingestion pipeline: Deep crawl of help.seatfrog.com and all child pages, with explicit logic to avoid certain domains/pages, automatically extracting meaningful content from the HTML and converting it into markdown.
  • Intelligent chunking: Boundary detection based on markdown heading structure, which we selected after semantic boundary detection proved less accurate (a sketch follows this list).
  • Vector storage: Managed Qdrant database with HNSW indexing for sub-200ms retrieval, using the OpenAI embedding model.
  • LLM orchestration: GPT-4o-mini for generation with streaming responses, delivering strong results at very low cost and latency.
  • Low-friction user feedback collection: Capturing whether a generated response was helpful or not, directly into the tracing solution, so that we can iterate and improve the system.
  • Observability layer: Complete request tracing via LangFuse integration, recording the question asked, the chunks identified as most relevant with their scores and source URLs, as well as feedback scores and costs.
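
To make the chunking step concrete, here is a minimal sketch of heading-based boundary detection, assuming the crawled pages have already been converted to markdown. The function name and parameters are illustrative, not Seatfrog's actual implementation:

```python
import re

def chunk_markdown(doc: str, max_heading_level: int = 3) -> list[str]:
    """Split a markdown document into chunks on heading boundaries,
    so related content under a single heading stays together."""
    # Match markdown headings up to the chosen level ("#", "##", "###").
    heading = re.compile(rf"^#{{1,{max_heading_level}}}\s", re.MULTILINE)
    starts = [m.start() for m in heading.finditer(doc)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first heading
    starts.append(len(doc))
    chunks = []
    for begin, end in zip(starts, starts[1:]):
        chunk = doc[begin:end].strip()
        if chunk:
            chunks.append(chunk)
    return chunks
```

Each chunk keeps its heading attached, so a section on Advance-ticket refunds is embedded and retrieved as one unit rather than being split mid-policy.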

Phase 1: Prototype Implementation

The initial prototype focused on proving that the core RAG pattern could handle Seatfrog's domain complexity. As shown in the system diagram, the data flow begins with comprehensive website crawling that preserves the hierarchical structure of help articles. Rather than splitting naively, the chunking algorithm respects semantic boundaries, ensuring that related information about ticket types or refund policies stays together. Key technical decisions included:

  • Embedding model: OpenAI text-embedding-3-small (1536 dimensions) for semantic search.
  • Context window: No hard limit, optimally balanced for completeness versus cost.
  • Retrieval strategy: Top-4 most relevant chunks scored by cosine similarity (sketched after this list).
  • Response streaming: Server-sent events for <200ms time-to-first-token.
  • Simple UI: Basic chat interface for internal testing and evaluation.
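
These decisions combine into a short query path. Below is a minimal sketch under stated assumptions: a Qdrant collection (called `help_centre` here for illustration) already populated with embedded chunks whose payloads carry a `text` field, using the openai and qdrant-client Python SDKs:

```python
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                            # reads OPENAI_API_KEY
qdrant = QdrantClient(url="http://localhost:6333")  # illustrative endpoint

def answer(question: str, collection: str = "help_centre"):
    # Embed the question with the same model used at ingestion time.
    emb = openai_client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # Retrieve the top-4 chunks by cosine similarity (HNSW index in Qdrant).
    hits = qdrant.search(collection_name=collection, query_vector=emb, limit=4)
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # Generate a grounded answer, streaming tokens for low time-to-first-token.
    stream = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the help-centre context below.\n\n" + context},
            {"role": "user", "content": question},
        ],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

In production the streamed tokens are forwarded as server-sent events, which is what keeps time-to-first-token under 200ms.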

To evaluate the quality and accuracy of generated responses, we worked with the client's subject-matter expert to prepare a curated dataset of questions, the expected answers, and the source URLs that should be referenced. This resulted in 12 questions covering a good variety of use cases, with answers that were concise and feasible to evaluate against the generated responses.

We then implemented automated testing using an LLM-as-a-Judge approach. We defined unit tests with Pytest, calling a judge function with the question, the expected answer, the list of expected sources, and the generated response. The judge was given a clear prompt and instructed to produce structured output consistent with the True/False results expected by the assertion tests.
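
A condensed sketch of how such a judge can be wired into Pytest follows. The prompt wording, dataset contents, and helper names are illustrative; `answer` is the streaming RAG function sketched earlier:

```python
import json
import pytest
from openai import OpenAI

client = OpenAI()

CASES = [  # trimmed, illustrative stand-in for the 12 curated questions
    {"question": "Can I get a refund on an Advance ticket?",
     "expected": "...",  # concise expected answer from the subject-matter expert
     "sources": ["https://help.seatfrog.com/..."]},
]

def judge(question: str, expected: str, sources: list, generated: str) -> bool:
    """Ask an LLM to grade the generated answer, returning a strict boolean."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "You are grading a support-bot answer against an expected answer. "
                'Respond with JSON: {"correct": true or false}.\n\n'
                f"Question: {question}\nExpected: {expected}\n"
                f"Expected sources: {sources}\nGenerated: {generated}"
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content)["correct"]

@pytest.mark.parametrize("case", CASES)
def test_generated_answer(case):
    generated = "".join(answer(case["question"]))  # RAG pipeline under test
    assert judge(case["question"], case["expected"], case["sources"], generated)
```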

This foundation achieved a 50% success rate on Seatfrog's test queries out of the box. That is high, and remarkably close to industry-leading benchmarks, whilst using a smaller, more cost-effective model and simple single-turn question answering. And this was before fine-tuning and before debugging and refactoring the existing documentation in the knowledge base.

Fine-Tuning for Domain Excellence

To push performance further, we fine-tuned GPT-4o-mini on Seatfrog's actual support interactions. The fine-tuning pipeline processed 53 hand-picked tickets through several stages:

  • Data extraction: Parse Zendesk tickets to extract customer-agent conversations.
  • PII removal: Automated masking of emails, names, and personal details.
  • Format conversion: Transform into OpenAI's required {"role": "user/assistant", "content": "…"} structure (see the sketch after this list).
  • Validation: Ensure conversations end with agent responses, fix formatting issues.
  • Training execution: 5-minute fine-tuning job costing just $0.31.
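
For illustration, a minimal sketch of the format-conversion and training-launch steps using the OpenAI fine-tuning API. Ticket parsing and PII masking are elided, and the conversation data and file name are placeholders:

```python
import json
from openai import OpenAI

client = OpenAI()

conversations = [
    [("user", "Can I change my upgrade?"), ("agent", "Yes, you can ...")],
]  # in practice: 53 hand-picked, PII-masked Zendesk conversations

def to_training_example(conversation) -> dict:
    """Convert a masked (speaker, text) conversation, ending with an agent
    reply, into OpenAI's chat fine-tuning format."""
    messages = [
        {"role": "assistant" if speaker == "agent" else "user", "content": text}
        for speaker, text in conversation
    ]
    return {"messages": messages}

# Write one JSON object per line, as required for fine-tuning files.
with open("seatfrog_tickets.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps(to_training_example(conv)) + "\n")

# Upload the file and launch the fine-tuning job.
upload = client.files.create(file=open("seatfrog_tickets.jsonl", "rb"),
                             purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=upload.id,
                                     model="gpt-4o-mini-2024-07-18")
print(job.id, job.status)
```

With a dataset this small, the job completes in minutes, which is consistent with the 5-minute, $0.31 training run above.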

Interestingly, fine-tuning didn't deliver a huge step-up in tone; we used a supervised method, providing only the correct answer. In a next stage we would explore more directional methods such as Direct Preference Optimisation, where both good and bad responses are provided.

Knowledge Transfer & Engineering Ownership

Seatfrog’s engineering team maintained full visibility and decision authority throughout implementation. This wasn’t outsourcing but strategic acceleration – with Seatfrog engineers:

  • Reviewing and approving all architectural decisions
  • Building the production API gateway and authentication layers in parallel
  • Preparing internal documentation for immediate handover post-prototype
  • Leading security and compliance reviews

Phase 2: Production Hardening

With Seatfrog’s engineering team ready to assume ownership, the production hardening phase focused on ensuring seamless handover. The production architecture (Figure 2) adds enterprise-grade capabilities whilst maintaining the elegant simplicity of the RAG pattern. Critical enhancements include:

  • Automated knowledge updates: Weekly crawls (easily increased in frequency or implemented in a CI/CD pipeline) capture help centre changes.
  • API gateway: FastAPI wrapper with rate limiting and authentication (built by the Seatfrog engineering team). As a temporary solution, a simplified API (without auth or rate limiting) acts as a reference for their implementation (see the sketch after this list).
  • Containerised deployment: Docker images for cloud-agnostic hosting.
  • Security layers:
    • Input validation against prompt injection.
    • Output filtering for PII and inappropriate content.
    • Topic relevance scoring to prevent off-topic responses.
  • Multi-turn conversation: Session management with 30-minute timeout.
  • Continued fine-tuning: Monthly retraining with new support tickets selected from good/bad reviews (for the moment a manual process, using the trace logs with a human in the loop to review bad answers).
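
As a reference for the gateway pattern (not Seatfrog's production code), here is a minimal FastAPI sketch with API-key authentication and a naive in-memory rate limit. The header name, key store, limits, and the `rag` module import are all illustrative:

```python
import time
from collections import defaultdict, deque
from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from rag import answer  # the streaming RAG function sketched earlier (illustrative)

app = FastAPI()
API_KEYS = {"demo-key"}   # illustrative; use a secret store in production
WINDOW, LIMIT = 60, 30    # max 30 requests per rolling minute, per key
_requests: dict[str, deque] = defaultdict(deque)

def auth(x_api_key: str = Header(...)) -> str:
    """Validate the API key, then apply a sliding-window rate limit."""
    if x_api_key not in API_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    now, window = time.time(), _requests[x_api_key]
    while window and now - window[0] > WINDOW:
        window.popleft()  # drop timestamps outside the window
    if len(window) >= LIMIT:
        raise HTTPException(status_code=429, detail="rate limit exceeded")
    window.append(now)
    return x_api_key

class Query(BaseModel):
    question: str

@app.post("/chat")
def chat(query: Query, key: str = Depends(auth)):
    # Stream the grounded answer back to the client as it is generated.
    return StreamingResponse(answer(query.question), media_type="text/event-stream")
```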

In Progress: Agentic Capabilities

The architecture explicitly supports evolution toward Agentic AI patterns emerging as 2025’s dominant approach. The planned Agentic RAG system (Figure 3) adds intelligent routing and autonomous capabilities.

  • Dynamic routing: Classify queries to choose the optimal processing path (sketched after this list)
  • Tool integration:
    • Zendesk API for ticket creation and history lookup.
    • Email systems for delay repay initiation.
    • Booking systems for modification requests.
    • And an “escape hatch” that allows the user to escalate to a human agent, with the request already contextualised and prepped by the AI Agent.
  • Self-reflection loop: Queries can be reformulated based on initial results.
  • Corrective RAG: Multiple retrieval attempts with different strategies.
  • Memory systems: Both short-term (current conversation) and long-term (history of customer interactions with the Agentic system and Zendesk tickets).
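
A hedged sketch of the planned dynamic-routing step: each query is classified into a processing path with a small structured-output call. The route names and the safe fallback are illustrative, not the final tool surface:

```python
import json
from openai import OpenAI

client = OpenAI()
ROUTES = ["rag_answer", "create_ticket", "delay_repay", "modify_booking", "human_agent"]

def route(question: str) -> str:
    """Classify a customer query into one of the agent's processing paths."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                f"Classify this rail-support query into one of {ROUTES}. "
                'Respond with JSON: {"route": "<name>"}.\n\nQuery: ' + question
            ),
        }],
    )
    choice = json.loads(resp.choices[0].message.content)["route"]
    return choice if choice in ROUTES else "human_agent"  # safe fallback

# e.g. route("My train was 40 minutes late, can I claim?") -> "delay_repay"
```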

This positions Seatfrog to handle not just information queries but actual service requests—transforming the chatbot from a question-answering system to a true AI agent capable of resolving customer issues end-to-end. The modular architecture ensures each capability can be added incrementally, with rigorous testing at each stage to maintain the high accuracy standards required for production deployment.

Next step: it is important to understand that the RAG solution becomes another tool in the agent toolkit we are building. There is potential to use multiple RAG architectures, mixing and matching: Basic RAG for speed, GraphRAG for increased accuracy. The agent could then use the GraphRAG response to correct or adjust where needed, whilst still leveraging basic RAG for very fast responses.

Conclusion: Strategic Acceleration in Action

Seatfrog’s AI implementation demonstrates how forward-thinking engineering teams leverage expert partnerships to compress innovation timelines without sacrificing ownership or understanding. Within 12 weeks, Seatfrog moved from initial concept to production-ready AI assistant—a timeline that internal development would have stretched to 6-9 months whilst navigating the learning curve of RAG architectures, embedding strategies, and LLM fine-tuning.

Measurable outcomes delivered:

  • 50% query resolution rate achieved immediately, targeting 80% within Q1 2025
  • Sub-200ms response times maintaining conversation quality
  • Complete technical documentation and handover readiness from day one
  • Engineering team fully equipped to iterate and extend the solution independently

The partnership model worked because:

  • Seatfrog maintained architectural decision authority throughout
  • Knowledge transfer happened continuously, not as an afterthought
  • Tasman provided specialist expertise without creating dependency
  • Both teams focused on capability building, not just delivery

This wasn’t about filling a capability gap—it was Seatfrog’s engineering leadership making a calculated decision to accelerate their AI roadmap whilst maintaining complete control. They now own a production AI system they fully understand, can extend autonomously, and have already begun enhancing with additional agentic capabilities.

For Seatfrog, partnering with Tasman meant getting to market faster without the typical consultant lock-in. For Tasman, it validated our model: deep expertise, rapid delivery, complete handover. No mystery, no dependency, just acceleration.