Industry: Travel
Company size: <50
Established: 2016
Value: >10 million
Location: London
Seatfrog partnered with Tasman to build a RAG support system handling complex UK rail ticket queries. In just six weeks, they achieved 50% automated full resolution, sub-200ms response times, and full technical ownership through a production-grade RAG architecture.
Seatfrog faced thousands of monthly support queries about UK rail’s complex ticketing rules—where simple questions like “Can I get a refund?” depend on intricate combinations of ticket types, operators, and delay causes. Their engineering leadership recognised the competitive advantage of AI-powered support but, with a highly capable in-house team focused on revenue-driving priorities, made the decision to entrust delivery to a reliable partner. They chose strategic acceleration: partnering with Tasman to compress delivery to six weeks whilst maintaining complete technical ownership.
We implemented a three-phase RAG architecture with production-grade components:
The result: production AI achieving 50% query resolution immediately, sub-200ms response times, and strong cost-effectiveness. Seatfrog’s team brought strong expertise in RAG patterns and strategies, but chose to work with a trusted partner to accelerate delivery while keeping internal focus on revenue-driving priorities. It was a conscious trade-off in pace, not capability, ensuring they retained full control of their AI roadmap.
We designed a three-phase technical roadmap that balanced immediate value delivery with long-term scalability. The architecture (illustrated in Figure 1) centres on a retrieval-augmented generation (RAG) pattern that grounds all responses in Seatfrog’s authoritative help centre content. This approach mitigates hallucination risks whilst maintaining conversational quality. The system processes queries through multiple stages:
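To make those stages concrete, here is a minimal sketch of the query path: embed the question, retrieve the most relevant help-centre chunks, and generate an answer grounded only in them. The embedding model, in-memory index, and prompt wording below are illustrative assumptions, not Seatfrog’s exact stack.

```python
# Minimal sketch of the RAG query path. Model names, the in-memory index,
# and the grounding prompt are assumptions for illustration only.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def retrieve(question: str, chunks: list[str], chunk_vecs: np.ndarray, k: int = 4) -> list[str]:
    q = embed([question])[0]
    scores = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str, chunks: list[str], chunk_vecs: np.ndarray) -> str:
    context = "\n\n".join(retrieve(question, chunks, chunk_vecs))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the help-centre extracts below. "
                        "If the answer is not covered, say you don't know.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content
```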
The initial prototype focused on proving the core RAG pattern could handle Seatfrog’s domain complexity. As shown in the system diagram, the data flow begins with comprehensive website crawling that preserves the hierarchical structure of help articles. Rather than naive splitting, the chunking algorithm respects semantic boundaries ensuring that related information about ticket types or refund policies stays together. Key technical decisions included:
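As an example of what boundary-aware chunking can look like, the sketch below splits crawled articles on section headings and merges short neighbouring sections rather than cutting at fixed character offsets. The heading pattern and size limit are assumptions, not the exact rules used in the project.

```python
# Illustrative chunker: split on section headings so related policy details
# stay together, merging small sections up to a rough size budget.
# The heading pattern and MAX_CHARS are assumptions for the sketch.
import re

MAX_CHARS = 1500  # rough target chunk size

def chunk_article(title: str, body: str) -> list[str]:
    # Split before markdown-style headings, keeping each heading with its section.
    sections = re.split(r"\n(?=#{1,3} )", body)
    chunks, current = [], ""
    for section in sections:
        candidate = (current + "\n" + section).strip() if current else section.strip()
        if len(candidate) <= MAX_CHARS:
            current = candidate  # merge small neighbouring sections
        else:
            if current:
                chunks.append(f"{title}\n{current}")
            current = section.strip()
    if current:
        chunks.append(f"{title}\n{current}")
    return chunks
```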
To evaluate the quality and accuracy of generated responses, we worked with the client’s Subject Matter Expert to prepare a curated dataset of questions, the expected answers, and the source URLs that should be referenced. This resulted in 12 questions covering a good variety of use cases, with answers that were concise and feasible to evaluate against the generated responses.
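The curated set can be represented as simple records; the single entry below is a hypothetical placeholder to show the shape (question, expected answer, expected source URLs), not one of the 12 actual test cases.

```python
# Shape of the curated evaluation set. The record below is a hypothetical
# placeholder, not one of the 12 real SME-written cases; the URL is invented.
EVAL_SET = [
    {
        "question": "Can I get a refund if my train is cancelled?",
        "expected_answer": "…",  # concise SME-approved answer
        "expected_sources": ["https://help.example.com/refunds"],  # hypothetical
    },
    # ... 11 further question / expected answer / source records
]
```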
We then implemented automated testing using an LLM-as-a-Judge approach. We defined unit tests with Pytest, calling a judge function with four parameters: the question, the expected answer, the list of expected sources, and the generated response. This function was given a clear prompt and instructions to produce a structured output consistent with the True/False results expected by the assertion tests.
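A hedged sketch of what those tests can look like is shown below: each curated case is run through the chatbot, and a judge prompt returns a structured true/false verdict for the pytest assertion. `generate_response` stands in for the chatbot under test, the judge model and prompt wording are assumptions, and `EVAL_SET` follows the structure sketched earlier.

```python
# LLM-as-a-Judge unit tests (sketch). Assumes EVAL_SET as sketched above and
# a generate_response(question) function exposed by the chatbot under test.
import json
import pytest
from openai import OpenAI

client = OpenAI()

def judge(question: str, expected_answer: str, expected_sources: list[str],
          generated_response: str) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "You are grading a support chatbot. Compare the generated "
                        "response against the expected answer and sources. "
                        'Reply with JSON: {"correct": true or false}.'},
            {"role": "user",
             "content": f"Question: {question}\n"
                        f"Expected answer: {expected_answer}\n"
                        f"Expected sources: {expected_sources}\n"
                        f"Generated response: {generated_response}"},
        ],
    )
    return json.loads(resp.choices[0].message.content).get("correct") is True

@pytest.mark.parametrize("case", EVAL_SET, ids=lambda c: c["question"][:40])
def test_generated_answer_is_correct(case):
    generated = generate_response(case["question"])  # the chatbot under test
    assert judge(case["question"], case["expected_answer"],
                 case["expected_sources"], generated)
```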
This foundation achieved a 50% success rate on Seatfrog’s test queries out of the box. That is a high figure, remarkably close to industry-leading benchmarks, whilst using a smaller, more cost-effective model with simple single-turn question answering. And this was before any fine-tuning or any debugging and refactoring of the existing documentation in the knowledge base.
The next phase focused on fine-tuning GPT-4o-mini on Seatfrog’s actual support interactions. The fine-tuning pipeline processed 53 hand-picked tickets through several stages:
{"role": "user/assistant", “content”: “…”}
structure.Interestingly enough, fine-tuning didn’t deliver a huge step-up in tone; we used a supervised method, only providing the correct answer. In a next stage we’d explore more directional methods like Direct Preference Optimisation, where we provide both good and bad responses.
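For concreteness, the conversion into that message structure might look like the sketch below; the ticket field names and the system prompt are assumptions, and the commented lines show how such a file is typically submitted to OpenAI’s fine-tuning API.

```python
# Sketch of turning the 53 hand-picked tickets into chat-message JSONL for
# supervised fine-tuning. Ticket field names and the system prompt are assumptions.
import json

def ticket_to_example(ticket: dict) -> dict:
    return {
        "messages": [
            {"role": "system", "content": "You are a helpful UK rail support assistant."},
            {"role": "user", "content": ticket["customer_message"]},
            {"role": "assistant", "content": ticket["agent_reply"]},
        ]
    }

def write_training_file(tickets: list[dict], path: str = "finetune.jsonl") -> None:
    with open(path, "w", encoding="utf-8") as f:
        for ticket in tickets:
            f.write(json.dumps(ticket_to_example(ticket)) + "\n")

# The file is then uploaded and a fine-tuning job started, e.g.:
#   file = client.files.create(file=open("finetune.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file.id, model="gpt-4o-mini-2024-07-18")
```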
Seatfrog’s engineering team maintained full visibility and decision authority throughout implementation. This wasn’t outsourcing but strategic acceleration – with Seatfrog engineers:
With Seatfrog’s engineering team ready to assume ownership, the production hardening phase focused on ensuring seamless handover. The production architecture (Figure 2) adds enterprise-grade capabilities whilst maintaining the elegant simplicity of the RAG pattern. Critical enhancements include:
The architecture explicitly supports evolution toward Agentic AI patterns emerging as 2025’s dominant approach. The planned Agentic RAG system (Figure 3) adds intelligent routing and autonomous capabilities.
This positions Seatfrog to handle not just information queries but actual service requests—transforming the chatbot from a question-answering system to a true AI agent capable of resolving customer issues end-to-end. The modular architecture ensures each capability can be added incrementally, with rigorous testing at each stage to maintain the high accuracy standards required for production deployment.
As a next step, it is important to understand that the RAG solution becomes just another tool in the agent toolkit we are building. There is potential to mix and match multiple RAG architectures, such as Basic RAG for speed and GraphRAG for increased accuracy. The agent could then use the GraphRAG response to correct or adjust its answer where needed, while still leveraging Basic RAG for very fast responses.
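One way to picture that routing is the sketch below: the agent treats Basic RAG and GraphRAG as two tools, always producing a fast Basic RAG answer and escalating to GraphRAG only when the question looks like it needs multi-hop reasoning. The routing heuristic and both tool callables are purely illustrative assumptions.

```python
# Illustrative routing between a fast Basic RAG tool and a slower, more
# accurate GraphRAG tool. The keyword heuristic is a placeholder assumption;
# in practice the agent/LLM would decide which tool(s) to call.
from typing import Callable

def route_query(question: str,
                basic_rag: Callable[[str], str],
                graph_rag: Callable[[str], str]) -> str:
    needs_graph = any(kw in question.lower()
                      for kw in ("split ticket", "multiple operators", "connection", "compensation"))
    fast_answer = basic_rag(question)  # always cheap and quick
    if not needs_graph:
        return fast_answer
    # Use the higher-accuracy GraphRAG pass to correct or confirm the fast answer.
    return graph_rag(question) or fast_answer
```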
Seatfrog’s AI implementation demonstrates how forward-thinking engineering teams leverage expert partnerships to compress innovation timelines without sacrificing ownership or understanding. Within 12 weeks, Seatfrog moved from initial concept to production-ready AI assistant—a timeline that internal development would have stretched to 6-9 months whilst navigating the learning curve of RAG architectures, embedding strategies, and LLM fine-tuning.
Measurable outcomes delivered:
The partnership model worked because:
This wasn’t about filling a capability gap—it was Seatfrog’s engineering leadership making a calculated decision to accelerate their AI roadmap whilst maintaining complete control. They now own a production AI system they fully understand, can extend autonomously, and have already begun enhancing with additional agentic capabilities.
For Seatfrog, partnering with Tasman meant getting to market faster without the typical consultant lock-in. For Tasman, it validated our model: deep expertise, rapid delivery, complete handover. No mystery, no dependency, just acceleration.