Building Trustworthy AI for Enhanced Live, Public Experiences
Operational and System Constraints
The platform needed to handle many simultaneous conversations, where each interaction was independent and did not follow a fixed or sequential flow.
Users engage with the system while moving through a physical space. Responses are delivered in real time, without any opportunity for review or moderation. Questions often arrive with limited context, referring to what visitors can see or have just encountered. Some are under-specified or based on assumptions the system cannot safely confirm. Others build on earlier turns in the conversation yet still arrive without enough context to answer directly.
Certain questions repeat and follow familiar themes and patterns. Others only appear once the system is in use and are difficult to anticipate during early testing. Responses still needed to remain aligned with a curated knowledge base. As it answers questions, the system learns about each user's interests and adapts both the experience and its answers accordingly.
ESL’s role was to design and build a platform that could manage per-user state and control what information the system was allowed to draw from whilst answering questions.
The Challenge: When Real Users Arrive
Early testing can look reassuring. A language model sounds fluent and helpful, and common questions produce sensible responses. But that can change once the system is exposed to real use.
Visitors ask unexpected questions. Some are misleading, some provocative, and others fall outside the available source material. When a system attempts to answer regardless, it risks producing confident nonsense. In a public, real-time setting, the impact of that behaviour is immediate and difficult to undo.
For ESL, the focus was on enforcing response constraints: evaluating the available context, limiting hallucinations, and declining to answer when the system lacked sufficient, reliable information.
Every answer was evaluated for relevance to the query, correctness and completeness, ensuring it was grounded in data from the knowledge base.
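As an illustration, the principle of declining rather than guessing can be sketched as a simple retrieval-support gate. The `RetrievedChunk` type, score threshold and fallback message below are hypothetical stand-ins, not ESL's implementation:

```python
from dataclasses import dataclass

@dataclass
class RetrievedChunk:
    text: str
    score: float  # retrieval similarity score, higher is better

FALLBACK = "I don't have reliable information to answer that."

def guard_answer(chunks, min_score=0.75, min_chunks=1):
    """Return only well-supported chunks, or None to signal that the
    caller should send FALLBACK instead of generating an answer."""
    supported = [c for c in chunks if c.score >= min_score]
    if len(supported) < min_chunks:
        return None
    return supported
```

In a real pipeline the threshold would be tuned against evaluation data rather than fixed by hand.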
The Approach: Grounding Responses in Controlled Knowledge
The system was built around Retrieval-Augmented Generation (RAG). Instead of relying on a model’s internal training data, responses are generated using content retrieved from a controlled, configurable knowledge base. The language model was used to compose these answers, translate them and apply the desired tone of voice.
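A minimal sketch of the retrieve-then-prompt step at the heart of RAG. The keyword-overlap retriever here is a deliberately naive stand-in for real vector search, and all names are illustrative rather than taken from the production system:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str

def keyword_retrieve(kb, query, top_k=3):
    """Naive stand-in retriever: rank chunks by shared words with the query."""
    words = set(query.lower().split())
    scored = sorted(kb, key=lambda c: -len(words & set(c.text.lower().split())))
    return scored[:top_k]

def build_prompt(query, chunks):
    """Constrain the model to the retrieved context only."""
    context = "\n\n".join(c.text for c in chunks)
    return (
        "Answer using ONLY the context below. If it is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The resulting prompt is then passed to the language model; because the context is assembled from the knowledge base, editing that content changes behaviour without retraining anything.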
This allowed ESL to define clear boundaries around what the system could reference. Domain experts were able to update, refine and add to the source material directly, without retraining the model. Changes could be evaluated and rolled out incrementally, reducing risk as content evolved.
RAG established a necessary foundation, but it did not address every failure mode on its own. Retrieval can return incomplete or poorly matched fragments, and language models may still attempt to generalise beyond what is supported by the source material. Maintaining response quality required additional controls beyond architectural design alone.
Architectural Decisions
A number of architectural choices were made early and carried through the system:
Modular language model layer
The language model was deliberately decoupled from the rest of the platform. ESL avoided tight coupling to a single model or provider, allowing models to be swapped as requirements changed and models evolved, without disrupting the surrounding system or interrupting operation.
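One way to express that decoupling is a structural interface that platform code depends on, with concrete providers plugged in behind it. The `LLMClient` protocol and the `EchoModel` stand-in below are illustrative assumptions, not ESL's actual abstraction:

```python
from typing import Protocol

class LLMClient(Protocol):
    """Structural interface: any provider with complete() satisfies it."""
    def complete(self, prompt: str) -> str: ...

class EchoModel:
    """Trivial stand-in provider; a real adapter would wrap a vendor SDK."""
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

def generate(client: LLMClient, prompt: str) -> str:
    # Platform code depends only on the protocol, never on a vendor SDK,
    # so models can be swapped without touching the surrounding system.
    return client.complete(prompt)
```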
Channel-independent delivery layer
The initial experience ran through WhatsApp, a messaging-based interface familiar to visitors, but the core platform was built independently of any single channel. This allowed us to introduce additional interfaces (such as mobile apps) without changes to the underlying logic.
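The same idea applies to delivery: channels can be modelled as adapters behind one interface. The `ConsoleChannel` below is a hypothetical stand-in for a WhatsApp or mobile-app adapter:

```python
from abc import ABC, abstractmethod

class Channel(ABC):
    """Delivery interface; each channel (WhatsApp, app, ...) implements it."""
    @abstractmethod
    def send(self, user_id: str, text: str) -> None: ...

class ConsoleChannel(Channel):
    """Stand-in channel that records messages instead of delivering them."""
    def __init__(self):
        self.sent = []
    def send(self, user_id: str, text: str) -> None:
        self.sent.append((user_id, text))

def deliver(channel: Channel, user_id: str, answer: str) -> None:
    # Core logic hands answers to whichever channel the user arrived on.
    channel.send(user_id, answer)
```

Adding a new interface then means writing one adapter, not changing the underlying logic.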
These decisions reduced vendor lock-in, limited operational risk and supported ongoing changes to models, channels, and content without requiring structural rework.
Validation and Testing in Practice
Once the system was in place, attention shifted to how it behaved under real-world conditions. Fluent answers were not enough. Responses needed to stay grounded, consistent, concise and within the limits of the available source material.
Standard testing approaches did not hold up well. Language model behaviour changes with phrasing, context, retrieval quality and LLM versions. Small variations in input can lead to different outputs, even when the underlying information stays the same. One-off checks and static test cases were not reliable indicators of how the system would perform in use.
To address this, ESL built internal tools for testing and validating LLM-driven applications. Responses are evaluated across multiple dimensions, including whether they are grounded in the provided knowledge base, factually accurate, and relevant to the user’s question.
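A toy version of such an evaluation might score each answer along several dimensions and gate on a threshold. The word-overlap proxy for groundedness below is a deliberate simplification of what a real grader (often another model) would compute, and every name here is an assumption for illustration:

```python
def groundedness(answer: str, reference: str) -> float:
    """Crude proxy: fraction of answer words that also appear in the reference."""
    a = set(answer.lower().split())
    r = set(reference.lower().split())
    return len(a & r) / max(len(a), 1)

def evaluate(answer: str, context: str, query: str, threshold=0.6) -> dict:
    """Score one answer on multiple dimensions and record a pass/fail verdict."""
    scores = {
        "grounded": groundedness(answer, context),  # supported by the KB text?
        "relevant": groundedness(answer, query),    # same proxy, vs the query
    }
    scores["pass"] = scores["grounded"] >= threshold
    return scores
```

Running a harness like this over a bank of recorded questions, after each knowledge-base or model change, is what turns one-off checks into a repeatable signal.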
This made it possible to surface failure patterns that would not appear during early testing. Retrieval mismatches, over-generalisation, and subtle hallucinations could be identified and addressed before launching.
The same tooling also supports ongoing checks as the knowledge base, models, and prompts evolve.
Outcome and Impact
The platform delivered a conversational guide suitable for public, real-time use, while remaining constrained to a curated knowledge base. Responses were fast and natural, with an adaptable tone of voice, without sacrificing accuracy or control.
For ESL, the work established an approach that extends beyond a single deployment. The same architectural patterns and validation methods can be applied in other public-facing environments where reliable responses matter, including chatbots, customer support and self-service assistants, recommendation engines, and compliance, legal and policy interpretation.
Ready to start working with us?
Erlang Solutions exists to build transformative solutions for the world’s most ambitious companies, by providing user-focused consultancy, high tech capabilities and diverse communities. Let’s talk about how we can help you.