Trustworthy AI guide for MUSE

Erlang Solutions (ESL) helped develop UranIA, the AI virtual assistant launched at MUSE. Built for real-time interaction, it enhances the visitor experience while relying on a curated knowledge base validated by the museum’s scientific staff.


Museums, exhibition spaces and public venues are under pressure to extend the visit into a personalised, on-site digital experience. Visitors bring curiosity, ask questions and expect immediate answers, in their language of choice, without downloading an app, scanning QR codes or searching for signage.

At MUSE – Museo delle Scienze in Trento, one of Italy’s leading science museums, this need resulted in the deployment of a controlled, real-time AI system within a high-trust scientific environment.

Public-facing AI introduces a specific risk in these settings. Large language models (LLMs) can hallucinate in a convincing (and authoritative) manner, inventing answers in environments where accuracy and correctness are expected.


Erlang Solutions (ESL) was asked to support the development of UranIA, the AI-driven virtual assistant launched at MUSE. It was designed to operate in real time and enhance the visitor experience while remaining strictly grounded in a controlled, curated knowledge base validated by the museum’s scientific staff. The initiative was delivered in collaboration with MUSE and EBITmax, combining institutional scientific oversight with real-time AI engineering expertise.

Operational and System Constraints

UranIA needed to handle many simultaneous conversations, where each interaction was independent and did not follow a fixed or sequential flow.

Users engage with the system while moving through a physical space. Responses are delivered in real time, without any opportunity for review or moderation. Questions often arrive with limited context, referring to nearby exhibits or recent interactions. Some are under-specified or based on assumptions the system cannot safely confirm. Others depend on earlier exchanges but omit explicit context.

Certain questions repeat and follow familiar patterns. Others only emerge once the system is in live use and are difficult to anticipate during early testing. Responses still needed to remain aligned with a curated knowledge base. 

For UranIA, the Knowledge Base was structured around ten thematic areas representative of the museum and certified and validated by the museum’s scientific staff.

When answering questions, the system needed to adapt to individual interests while remaining within defined informational boundaries.

ESL’s role was to design and build a platform capable of managing per-user state and strictly controlling what information the system could draw upon during response generation. This required cooperation with MUSE and EBITmax to define the initial concept and characteristics of UranIA, establish the data and information assembly chain for the Knowledge Base, and carry out benchmarking and the validation of security and ethical principles.
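The per-user state handling described above can be sketched in a few lines. This is an illustrative model only; the names (`SessionStore`, `Session`) and the five-turn context window are assumptions for the sketch, not details of the UranIA implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Conversation state for a single visitor."""
    user_id: str
    history: list = field(default_factory=list)  # recent (question, answer) pairs

    def remember(self, question: str, answer: str, max_turns: int = 5) -> None:
        # Keep only a short window of context: interactions are independent
        # and do not follow a fixed flow, so old turns quickly lose value.
        self.history.append((question, answer))
        self.history = self.history[-max_turns:]

class SessionStore:
    """Isolates each visitor's state so concurrent conversations never mix."""
    def __init__(self):
        self._sessions = {}

    def get(self, user_id: str) -> Session:
        return self._sessions.setdefault(user_id, Session(user_id))
```

Keeping state strictly keyed by user identifier means many simultaneous conversations can proceed without any interaction leaking context into another.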

The Challenge: When Real Users Arrive

Early testing can appear reassuring. A language model may produce fluent responses, and common questions may generate sensible outputs. However, behaviour can change once the system is exposed to live public use.

Visitors ask unexpected questions. Some fall outside the available source material. Others are ambiguous or based on incorrect assumptions. If the system attempts to answer without sufficient grounding, it risks generating responses that are not supported by the validated content. In a real-time public setting such as MUSE, where scientific credibility is central to institutional trust, such behaviour carries immediate risk.

For ESL, the focus was on enforcing response constraints when the system lacked sufficient, reliable information, evaluating the context and limiting hallucinations.

Every answer was evaluated for relevance to the query, correctness and completeness, ensuring it was grounded in data from the knowledge base.

Continuous benchmarking and validation were used to prevent linguistic drift and hallucinations. When UranIA does not have verified information on a topic, it suggests that visitors contact museum staff.
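That fallback rule reduces to a grounding check before any generation happens. The sketch below assumes retrieval returns (passage, score) pairs; the 0.75 threshold, the referral text and all names are illustrative assumptions, not UranIA's actual parameters:

```python
STAFF_REFERRAL = ("I don't have verified information on that topic. "
                  "Please ask a member of the museum staff.")

def is_grounded(passages, min_score: float = 0.75) -> bool:
    """passages: (text, retrieval_score) pairs from the knowledge base."""
    return bool(passages) and max(score for _, score in passages) >= min_score

def respond(passages, generate):
    # The language model (`generate`) is only invoked when grounding is
    # sufficient; otherwise the visitor is referred to museum staff.
    return generate(passages) if is_grounded(passages) else STAFF_REFERRAL
```

The key design point is that refusal happens before generation, so the model never gets the chance to improvise an unsupported answer.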

The Approach: Grounding Responses in Controlled Knowledge

UranIA was built around Retrieval-Augmented Generation (RAG), generating responses from a certified Knowledge Base rather than relying on a model’s internal training data.


This allowed domain experts to update, refine and add to the source material directly, without retraining the model. Changes could be evaluated and rolled out incrementally, reducing risk as content evolved.
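A minimal RAG loop over a curated knowledge base can be sketched as follows. The retrieval here is naive keyword overlap, standing in for the embedding-based retrieval a production system would use; the knowledge base entries and all names are invented for illustration:

```python
# Toy curated knowledge base: in practice this would be the validated
# content maintained by the museum's scientific staff.
KNOWLEDGE_BASE = [
    "Alpine glaciers are retreating as temperatures rise.",
    "The palaeontology gallery displays dinosaur fossils found in the Dolomites.",
]

def retrieve(question: str, top_k: int = 1):
    """Rank passages by word overlap with the question (toy retriever)."""
    words = set(question.lower().split())
    scored = [(doc, len(words & set(doc.lower().split()))) for doc in KNOWLEDGE_BASE]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, score in scored[:top_k] if score > 0]

def build_prompt(question: str, passages) -> str:
    # The model is instructed to answer only from retrieved passages,
    # not from its internal training data.
    context = "\n".join(passages)
    return ("Answer ONLY from the context below. If the context is "
            f"insufficient, say so.\n\nContext:\n{context}\n\nQuestion: {question}")
```

Because generation is driven entirely by retrieved passages, updating the knowledge base changes the system's answers immediately, with no model retraining.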

RAG established a necessary foundation, but it did not address every failure mode on its own. Retrieval can return incomplete or poorly matched fragments, and language models may still attempt to generalise beyond what is supported by the source material. Maintaining response quality required additional controls beyond architectural design alone.

Architectural Decisions

A number of architectural choices were made early and carried through the system:

Modular language model layer

The language model was deliberately decoupled from the rest of the platform. ESL avoided tight coupling to a single model or provider, allowing models to be swapped as requirements changed and models evolved, without disrupting the surrounding system or interrupting operation.

This ensured that UranIA could adapt to future model developments without structural redesign.
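The decoupling described above amounts to programming against an abstract model interface rather than a concrete provider. A minimal sketch, with entirely illustrative class names and stub completions in place of real API calls:

```python
from abc import ABC, abstractmethod

class LanguageModel(ABC):
    """The only surface the rest of the platform ever sees."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class ProviderA(LanguageModel):
    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"  # stand-in for a real provider API call

class ProviderB(LanguageModel):
    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"

def generate(model: LanguageModel, prompt: str) -> str:
    # Swapping ProviderA for ProviderB requires no change here or anywhere
    # else in the surrounding system.
    return model.complete(prompt)
```

Under this arrangement a model upgrade is a configuration change, not a structural redesign.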

Channel-independent delivery layer

The initial experience ran through WhatsApp, a messaging-based interface familiar to visitors, but the core platform was built independently of any single channel. This supported MUSE’s app-less access model, allowing visitors to interact with UranIA without downloading or installing additional software. Additional interfaces (such as mobile applications) can be introduced without changes to the underlying logic. 
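The same decoupling applies on the delivery side: the core logic produces a reply, and a thin per-channel adapter handles transport. In this sketch `WhatsAppChannel` is a stub that records outgoing messages instead of calling a real messaging API, and the interface shape is an assumption:

```python
from typing import Protocol

class Channel(Protocol):
    def send(self, user_id: str, text: str) -> None: ...

class WhatsAppChannel:
    """Stub adapter; a real one would call the messaging provider's API."""
    def __init__(self):
        self.outbox = []

    def send(self, user_id: str, text: str) -> None:
        self.outbox.append((user_id, text))

def deliver(channel: Channel, user_id: str, answer: str) -> None:
    # Core logic never depends on which channel is in use, so a mobile app
    # or kiosk adapter can be added without touching this code.
    channel.send(user_id, answer)
```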

These decisions reduced vendor lock-in, limited operational risk and supported ongoing changes to models, channels, and content without requiring structural rework.

Validation and Testing in Practice

Once the system was in place, attention shifted to how it behaved under real-world conditions. Fluent answers were not enough. Responses needed to stay grounded, consistent, concise and within the limits of the available source material.

Standard testing approaches did not hold up well. Language model behaviour changes with phrasing, context, retrieval quality and LLM versions. Small variations in input can lead to different outputs, even when the underlying information stays the same. One-off checks and static test cases were not reliable indicators of how the system would perform in use.


To address this, ESL built internal tools for testing and validating LLM-driven applications. Responses are evaluated across multiple dimensions, including whether they are grounded in the provided knowledge base, factually accurate, and relevant to the user’s question. For the deployment of UranIA, this validation process was carried out in coordination with MUSE and EBITmax to ensure alignment with the museum’s scientific standards and agreed security and ethical requirements.
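The shape of such multi-dimensional checks can be sketched with deliberately simple word-overlap metrics. These toy definitions of groundedness and relevance, and the thresholds, are assumptions for illustration; ESL's internal tooling is not public:

```python
def groundedness(answer: str, sources) -> float:
    """Fraction of answer words that appear in the retrieved sources."""
    src_words = set(" ".join(sources).lower().split())
    ans_words = answer.lower().split()
    return sum(w in src_words for w in ans_words) / len(ans_words) if ans_words else 0.0

def relevance(answer: str, question: str) -> float:
    """Word overlap between the question and the answer."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    return len(q & a) / len(q) if q else 0.0

def evaluate(question, answer, sources, min_groundedness=0.6, min_relevance=0.2):
    # Each response is scored on several dimensions and must clear every
    # threshold to pass; failures surface retrieval mismatches and drift.
    scores = {"groundedness": groundedness(answer, sources),
              "relevance": relevance(answer, question)}
    passed = (scores["groundedness"] >= min_groundedness
              and scores["relevance"] >= min_relevance)
    return scores, passed
```

Running such checks continuously, rather than once before launch, is what catches regressions as knowledge base content, models and prompts evolve.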

This made it possible to surface failure patterns that would not appear during early testing. Retrieval mismatches, over-generalisation, and subtle hallucinations could be identified and addressed before launch.

The same tooling also supports ongoing checks as the knowledge base, models, and prompts evolve.

Outcome and Impact

UranIA was delivered as a real-time conversational guide constrained to a curated and validated Knowledge Base, operating in a live public environment without compromising accuracy or control.

The project established a structured approach to combining Retrieval-Augmented Generation, validation processes and architectural decoupling in a public-facing setting.

For ESL, this framework can be extended to other environments where response reliability is essential, including customer support, compliance and self-service systems.

Ready to start working with us?

Erlang Solutions exists to build transformative solutions for the world’s most ambitious companies, by providing user-focused consultancy, high tech capabilities and diverse communities. Let’s talk about how we can help you.