Read the interview with Alberto Gómez and his winning proposal: Artificial Intelligence and Video Codec Enhancements for Realistic Avatar Telepresence (AIVATAR)

Alberto Gómez and Fluendo

Alberto Gómez is Fluendo’s Business Unit Manager for New Business, with over nine years of experience in product and project development across the multimedia and automotive sectors.
Fluendo, based in Barcelona, is a leading provider of advanced multimedia middleware solutions. With over 20 years of expertise, the company develops high-performance components—including codecs, SDKs, image processing modules, streaming plugins, and AI-powered tools—across multiple platforms. 
Fluendo is a key contributor to the GStreamer ecosystem. It delivers scalable, cross-platform solutions for industries such as automotive, broadcasting, and video surveillance and offers trusted consulting services to clients of all sizes.

Can you give a brief overview of your winning proposal?
What are its key objectives and innovative aspects?

Our proposal, AIVATAR, introduces dual enhancement solutions—codec-based and AI-based—to improve the quality, efficiency, and realism of real-time immersive telepresence. We integrate MPEG-5 Part 2 LCEVC as an enhancement layer over AVC, achieving up to 20% BD-rate reduction, sub-200 ms latency, and full backward compatibility with existing codecs and infrastructure.

In parallel, we deploy lightweight AI-driven enhancements at the endpoint, focused on upsampling, artifact suppression, and color fidelity. These enhancements run without altering current streaming pipelines or requiring server-side changes. Together, the codec-based and AI-based enhancements form a hybrid solution that adapts to varying network conditions and resource-constrained XR devices, offering a scalable and high-quality telepresence experience.

What motivated you to apply for the SPIRIT Open Call?

Fluendo is strategically targeting the emerging XR market as a new avenue for our solutions. With proven applications in remote work, education, medicine, automotive, and other industries, our expertise is now poised to extend into XR—a field that redefines immersive experiences. SPIRIT offers a unique opportunity to validate and refine our codec-based and AI-based enhancements in real-world conditions.

By leveraging SPIRIT’s advanced testbeds and collaborative environment, we can showcase our innovative approach, paving the way for broader market adoption and future growth in the XR domain.

How do you envision this project making an impact?

AIVATAR aims to transform real-time telepresence by enabling high-quality, low-latency avatar streaming across diverse networks and device conditions. By integrating LCEVC over AVC and deploying AI-driven visual enhancements, the project significantly improves compression efficiency, visual fidelity, and responsiveness—without adding system complexity.

A key innovation is deploying AI at the XR edge, enabling real-time temporal and spatial upsampling, denoising, and color refinement directly on XR glasses, mobile devices, tablets, and web-based clients. This edge-focused approach ensures scalability, lowers infrastructure demands, and delivers consistently immersive experiences across platforms. The project will also generate valuable benchmarking data and best practices for the XR and multimedia communities. Ultimately, AIVATAR enhances the SPIRIT platform’s value proposition, accelerating the adoption of realistic, scalable telepresence in education, healthcare, remote work, and beyond.