Discover all SPIRIT-related publications here or visit the SPIRIT community.
by Peng Qian, Ning Wang, Foh Chuan Heng, Carl Udora, Rahim Tafazolli. IFIP Networking 2024.
Volumetric video is an emerging media application that enables the projection of people or objects into a virtual space in a real-time and immersive manner. Unlike traditional video, live volumetric video streaming in virtual space allows users to interact with teleported objects with a wide range of intentions. However, a significant technical challenge in this context is the need for real-time network adaptation driven by diverse user intents (e.g., the user's movement in the virtual space), which may instantly change the streaming's network demand. To ensure a satisfactory perceived Quality of Experience (QoE) in the face of both intent and network condition uncertainty, we have developed a novel solution framework that allows offline user intent registration and online user intent capture with the necessary path adaptation. This path adaptation module is empowered by a novel Multi-Armed Bandit (MAB)-based path selection algorithm, with joint consideration of probed application delay and network congestion. Through extensive real-world experiments, we have validated the effectiveness of our proposed framework in assuring user QoE under various network conditions and for different user intent scenarios.
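As a rough illustration of the kind of bandit-based path selection described in this abstract, the sketch below implements a generic UCB1-style selector whose reward blends a probed delay measurement with a congestion signal. The class name, reward shaping (the 100 ms normalisation and the delay_weight split) and update flow are illustrative assumptions, not the paper's actual algorithm.

```python
import math

class UCBPathSelector:
    """Illustrative UCB1-style bandit over candidate network paths.

    The reward combines probed application delay with a congestion signal;
    the weighting and reward shape are assumptions for illustration only.
    """

    def __init__(self, paths, delay_weight=0.7):
        self.paths = list(paths)
        self.delay_weight = delay_weight
        self.counts = {p: 0 for p in self.paths}
        self.values = {p: 0.0 for p in self.paths}  # running mean reward per path
        self.total = 0

    def select_path(self):
        # Try each path once before applying the UCB rule.
        for p in self.paths:
            if self.counts[p] == 0:
                return p

        def ucb(p):
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[p])
            return self.values[p] + bonus

        return max(self.paths, key=ucb)

    def update(self, path, probed_delay_ms, congestion_level):
        # Lower delay and lower congestion map to a higher reward in [0, 1].
        delay_score = max(0.0, 1.0 - probed_delay_ms / 100.0)
        congestion_score = max(0.0, 1.0 - congestion_level)
        reward = (self.delay_weight * delay_score
                  + (1 - self.delay_weight) * congestion_score)
        self.total += 1
        self.counts[path] += 1
        self.values[path] += (reward - self.values[path]) / self.counts[path]
```

In such a scheme, select_path() would be called once per adaptation interval and update() fed with the latest probe results for the chosen path, so that paths with consistently low delay and congestion accumulate higher estimated reward.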
by Vu San Ha Huynh, Peng Qian, Ning Wang, Carl Udora, Rahim Tafazolli. IEEE Gaming, Entertainment and Media (GEM) Conference.
Emerging immersive media applications demand tailored performance to accommodate diverse user intents, particularly in scenarios with multiple users who have different intents and require frame synchronisation. This paper introduces a novel transport-layer intelligence scheme that leverages a user intent-aware API. This API enables the application layer to communicate specific user intents and requirements to the transport layer, optimizing immersive application performance. Using deep reinforcement learning, our solution automatically selects the optimal transport protocol and configuration for each user intent across various immersive scenarios. Our evaluation focuses on a live immersive video streaming application, with different users transmitting volumetric content under different network conditions. Results demonstrate that our scheme accurately identifies suitable transport protocols and tailored configurations for a wide range of user intents, ensuring multi-user frame synchronisation.
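To make the notion of an intent-aware API more concrete, the sketch below shows one possible shape of the application-to-transport handoff: an intent descriptor and a selector that maps it to a transport configuration. The intent fields, protocol options and mapping rules are hypothetical, and the selector here is a simple rule-based stand-in for the deep-reinforcement-learning agent the paper actually uses.

```python
from dataclasses import dataclass

@dataclass
class UserIntent:
    """Hypothetical intent descriptor handed from application to transport layer."""
    interaction: str          # e.g. "walk_around", "inspect", "passive_view"
    max_frame_delay_ms: int   # latency budget implied by this intent
    needs_sync: bool          # whether frames must be synchronised across users

def select_transport(intent: UserIntent) -> dict:
    """Toy rule-based stand-in for a learned transport protocol/configuration selector."""
    if intent.needs_sync and intent.max_frame_delay_ms < 100:
        return {"protocol": "QUIC", "streams": "per-frame", "pacing": "aggressive"}
    if intent.interaction == "passive_view":
        return {"protocol": "TCP", "streams": "single", "pacing": "default"}
    return {"protocol": "QUIC", "streams": "per-object", "pacing": "default"}

# Example: a synchronised, latency-critical intent
print(select_transport(UserIntent("walk_around", 80, True)))
```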
by Peng Qian, Ning Wang, Foh Chuan Heng, Jia Zhang, Carl Udora, Rahim Tafazolli. IEEE Symposium on Computers and Communications (ISCC) 2024.
Volumetric video streaming, an innovative media application, facilitates the real-time and immersive teleportation of individuals or objects into the virtual environment of the audience. Unlike conventional video streaming applications, volumetric content is particularly vulnerable to network fluctuations, which can lead to performance degradation such as reduced FPS and delayed frame delivery. In this research, we introduce an eBPF-based network function that duplicates packets along the pathways between network nodes, ensuring timely packet delivery amid network instability. Furthermore, we propose a path elimination algorithm to discard paths incapable of delivering frames within the target latency. Our implementation and evaluation validate the rapid and robust performance achieved across various resolution levels.
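The path elimination idea lends itself to a compact sketch: paths whose probed delay cannot meet the per-frame deadline are dropped from the candidate set. The function below is an assumed, simplified rule (including the safety margin and fallback behaviour), not the paper's implementation.

```python
def eliminate_paths(paths, probed_delay_ms, target_latency_ms, margin_ms=5):
    """Keep only paths expected to deliver frames within the target latency.

    paths             -- list of path identifiers
    probed_delay_ms   -- dict mapping path -> most recent probed delay (ms)
    target_latency_ms -- per-frame delivery deadline (ms)
    margin_ms         -- assumed safety margin
    """
    paths = list(paths)
    surviving = [p for p in paths
                 if probed_delay_ms.get(p, float("inf")) + margin_ms <= target_latency_ms]
    # Never eliminate every path: fall back to the currently fastest one.
    if not surviving:
        surviving = [min(paths, key=lambda p: probed_delay_ms.get(p, float("inf")))]
    return surviving
```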
by Wieland Morgenstern, Milena T. Bagdasarian, Anna Hilsmann, Peter Eisert. IEEE Transactions on Visualization and Computer Graphics.
We propose a novel representation of virtual humans for highly realistic real-time animation and rendering in 3D applications. We learn pose-dependent appearance and geometry from highly accurate dynamic mesh sequences obtained from state-of-the-art multiview-video reconstruction. Learning pose-dependent appearance and geometry from mesh sequences poses significant challenges, as it requires the network to learn the intricate shape and articulated motion of a human body. However, statistical body models like SMPL provide valuable a-priori knowledge, which we leverage to constrain the dimension of the search space, enabling more efficient and targeted learning, and to define pose-dependency. Instead of directly learning absolute pose-dependent geometry, we learn the difference between the observed geometry and the fitted SMPL model. This allows us to encode both pose-dependent appearance and geometry in the consistent UV space of the SMPL model. This approach not only ensures a high level of realism but also facilitates streamlined processing and rendering of virtual humans in real-time scenarios.
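The core idea of learning residual geometry relative to a fitted SMPL body can be sketched schematically as below. The SMPL fitting, vertex correspondence and UV rasterisation are placeholders (a real pipeline would use a body-model library and proper texture-space rasterisation); this is only meant to show the residual-in-UV-space concept.

```python
import numpy as np

def geometry_residual(observed_vertices: np.ndarray,
                      smpl_vertices: np.ndarray) -> np.ndarray:
    """Per-vertex offsets between captured geometry and a fitted SMPL body.

    Both arrays are assumed to be (N, 3) and in vertex correspondence,
    which in practice requires registering the scan to the SMPL topology.
    """
    return observed_vertices - smpl_vertices

def splat_to_uv(residuals: np.ndarray, uv_coords: np.ndarray,
                resolution: int = 256) -> np.ndarray:
    """Scatter per-vertex residuals into an (H, W, 3) map in SMPL UV space.

    uv_coords is assumed to hold one (u, v) pair per vertex in [0, 1].
    Nearest-texel splatting is a simplification of proper rasterisation.
    """
    uv_map = np.zeros((resolution, resolution, 3), dtype=np.float32)
    px = np.clip((uv_coords * (resolution - 1)).astype(int), 0, resolution - 1)
    uv_map[px[:, 1], px[:, 0]] = residuals
    return uv_map
```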
by Minh Nguyen, Shivi Vats, Xuemei Zhou, Irene Viola, Pablo Cesar, Christian Timmerer, Hermann Hellwagner. 15th ACM Multimedia Systems Conference (ACM MMSys 2024).
Point clouds (PCs) have attracted researchers and developers due to their ability to provide immersive experiences with six degrees of freedom (6DoF). However, there are still several open issues in understanding the Quality of Experience (QoE) and visual attention of end users while experiencing 6DoF volumetric videos. First, encoding and decoding point clouds require a significant amount of both time and computational resources. Second, QoE prediction models for dynamic point clouds in 6DoF have not yet been developed due to the lack of visual quality databases. Third, visual attention in 6DoF is hardly explored, which impedes research into more sophisticated approaches for adaptive streaming of dynamic point clouds. In this work, we provide an open-source Compressed Point cloud dataset with Eye-tracking and Quality assessment in Mixed Reality (ComPEQ–MR). The dataset comprises four compressed dynamic point clouds processed by Moving Picture Experts Group (MPEG) reference tools (i.e., VPCC and GPCC), each with 12 distortion levels. We also conducted subjective tests to assess the quality of the compressed point clouds with different levels of distortion. The rating scores are attached to ComPEQ–MR so that they can be used to develop QoE prediction models in the context of MR environments. Additionally, eye-tracking data for visual saliency is included in this dataset, which is necessary to predict where people look when watching 3D videos in MR experiences. We collected opinion scores and eye-tracking data from 41 participants, resulting in 2132 responses and 164 visual attention maps in total. The dataset is available at https://ftp.itec.aau.at/datasets/ComPEQ-MR/.
by Matthias De Fré, Jeroen van der Hooft, Tim Wauters, and Filip De Turck. 15th ACM Multimedia Systems Conference (ACM MMSys 2024).
In today’s world, the use of video conferencing applications has risen significantly. However, with the introduction of affordable head-mounted displays (HMDs), users are now seeking new immersive and engaging experiences that enhance the 2D video conferencing applications with a third dimension. Immersive video formats such as light fields and volumetric video aim to enhance the experience by allowing for six degrees-of-freedom (6DoF), resulting in users being able to look and walk around in the virtual space. We present a novel, open source, many-to-many streaming architecture using point cloud-based volumetric video. To ensure bitrates that satisfy contemporary networks, the Draco codec encodes the point clouds before they are transmitted using web real-time communication (WebRTC), all while ensuring that the end-to-end latency remains acceptable for real-time communication. A multiple description coding (MDC)-based quality adaptation approach ensures that the pipeline can support a large number of users, each with varying network conditions. In this demo, participants will be seated around a table and will engage in a virtual conference using an HMD, with each participant being captured using a single depth camera. To showcase the quality effectiveness of the MDC-based adaptation algorithm, a dashboard is used to monitor the status of the application and control the bandwidth available to each participant. The available bandwidth and position of the user are taken into account to dynamically assign a quality level to each participant, ensuring a higher quality experience compared to having a uniform quality level for each point cloud object.
by Matthias De Fré, Jeroen van der Hooft, Tim Wauters, and Filip De Turck. 15th ACM Multimedia Systems Conference (ACM MMSys 2024).
The production and consumption of video content has become a staple in the current day and age. With the rise of virtual reality (VR), users are now looking for immersive, interactive experiences which combine the classic video applications, such as conferencing or digital concerts, with newer technologies. A first step was made by going beyond 2D video to 360-degree experiences. However, 360-degree video offers only rotational movement, making interaction with the environment difficult. Fully immersive 3D content formats, such as light fields and volumetric video, aspire to go further by enabling six degrees-of-freedom (6DoF), allowing both rotational and positional freedom. Nevertheless, the adoption of immersive video capturing and rendering methods has been hindered by their substantial bandwidth and computational requirements, rendering them in most cases impractical for low-latency applications. Several efforts have been made to alleviate these problems by introducing specialized compression algorithms and by utilizing existing 2D adaptation methods to adapt the quality based on the user's available bandwidth. However, even though these methods improve the quality of experience (QoE) and mitigate bandwidth limitations, they still suffer from high latency, which makes real-time interaction unfeasible. To address this issue, we present a novel, open source, one-to-many streaming architecture using point cloud-based volumetric video. To reduce the bandwidth requirements, we utilize the Draco codec to compress the point clouds before they are transmitted using WebRTC, which ensures low latency, enabling the streaming of real-time 6DoF interactive volumetric video. Content is adapted by employing a multiple description coding (MDC) strategy which combines sampled point cloud descriptions based on the estimated bandwidth returned by the Google congestion control (GCC) algorithm. MDC encoding scales more easily to a larger number of users compared to performing individual encoding. Our proposed solution achieves similar real-time latency for both three and nine clients (163 ms and 166 ms), which is 9% and 19% lower compared to individual encoding. The MDC-based approach, using three workers, achieves similar visual quality compared to a per-client encoding solution using five worker threads, and increased quality when the number of clients is greater than 20.
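To illustrate how an MDC-based adaptation step might combine descriptions under a bandwidth estimate, the sketch below greedily picks how many sampled descriptions fit within the budget reported by a congestion controller such as GCC. The greedy rule, bitrates and fallback are assumptions for illustration, not the paper's adaptation logic.

```python
def assign_descriptions(estimated_bandwidth_bps: float,
                        description_bitrates_bps: list[float]) -> int:
    """Pick how many MDC descriptions to send within the estimated bandwidth.

    description_bitrates_bps holds the (assumed) bitrate of each sampled
    point cloud description; sending more descriptions yields a denser
    reconstruction. This greedy rule is only an illustration.
    """
    budget = estimated_bandwidth_bps
    count = 0
    for bitrate in description_bitrates_bps:
        if bitrate <= budget:
            budget -= bitrate
            count += 1
        else:
            break
    return max(count, 1)  # always send at least one description

# e.g. the congestion controller estimates 6 Mbps and three
# 2.5 Mbps descriptions are available -> send 2 of them
print(assign_descriptions(6e6, [2.5e6, 2.5e6, 2.5e6]))  # -> 2
```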
by Wolfgang Paier, Paul Hinzer, Anna Hilsmann, and Peter Eisert. Vision Modeling and Visualization 2024 (VMV 2024).
We present a new approach for video-driven animation of high-quality neural 3D head models, addressing the challenge of person-independent animation from video input. Typically, high-quality generative models are learned for specific individuals from multi-view video footage, resulting in person-specific latent representations that drive the generation process. In order to achieve person-independent animation from video input, we introduce an LSTM-based animation network capable of translating person-independent expression features into personalized animation parameters of person-specific 3D head models. Our approach combines the advantages of personalized head models (high quality and realism) with the convenience of video-driven animation employing multi-person facial performance capture. We demonstrate the effectiveness of our approach through high-quality synthesized animations driven by different source videos, as well as an ablation study.
by Minh Nguyen, Shivi Vats, Hermann Hellwagner. Proceedings of the IEEE Mile High Video Conference 2024 (MHV 2024).
Point cloud streaming is becoming increasingly popular due to its ability to provide six degrees of freedom (6DOF) for immersive media. Measuring the quality of experience (QoE) is essential to evaluate the performance of point cloud applications. However, most existing QoE models for point cloud streaming are complicated and/or not open source. Therefore, it is desirable to provide an open-source QoE model for point cloud streaming. The International Telecommunication Union (ITU) has put a great deal of effort into video quality estimation models, namely ITU-T P.1203. This P.1203 model was implemented and published on GitHub. The model's inputs include video characteristics (i.e., bitrate, framerate, codec, and frame size), streaming parameters (i.e., stall events), and viewing conditions (i.e., device type and viewing distance). Point cloud streaming also shares some parameters that can be used in the P.1203 model, such as bitrate, framerate, stall events, and viewing distance. However, as the coefficients in the original P.1203 model were determined in a training phase based on a subjective database for 2D videos, they need to be re-trained with a new subjective database for point cloud streaming. In this work, we provide a fine-tuned ITU-T P.1203 model for dynamic point clouds in Augmented Reality (AR) environments. We re-train the P.1203 model with our previously published dataset to obtain the coefficients that achieve the lowest root mean square error (RMSE). The dataset was collected in a subjective test in which the participants watched dynamic point clouds from the 8i lab database with Microsoft's HoloLens 2 AR glasses. The dynamic point clouds have static qualities or a quality switch in the middle of the sequence. We split this dataset into a training set and a validation set. We train the coefficients of the P.1203 model with the former set and validate its performance with the latter one. The results show that our fine-tuned P.1203 model outperforms the original model from the ITU. Our model achieves an RMSE of 0.813, compared to 0.887 for the original P.1203 model on the training set. The Pearson Linear Correlation Coefficient (PLCC) and Spearman's Rank Correlation Coefficient (SRCC) of our fine-tuned model are also significantly higher than those of ITU's model (see Table 1). These values are more than 0.9 for our model, compared to less than 0.786 for the standard P.1203 model on the training dataset. On the validation dataset, our fine-tuned model also provides a better RMSE of 0.955, compared with 1.032 for the standard P.1203 model. We also achieved a better correlation with the ground truth, with PLCC = 0.958, while this metric is 0.918 for the standard P.1203 model. The correlations of the compared models are visualized in Fig. 1. The fine-tuned P.1203 model is published at https://github.com/minhkstn/itu-p1203-point-clouds.
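Conceptually, the re-training step amounts to choosing model coefficients that minimise the RMSE between predicted and subjective scores. The sketch below shows that idea with a toy linear predictor and randomly generated placeholder data; the actual P.1203 functional form, features and dataset are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder per-session features (e.g. bitrate, framerate, stalls, distance)
# and placeholder subjective mean opinion scores; real data would come from
# the published point cloud subjective database.
X = np.random.rand(50, 4)
mos = np.random.uniform(1, 5, size=50)

def predict(coeffs, features):
    """Toy linear stand-in for the P.1203 quality mapping."""
    return features @ coeffs[:-1] + coeffs[-1]

def rmse(coeffs):
    return np.sqrt(np.mean((predict(coeffs, X) - mos) ** 2))

result = minimize(rmse, x0=np.zeros(X.shape[1] + 1), method="Nelder-Mead")
print("fitted coefficients:", result.x, "training RMSE:", result.fun)
```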
by Carl Udora, Peng Qian, Sweta Anmulwar, Anil Fernando, and Ning Wang. International Conference on Computing, Networking and Communications 2024 (ICNC 2024).
Holographic media offers a more engaging experience than 2D or 3D media, making it a promising technology for future applications. However, producing high-quality holographic media requires meeting demanding requirements such as low latency, high bandwidth, significant computing resources, and intelligent adaptation. Unfortunately, the current network infrastructure falls short of meeting these requirements. The increasing popularity of holographic media and the demand for more immersive experiences make it essential to consider user QoE and the factors that influence it. This work focuses on latency-sensitive network conditions and examines impactful factors and performance metrics as they relate to the user's QoE. The impact of disruptive factors is systematically quantified through subjective quality assessment evaluations. Additionally, the work presented proposes a QoE model for evaluating network-based QoE for live holographic teleportation.
by Minh Nguyen, Shivi Vats, Sam Van Damme, Jeroen Van Der Hooft, Maria Torres Vega, Tim Wauters, Filip de Turck, Christian Timmerer, and Hermann Hellwagner. IEEE Access, vol. 11, 2023.
Point cloud streaming has recently attracted research attention as it has the potential to provide six degrees of freedom movement, which is essential for truly immersive media. The transmission of point clouds requires high-bandwidth connections, and adaptive streaming is a promising solution to cope with fluctuating bandwidth conditions. Thus, understanding the impact of different factors in adaptive streaming on the Quality of Experience (QoE) becomes fundamental. Point clouds have been evaluated in Virtual Reality (VR), where viewers are completely immersed in a virtual environment. Augmented Reality (AR) is a novel technology and has recently become popular, yet quality evaluations of point clouds in AR environments are still limited to static images. In this paper, we perform a subjective study of four impact factors on the QoE of point cloud video sequences in AR conditions, including encoding parameters (quantization parameters, QPs), quality switches, viewing distance, and content characteristics.
The experimental results show that these factors significantly impact the QoE. The QoE decreases if the sequence is encoded at high QPs and/or switches to lower quality and/or is viewed at a shorter distance, and vice versa. Additionally, the results indicate that the end user is not able to distinguish the quality differences between two quality levels at a specific (high) viewing distance. An intermediate-quality point cloud encoded at geometry QP (G-QP) 24 and texture QP (T-QP) 32 and viewed at 2.5 m can have a QoE (i.e., a score of 6.5 out of 10) comparable to a high-quality point cloud encoded at 16 and 22 for G-QP and T-QP, respectively, and viewed at a distance of 5 m. Regarding content characteristics, objects with lower contrast can yield better quality scores. Participants' responses reveal that the visual quality of point clouds has not yet reached the desired level of immersion; the average QoE of the highest visual quality is less than 8 out of 10. There is also a good correlation between objective metrics (e.g., color Peak Signal-to-Noise Ratio (PSNR) and geometry PSNR) and the QoE score; in particular, the Pearson correlation coefficient of color PSNR is 0.84. Finally, we found that machine learning models are able to accurately predict the QoE of point clouds in AR environments.
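For readers who want to compute the same kind of agreement statistics (RMSE, PLCC, SRCC) between predicted and subjective scores on their own data, a small sketch using scipy is given below; the score arrays are placeholders.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(predicted, ground_truth):
    """Return (RMSE, PLCC, SRCC) between predicted and subjective QoE scores."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    rmse = np.sqrt(np.mean((predicted - ground_truth) ** 2))
    plcc, _ = pearsonr(predicted, ground_truth)
    srcc, _ = spearmanr(predicted, ground_truth)
    return rmse, plcc, srcc

# placeholder scores on a 10-point scale
print(evaluate([6.5, 7.1, 8.0, 5.9], [6.8, 7.0, 7.7, 6.2]))
```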
by Sam Damme, Imen Mahdi, Hemanth Kumar Ravuri, Jeroen van der Hooft, Filip De Turck, and Maria Torres Vega. Proceedings of 15th International Conference on Quality of Multimedia Experience (QoMEX 2023).
Dynamic point cloud delivery can provide the required interactivity and realism for six degrees of freedom (6DoF) interactive applications. However, dynamic point cloud rendering imposes stringent requirements (e.g., frames per second (FPS) and quality) that current hardware cannot handle. A possible solution is to convert point clouds into meshes before rendering on the head-mounted display (HMD). However, this conversion can induce degradation in quality perception, such as a change in depth, level of detail, or presence of artifacts. This paper, as one of the first, presents an extensive subjective study of the effects of converting point clouds to meshes with different quality representations. In addition, we provide a novel in-session content rating methodology, providing a more accurate assessment as well as avoiding post-study bias. Our study shows that both compression level and observation distance influence subjective perception. However, the degree of influence is heavily entangled with the content and geometry at hand. Furthermore, we also noticed that while end users are clearly aware of quality switches, the influence on their quality perception is limited. As a result, this has the potential to open up possibilities for bringing the adaptive video streaming paradigm to the 6DoF environment.
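As a point of reference for the kind of point-cloud-to-mesh conversion studied here, the sketch below uses Open3D's Poisson surface reconstruction on a single frame. The paper evaluates the perceptual consequences of such conversions rather than prescribing this particular method, and the file names and parameter values are placeholders.

```python
import open3d as o3d

# Load one point cloud frame (placeholder path) and estimate normals,
# which Poisson surface reconstruction requires.
pcd = o3d.io.read_point_cloud("frame_0001.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Poisson surface reconstruction; a higher depth yields more detail
# (and more triangles to render on the HMD).
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

o3d.io.write_triangle_mesh("frame_0001_mesh.obj", mesh)
```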
by Minh Nguyen, Shivi Vats, Sam Van Damme, Jeroen van der Hooft, Maria Torres Vega, Tim Wauters, Christian Timmerer, and Hermann Hellwagner. Proceedings of 15th International Conference on Quality of Multimedia Experience (QoMEX 2023).
Point Cloud (PC) streaming has recently attracted research attention as it has the potential to provide six degrees of freedom (6DoF), which is essential for truly immersive media.
PCs require high-bandwidth connections, and adaptive streaming is a promising solution to cope with fluctuating bandwidth conditions. Thus, understanding the impact of different factors in adaptive streaming on the Quality of Experience (QoE) becomes fundamental. Mixed Reality (MR) is a novel technology and has recently become popular. However, quality evaluations of PCs in MR environments are still limited to static images. In this paper, we perform a subjective study on four impact factors on the QoE of PC video sequences in MR conditions, including quality switches, viewing distance, and content characteristics.
The experimental results show that these factors significantly impact QoE. The QoE decreases if the sequence switches to lower quality and/or is viewed at a shorter distance, and vice versa. Additionally, the end user might not distinguish the quality differences between two quality levels at a specific viewing distance. Regarding content characteristics, objects with lower contrast seem to provide better quality scores.
by Shivi Vats, Minh Nguyen, Sam Van Damme, Jeroen van der Hooft, Maria Torres Vega, Tim Wauters, Christian Timmerer, Hermann Hellwagner. Proceedings of 15th International Conference on Quality of Multimedia Experience (QoMEX 2023).
3D objects are important components in Mixed Reality (MR) environments as they allow users to inspect and interact with them in a six degrees of freedom (6DoF) system.
Point clouds (PCs) and meshes are two common 3D object representations that can be compressed to reduce the delivered data at the cost of quality degradation. In addition, as the end users can move around in 6DoF applications, the viewing distance can vary. Quality assessment is necessary to evaluate the impact of the compressed representation and viewing distance on the Quality of Experience (QoE) of end users. This paper presents a demonstrator for subjective quality assessment of dynamic PC and mesh objects under different conditions in MR environments.
Our platform allows conducting subjective tests to evaluate various QoE influence factors, including encoding parameters, quality switching, viewing distance, and content characteristics, with configurable settings for these factors.