Discover all SPIRIT-related publications here or visit the SPIRIT community.
by Peng Qian, Ning Wang, Foh Chuan Heng, Carl Udora, Rahim Tafazolli. IFIP Networking 2024.
Volumetric video is an emerging media application that enables the projection of people or objects into a virtual space in a real-time and immersive manner. Unlike traditional video, live volumetric video streaming in virtual space allows users to interact with teleported objects with a wide range of intentions. However, a significant technical challenge in this context is the need for real-time network adaptation driven by diverse user intents (e.g., the user's movement in the virtual space), which may instantly change the streaming's network demand. To ensure a satisfactory perceived Quality of Experience (QoE) in the face of both intent and network condition uncertainty, we have developed a novel solution framework that allows offline user intent registration and online user intent capture with the necessary path adaptation. This path adaptation module is empowered by a novel Multi-Armed Bandit (MAB)-based path selection algorithm, with joint consideration of probed application delay and network congestion. Through extensive real-world experiments, we have validated the effectiveness of our proposed framework in assuring user QoE under various network conditions and for different user intent scenarios.
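As a rough illustration of the kind of bandit-based path selection described in this abstract, the sketch below implements a generic UCB1-style selector whose reward blends a probed delay measurement with a congestion signal. The class name, reward shaping (the 100 ms normalisation and the delay_weight split) and update flow are illustrative assumptions, not the paper's actual algorithm.

```python
import math

class UCBPathSelector:
    """Illustrative UCB1-style bandit over candidate network paths.

    The reward combines probed application delay with a congestion signal;
    the weighting and reward shape are assumptions for illustration only.
    """

    def __init__(self, paths, delay_weight=0.7):
        self.paths = list(paths)
        self.delay_weight = delay_weight
        self.counts = {p: 0 for p in self.paths}
        self.values = {p: 0.0 for p in self.paths}  # running mean reward per path
        self.total = 0

    def select_path(self):
        # Try each path once before applying the UCB rule.
        for p in self.paths:
            if self.counts[p] == 0:
                return p

        def ucb(p):
            bonus = math.sqrt(2 * math.log(self.total) / self.counts[p])
            return self.values[p] + bonus

        return max(self.paths, key=ucb)

    def update(self, path, probed_delay_ms, congestion_level):
        # Lower delay and lower congestion map to a higher reward in [0, 1].
        delay_score = max(0.0, 1.0 - probed_delay_ms / 100.0)
        congestion_score = max(0.0, 1.0 - congestion_level)
        reward = (self.delay_weight * delay_score
                  + (1 - self.delay_weight) * congestion_score)
        self.total += 1
        self.counts[path] += 1
        self.values[path] += (reward - self.values[path]) / self.counts[path]
```

In such a scheme, select_path() would be called once per adaptation interval and update() fed with the latest probe results for the chosen path, so that paths with consistently low delay and congestion accumulate higher estimated reward.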
by Vu San Ha Huynh, Peng Qian, Ning Wang, Carl Udora, Rahim Tafazolli. IEEE Gaming, Entertainment and Media (GEM) Conference.
Emerging immersive media applications demand tailored performance to accommodate diverse user intents, particularly in scenarios with multiple users who have different intents and require frame synchronisation. This paper introduces a novel transport-layer intelligence scheme that leverages a user intent-aware API. This API enables the application layer to communicate specific user intents and requirements to the transport layer, optimizing immersive application performance. Using deep reinforcement learning, our solution automatically selects the optimal transport protocol and configuration for each user intent across various immersive scenarios. Our evaluation focuses on a live immersive video streaming application, with different users transmitting volumetric content under different network conditions. Results demonstrate that our scheme accurately identifies suitable transport protocols and tailored configurations for a wide range of user intents, ensuring multi-user frame synchronisation.
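To make the notion of an intent-aware API more concrete, the sketch below shows one possible shape of the application-to-transport handoff: an intent descriptor and a selector that maps it to a transport configuration. The intent fields, protocol options and mapping rules are hypothetical, and the selector here is a simple rule-based stand-in for the deep-reinforcement-learning agent the paper actually uses.

```python
from dataclasses import dataclass

@dataclass
class UserIntent:
    """Hypothetical intent descriptor handed from application to transport layer."""
    interaction: str          # e.g. "walk_around", "inspect", "passive_view"
    max_frame_delay_ms: int   # latency budget implied by this intent
    needs_sync: bool          # whether frames must be synchronised across users

def select_transport(intent: UserIntent) -> dict:
    """Toy rule-based stand-in for a learned transport protocol/configuration selector."""
    if intent.needs_sync and intent.max_frame_delay_ms < 100:
        return {"protocol": "QUIC", "streams": "per-frame", "pacing": "aggressive"}
    if intent.interaction == "passive_view":
        return {"protocol": "TCP", "streams": "single", "pacing": "default"}
    return {"protocol": "QUIC", "streams": "per-object", "pacing": "default"}

# Example: a synchronised, latency-critical intent
print(select_transport(UserIntent("walk_around", 80, True)))
```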
by Peng Qian, Ning Wang, Foh Chuan Heng, Jia Zhang, Carl Udora, Rahim Tafazolli. IEEE Symposium on Computers and Communications (ISCC) 2024.
Volumetric video streaming, an innovative media application, facilitates the real-time and immersive teleportation of individuals or objects into the virtual environment of the audience. Unlike conventional video streaming applications, volumetric content is particularly vulnerable to network fluctuations, which can lead to performance degradation such as reduced FPS and delayed frame delivery. In this research, we introduce an eBPF-based network function that duplicates packets along the pathways between network nodes, ensuring timely packet delivery amid network instability. Furthermore, we propose a path elimination algorithm to discard paths incapable of delivering frames within the target latency. Our implementation and evaluation validate the rapid and robust performance achieved across various resolution levels.
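The path elimination idea lends itself to a compact sketch: paths whose probed delay cannot meet the per-frame deadline are dropped from the candidate set. The function below is an assumed, simplified rule (including the safety margin and fallback behaviour), not the paper's implementation.

```python
def eliminate_paths(paths, probed_delay_ms, target_latency_ms, margin_ms=5):
    """Keep only paths expected to deliver frames within the target latency.

    paths             -- list of path identifiers
    probed_delay_ms   -- dict mapping path -> most recent probed delay (ms)
    target_latency_ms -- per-frame delivery deadline (ms)
    margin_ms         -- assumed safety margin
    """
    paths = list(paths)
    surviving = [p for p in paths
                 if probed_delay_ms.get(p, float("inf")) + margin_ms <= target_latency_ms]
    # Never eliminate every path: fall back to the currently fastest one.
    if not surviving:
        surviving = [min(paths, key=lambda p: probed_delay_ms.get(p, float("inf")))]
    return surviving
```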
by Wieland Morgenstern, Milena T. Bagdasarian, Anna Hilsmann, Peter Eisert. IEEE Transactions on Visualization and Computer Graphics.
We propose a novel representation of virtual humans for highly realistic real-time animation and rendering in 3D applications. We learn pose-dependent appearance and geometry from highly accurate dynamic mesh sequences obtained from state-of-the-art multiview-video reconstruction. Learning pose-dependent appearance and geometry from mesh sequences poses significant challenges, as it requires the network to learn the intricate shape and articulated motion of a human body. However, statistical body models like SMPL provide valuable a-priori knowledge, which we leverage to constrain the dimension of the search space, enabling more efficient and targeted learning, and to define pose-dependency. Instead of directly learning absolute pose-dependent geometry, we learn the difference between the observed geometry and the fitted SMPL model. This allows us to encode both pose-dependent appearance and geometry in the consistent UV space of the SMPL model. This approach not only ensures a high level of realism but also facilitates streamlined processing and rendering of virtual humans in real-time scenarios.
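The core idea of learning residual geometry relative to a fitted SMPL body can be sketched schematically as below. The SMPL fitting, vertex correspondence and UV rasterisation are placeholders (a real pipeline would use a body-model library and proper texture-space rasterisation); this is only meant to show the residual-in-UV-space concept.

```python
import numpy as np

def geometry_residual(observed_vertices: np.ndarray,
                      smpl_vertices: np.ndarray) -> np.ndarray:
    """Per-vertex offsets between captured geometry and a fitted SMPL body.

    Both arrays are assumed to be (N, 3) and in vertex correspondence,
    which in practice requires registering the scan to the SMPL topology.
    """
    return observed_vertices - smpl_vertices

def splat_to_uv(residuals: np.ndarray, uv_coords: np.ndarray,
                resolution: int = 256) -> np.ndarray:
    """Scatter per-vertex residuals into an (H, W, 3) map in SMPL UV space.

    uv_coords is assumed to hold one (u, v) pair per vertex in [0, 1].
    Nearest-texel splatting is a simplification of proper rasterisation.
    """
    uv_map = np.zeros((resolution, resolution, 3), dtype=np.float32)
    px = np.clip((uv_coords * (resolution - 1)).astype(int), 0, resolution - 1)
    uv_map[px[:, 1], px[:, 0]] = residuals
    return uv_map
```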
by Minh Nguyen, Shivi Vats, Xuemei Zhou, Irene Viola, Pablo Cesar, Christian Timmerer, Hermann Hellwagner. 15th ACM Multimedia Systems Conference (ACM MMSys 2024).
Point clouds (PCs) have attracted researchers and developers due to their ability to provide immersive experiences with six degrees of freedom (6DoF). However, there are still several open issues in understanding the Quality of Experience (QoE) and visual attention of end users while experiencing 6DoF volumetric videos. First, encoding and decoding point clouds require a significant amount of both time and computational resources. Second, QoE prediction models for dynamic point clouds in 6DoF have not yet been developed due to the lack of visual quality databases. Third, visual attention in 6DoF is hardly explored, which impedes research into more sophisticated approaches for adaptive streaming of dynamic point clouds. In this work, we provide an open-source Compressed Point cloud dataset with Eye-tracking and Quality assessment in Mixed Reality (ComPEQ–MR). The dataset comprises four compressed dynamic point clouds processed by Moving Picture Experts Group (MPEG) reference tools (i.e., VPCC and GPCC), each with 12 distortion levels. We also conducted subjective tests to assess the quality of the compressed point clouds with different levels of distortion. The rating scores are attached to ComPEQ–MR so that they can be used to develop QoE prediction models in the context of MR environments. Additionally, eye-tracking data for visual saliency is included in this dataset, which is necessary to predict where people look when watching 3D videos in MR experiences. We collected opinion scores and eye-tracking data from 41 participants, resulting in 2132 responses and 164 visual attention maps in total. The dataset is available at https://ftp.itec.aau.at/datasets/ComPEQ-MR/.
by Matthias De Fré, Jeroen van der Hooft, Tim Wauters, and Filip De Turck. 15th ACM Multimedia Systems Conference (ACM MMSys 2024).
In today’s world, the use of video conferencing applications has risen significantly. However, with the introduction of affordable head-mounted displays (HMDs), users are now seeking new immersive and engaging experiences that enhance the 2D video conferencing applications with a third dimension. Immersive video formats such as light fields and volumetric video aim to enhance the experience by allowing for six degrees-of-freedom (6DoF), resulting in users being able to look and walk around in the virtual space. We present a novel, open source, many-to-many streaming architecture using point cloud-based volumetric video. To ensure bitrates that satisfy contemporary networks, the Draco codec encodes the point clouds before they are transmitted using web real-time communication (WebRTC), all while ensuring that the end-to-end latency remains acceptable for real-time communication. A multiple description coding (MDC)-based quality adaptation approach ensures that the pipeline can support a large number of users, each with varying network conditions. In this demo, participants will be seated around a table and will engage in a virtual conference using an HMD, with each participant being captured using a single depth camera. To showcase the quality effectiveness of the MDC-based adaptation algorithm, a dashboard is used to monitor the status of the application and control the bandwidth available to each participant. The available bandwidth and position of the user are taken into account to dynamically assign a quality level to each participant, ensuring a higher quality experience compared to having a uniform quality level for each point cloud object.
by Matthias De Fré, Jeroen van der Hooft, Tim Wauters, and Filip De Turck. 15th ACM Multimedia Systems Conference (ACM MMSys 2024).
The production and consumption of video content has become a staple in the current day and age. With the rise of virtual reality (VR), users are now looking for immersive, interactive experiences which combine the classic video applications, such as conferencing or digital concerts, with newer technologies. A first step was made by going beyond 2D video to 360-degree experiences. However, 360-degree video offers only rotational movement, making interaction with the environment difficult. Fully immersive 3D content formats, such as light fields and volumetric video, aspire to go further by enabling six degrees-of-freedom (6DoF), allowing both rotational and positional freedom. Nevertheless, the adoption of immersive video capturing and rendering methods has been hindered by their substantial bandwidth and computational requirements, rendering them in most cases impractical for low-latency applications. Several efforts have been made to alleviate these problems by introducing specialized compression algorithms and by utilizing existing 2D adaptation methods to adapt the quality based on the user's available bandwidth. However, even though these methods improve the quality of experience (QoE) and mitigate bandwidth limitations, they still suffer from high latency, which makes real-time interaction unfeasible. To address this issue, we present a novel, open source, one-to-many streaming architecture using point cloud-based volumetric video. To reduce the bandwidth requirements, we utilize the Draco codec to compress the point clouds before they are transmitted using WebRTC, which ensures low latency, enabling the streaming of real-time 6DoF interactive volumetric video. Content is adapted by employing a multiple description coding (MDC) strategy which combines sampled point cloud descriptions based on the estimated bandwidth returned by the Google congestion control (GCC) algorithm. MDC encoding scales more easily to a larger number of users compared to performing individual encoding. Our proposed solution achieves similar real-time latency for both three and nine clients (163 ms and 166 ms), which is 9% and 19% lower compared to individual encoding. The MDC-based approach, using three workers, achieves similar visual quality compared to a per-client encoding solution using five worker threads, and increased quality when the number of clients is greater than 20.
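To illustrate how an MDC-based adaptation step might combine descriptions under a bandwidth estimate, the sketch below greedily picks how many sampled descriptions fit within the budget reported by a congestion controller such as GCC. The greedy rule, bitrates and fallback are assumptions for illustration, not the paper's adaptation logic.

```python
def assign_descriptions(estimated_bandwidth_bps: float,
                        description_bitrates_bps: list[float]) -> int:
    """Pick how many MDC descriptions to send within the estimated bandwidth.

    description_bitrates_bps holds the (assumed) bitrate of each sampled
    point cloud description; sending more descriptions yields a denser
    reconstruction. This greedy rule is only an illustration.
    """
    budget = estimated_bandwidth_bps
    count = 0
    for bitrate in description_bitrates_bps:
        if bitrate <= budget:
            budget -= bitrate
            count += 1
        else:
            break
    return max(count, 1)  # always send at least one description

# e.g. the congestion controller estimates 6 Mbps and three
# 2.5 Mbps descriptions are available -> send 2 of them
print(assign_descriptions(6e6, [2.5e6, 2.5e6, 2.5e6]))  # -> 2
```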
by Wolfgang Paier, Paul Hinzer, Anna Hilsmann, and Peter Eisert. Vision Modeling and Visualization 2024 (VMV 2024).
We present a new approach for video-driven animation of high-quality neural 3D head models, addressing the challenge of person-independent animation from video input. Typically, high-quality generative models are learned for specific individuals from multi-view video footage, resulting in person-specific latent representations that drive the generation process. In order to achieve person-independent animation from video input, we introduce an LSTM-based animation network capable of translating person-independent expression features into personalized animation parameters of person-specific 3D head models. Our approach combines the advantages of personalized head models (high quality and realism) with the convenience of video-driven animation employing multi-person facial performance capture. We demonstrate the effectiveness of our approach through high-quality synthesized animations driven by different source videos, as well as an ablation study.
by Minh Nguyen, Shivi Vats, Hermann Hellwagner. Proceedings of the IEEE Mile High Video Conference 2024 (MHV 2024).
Point cloud streaming is becoming increasingly popular due to its ability to provide six degrees of freedom (6DOF) for immersive media. Measuring the quality of experience (QoE) is essential to evaluate the performance of point cloud applications. However, most existing QoE models for point cloud streaming are complicated and/or not open source. Therefore, it is desirable to provide an open-source QoE model for point cloud streaming. The International Telecommunication Union (ITU) has put a great deal of effort into video quality estimation models, namely ITU-T P.1203. This P.1203 model was implemented and published on GitHub. The model's inputs include video characteristics (i.e., bitrate, framerate, codec, and frame size), streaming parameters (i.e., stall events), and viewing conditions (i.e., device type and viewing distance). Point cloud streaming also shares some parameters that can be used in the P.1203 model, such as bitrate, framerate, stall events, and viewing distance. However, as the coefficients in the original P.1203 model were determined in a training phase based on a subjective database for 2D videos, they need to be re-trained with a new subjective database for point cloud streaming. In this work, we provide a fine-tuned ITU-T P.1203 model for dynamic point clouds in Augmented Reality (AR) environments. We re-train the P.1203 model with our previously published dataset to obtain the coefficients that achieve the lowest root mean square error (RMSE). The dataset was collected in a subjective test in which the participants watched dynamic point clouds from the 8i lab database with Microsoft's HoloLens 2 AR glasses. The dynamic point clouds have static qualities or a quality switch in the middle of the sequence. We split this dataset into a training set and a validation set. We train the coefficients of the P.1203 model with the former set and validate its performance with the latter one. The results show that our fine-tuned P.1203 model outperforms the original model from the ITU. Our model achieves an RMSE of 0.813, compared to 0.887 for the original P.1203 model on the training set. The Pearson Linear Correlation Coefficient (PLCC) and Spearman's Rank Correlation Coefficient (SRCC) of our fine-tuned model are also significantly higher than those of ITU's model (see Table 1). These values are more than 0.9 for our model, compared to less than 0.786 for the standard P.1203 model on the training dataset. On the validation dataset, our fine-tuned model also provides a better RMSE of 0.955, compared with 1.032 for the standard P.1203 model. We also achieved a better correlation with the ground truth, with PLCC = 0.958, while this metric is 0.918 for the standard P.1203 model. The correlations of the compared models are visualized in Fig. 1. The fine-tuned P.1203 model is published at https://github.com/minhkstn/itu-p1203-point-clouds.
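Conceptually, the re-training step amounts to choosing model coefficients that minimise the RMSE between predicted and subjective scores. The sketch below shows that idea with a toy linear predictor and randomly generated placeholder data; the actual P.1203 functional form, features and dataset are not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

# Placeholder per-session features (e.g. bitrate, framerate, stalls, distance)
# and placeholder subjective mean opinion scores; real data would come from
# the published point cloud subjective database.
X = np.random.rand(50, 4)
mos = np.random.uniform(1, 5, size=50)

def predict(coeffs, features):
    """Toy linear stand-in for the P.1203 quality mapping."""
    return features @ coeffs[:-1] + coeffs[-1]

def rmse(coeffs):
    return np.sqrt(np.mean((predict(coeffs, X) - mos) ** 2))

result = minimize(rmse, x0=np.zeros(X.shape[1] + 1), method="Nelder-Mead")
print("fitted coefficients:", result.x, "training RMSE:", result.fun)
```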
by Carl Udora, Peng Qian, Sweta Anmulwar, Anil Fernando, and Ning Wang. International Conference on Computing, Networking and Communications 2024 (ICNC 2024).
Holographic media offers a more engaging experience than 2D or 3D media, making it a promising technology for future applications. However, producing high-quality holographic media requires meeting demanding requirements such as low latency, high bandwidth, significant computing resources, and intelligent adaptation. Unfortunately, the current network infrastructure falls short of meeting these requirements. The increasing popularity of holographic media and the demand for more immersive experiences make it essential to consider user QoE and the factors that influence it. This work focuses on latency-sensitive network conditions and examines impactful factors and performance metrics as they relate to the user's QoE. The impact of disruptive factors is systematically quantified through subjective quality assessment evaluations. Additionally, the work presented proposes a QoE model for evaluating network-based QoE for live holographic teleportation.
by Minh Nguyen, Shivi Vats, Sam Van Damme, Jeroen Van Der Hooft, Maria Torres Vega, Tim Wauters, Filip de Turck, Christian Timmerer, and Hermann Hellwagner. IEEE Access, vol. 11, 2023.
Point cloud streaming has recently attracted research attention as it has the potential to provide six degrees of freedom movement, which is essential for truly immersive media. The transmission of point clouds requires high-bandwidth connections, and adaptive streaming is a promising solution to cope with fluctuating bandwidth conditions. Thus, understanding the impact of different factors in adaptive streaming on the Quality of Experience (QoE) becomes fundamental. Point clouds have been evaluated in Virtual Reality (VR), where viewers are completely immersed in a virtual environment. Augmented Reality (AR) is a novel technology and has recently become popular, yet quality evaluations of point clouds in AR environments are still limited to static images. In this paper, we perform a subjective study of four impact factors on the QoE of point cloud video sequences in AR conditions, including encoding parameters (quantization parameters, QPs), quality switches, viewing distance, and content characteristics.
The experimental results show that these factors significantly impact the QoE. The QoE decreases if the sequence is encoded at high QPs and/or switches to lower quality and/or is viewed at a shorter distance, and vice versa. Additionally, the results indicate that the end user is not able to distinguish the quality differences between two quality levels at a specific (high) viewing distance. An intermediate-quality point cloud encoded at geometry QP (G-QP) 24 and texture QP (T-QP) 32 and viewed at 2.5 m can have a QoE (i.e., a score of 6.5 out of 10) comparable to a high-quality point cloud encoded at 16 and 22 for G-QP and T-QP, respectively, and viewed at a distance of 5 m. Regarding content characteristics, objects with lower contrast can yield better quality scores. Participants' responses reveal that the visual quality of point clouds has not yet reached the desired level of immersion; the average QoE of the highest visual quality is less than 8 out of 10. There is also a good correlation between objective metrics (e.g., color Peak Signal-to-Noise Ratio (PSNR) and geometry PSNR) and the QoE score; in particular, the Pearson correlation coefficient of color PSNR is 0.84. Finally, we found that machine learning models are able to accurately predict the QoE of point clouds in AR environments.
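For readers who want to compute the same kind of agreement statistics (RMSE, PLCC, SRCC) between predicted and subjective scores on their own data, a small sketch using scipy is given below; the score arrays are placeholders.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(predicted, ground_truth):
    """Return (RMSE, PLCC, SRCC) between predicted and subjective QoE scores."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    rmse = np.sqrt(np.mean((predicted - ground_truth) ** 2))
    plcc, _ = pearsonr(predicted, ground_truth)
    srcc, _ = spearmanr(predicted, ground_truth)
    return rmse, plcc, srcc

# placeholder scores on a 10-point scale
print(evaluate([6.5, 7.1, 8.0, 5.9], [6.8, 7.0, 7.7, 6.2]))
```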
by Sam Damme, Imen Mahdi, Hemanth Kumar Ravuri, Jeroen van der Hooft, Filip De Turck, and Maria Torres Vega. Proceedings of 15th International Conference on Quality of Multimedia Experience (QoMEX 2023).
Dynamic point cloud delivery can provide the required interactivity and realism for six degrees of freedom (6DoF) interactive applications. However, dynamic point cloud rendering imposes stringent requirements (e.g., frames per second (FPS) and quality) that current hardware cannot handle. A possible solution is to convert point clouds into meshes before rendering on the head-mounted display (HMD). However, this conversion can induce degradation in quality perception, such as a change in depth, level of detail, or presence of artifacts. This paper, as one of the first, presents an extensive subjective study of the effects of converting point clouds to meshes with different quality representations. In addition, we provide a novel in-session content rating methodology, providing a more accurate assessment as well as avoiding post-study bias. Our study shows that both compression level and observation distance influence subjective perception. However, the degree of influence is heavily entangled with the content and geometry at hand. Furthermore, we also noticed that while end users are clearly aware of quality switches, the influence on their quality perception is limited. As a result, this has the potential to open up possibilities for bringing the adaptive video streaming paradigm to the 6DoF environment.
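As a point of reference for the kind of point-cloud-to-mesh conversion studied here, the sketch below uses Open3D's Poisson surface reconstruction on a single frame. The paper evaluates the perceptual consequences of such conversions rather than prescribing this particular method, and the file names and parameter values are placeholders.

```python
import open3d as o3d

# Load one point cloud frame (placeholder path) and estimate normals,
# which Poisson surface reconstruction requires.
pcd = o3d.io.read_point_cloud("frame_0001.ply")
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))

# Poisson surface reconstruction; a higher depth yields more detail
# (and more triangles to render on the HMD).
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

o3d.io.write_triangle_mesh("frame_0001_mesh.obj", mesh)
```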
by Minh Nguyen, Shivi Vats, Sam Van Damme, Jeroen van der Hooft, Maria Torres Vega, Tim Wauters, Christian Timmerer, and Hermann Hellwagner. Proceedings of 15th International Conference on Quality of Multimedia Experience (QoMEX 2023).
Point Cloud (PC) streaming has recently attracted research attention as it has the potential to provide six degrees of freedom (6DoF), which is essential for truly immersive media.
PCs require high-bandwidth connections, and adaptive streaming is a promising solution to cope with fluctuating bandwidth conditions. Thus, understanding the impact of different factors in adaptive streaming on the Quality of Experience (QoE) becomes fundamental. Mixed Reality (MR) is a novel technology and has recently become popular. However, quality evaluations of PCs in MR environments are still limited to static images. In this paper, we perform a subjective study on four impact factors on the QoE of PC video sequences in MR conditions, including quality switches, viewing distance, and content characteristics.
The experimental results show that these factors significantly impact QoE. The QoE decreases if the sequence switches to lower quality and/or is viewed at a shorter distance, and vice versa. Additionally, the end user might not distinguish the quality differences between two quality levels at a specific viewing distance. Regarding content characteristics, objects with lower contrast seem to provide better quality scores.
by Shivi Vats, Minh Nguyen, Sam Van Damme, Jeroen van der Hooft, Maria Torres Vega, Tim Wauters, Christian Timmerer, Hermann Hellwagner. Proceedings of 15th International Conference on Quality of Multimedia Experience (QoMEX 2023).
3D objects are important components in Mixed Reality (MR) environments as they allow users to inspect and interact with them in a six degrees of freedom (6DoF) system.
Point clouds (PCs) and meshes are two common 3D object representations that can be compressed to reduce the delivered data at the cost of quality degradation. In addition, as the end users can move around in 6DoF applications, the viewing distance can vary. Quality assessment is necessary to evaluate the impact of the compressed representation and viewing distance on the Quality of Experience (QoE) of end users. This paper presents a demonstrator for subjective quality assessment of dynamic PC and mesh objects under different conditions in MR environments.
Our platform allows conducting subjective tests to evaluate various QoE influence factors, including encoding parameters, quality switching, viewing distance, and content characteristics, with configurable settings for these factors.