Human pose estimation solutions: a low cost tool for increasing natural interaction in virtual television sets

Publisher

Springer

Abstract

Virtual television sets (VTS) have grown rapidly in recent years, in parallel with the development of virtual reality tools and the metaverse. The technology is now used in broadcasts ranging from sports programs to election nights and news, by major television companies and smaller TV stations alike. Despite significant recent advances, such as improved composition of real and virtual content and real-time realistic rendering, interaction between presenters and virtual content remains a challenge. This is primarily due to the high cost and complexity of the required equipment, as well as the limitations imposed by live production technology. In this context, we propose applying Human Pose Estimation (HPE) tools to the studio camera stream as a potential solution that requires no additional hardware integration. We evaluated 14 HPE solutions in a three-step process. First, we assessed each solution's robustness and its viability for reliable real-time execution; five solutions passed this initial phase. Second, we analyzed frames per second (FPS), RAM consumption, and CPU usage for each remaining alternative in both local and global scenarios, considering both the ‘printmetrics’ and ‘no view’ options. BlazePose OpenVINO performed best in these tests and was selected for further testing in real-world scenarios. These tests confirmed that HPE is a viable alternative for enhancing human-computer interaction in VTS. However, certain limitations remain, such as the lack of reliable depth data and the need for further analysis of complex dynamic gesture detection. The proposed software-based VTS approach promotes universal accessibility by eliminating the need for external control devices, reducing economic barriers, and allowing users to customise natural, adaptive interactions that fit their individual capabilities and contextual needs.
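As an illustration of the second evaluation step, the following minimal sketch shows one way FPS, CPU usage, and memory consumption could be measured for a per-frame processing function. It is not the authors' benchmarking code: the names `benchmark`, `BenchmarkResult`, and `dummy_estimator` are hypothetical, the frames are synthetic lists rather than a studio camera stream, and memory is approximated with the standard library's `tracemalloc`, which only tracks Python-level allocations and therefore would not capture the native memory used by a real HPE model.

```python
import time
import tracemalloc
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class BenchmarkResult:
    frames: int          # number of frames processed
    fps: float           # frames per wall-clock second
    cpu_percent: float   # process CPU time / wall time, relative to one core
    peak_mem_mib: float  # peak Python-level allocations, in MiB


def benchmark(process_frame: Callable[[object], object],
              frames: Iterable[object]) -> BenchmarkResult:
    """Run process_frame over frames and collect simple performance metrics."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    count = 0
    for frame in frames:
        process_frame(frame)
        count += 1

    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    # Guard against division by zero on an empty or extremely fast run.
    wall_elapsed = max(wall_elapsed, 1e-9)
    return BenchmarkResult(
        frames=count,
        fps=count / wall_elapsed,
        cpu_percent=100.0 * cpu_elapsed / wall_elapsed,
        peak_mem_mib=peak_bytes / 2**20,
    )


def dummy_estimator(frame):
    # Hypothetical stand-in for an HPE model: returns fake 2D "keypoints".
    return [(value * 0.5, value * 0.25) for value in frame]


if __name__ == "__main__":
    # Synthetic frames (assumption): 200 lists of 1000 integers each.
    synthetic_frames = [list(range(1000)) for _ in range(200)]
    result = benchmark(dummy_estimator, synthetic_frames)
    print(f"frames={result.frames} fps={result.fps:.1f} "
          f"cpu={result.cpu_percent:.1f}% peak_mem={result.peak_mem_mib:.2f} MiB")
```

In a real comparison, `dummy_estimator` would be replaced by a call into each HPE solution, and a process-level memory measure (resident set size, for example) would be a closer match to the RAM consumption reported in the abstract.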

Bibliographic citation

Arenas, R., Méndez, R., Pedraza, L. et al. Human pose estimation solutions: a low cost tool for increasing natural interaction in virtual television sets. Univ Access Inf Soc (2025). https://doi.org/10.1007/s10209-025-01255-x

Rights

© The Author(s) 2025. Attribution 4.0 International