Multi-view 3D human pose estimation based on multi-scale feature by orthogonal projection

Yinghan Wang; Jianmin Dong; Yanan Wang; Bingyang Sun

doi:10.1051/e3sconf/202452201043



E3S Web Conf., 522 (2024) 01043


Open Access

Issue		E3S Web Conf., Volume 522 (2024): 2023 9th International Symposium on Vehicle Emission Supervision and Environment Protection (VESEP2023)


Article Number		01043
Number of page(s)		13
DOI		https://doi.org/10.1051/e3sconf/202452201043
Published online		07 May 2024



Yinghan Wang, Jianmin Dong*, Yanan Wang and Bingyang Sun

College of Information Engineering, Xizang Minzu University, 712082 Xianyang, Shaanxi, China

* Corresponding author: jmdong@xzmu.edu.cn

Abstract

To address inaccurate estimation results, complicated matching of feature information across views, and poor robustness of the network model in complex scenes, this paper proposes a multi-view, multi-person 3D human pose estimation model based on multi-scale feature orthogonal projection. The model comprises a multi-scale orthogonal projection fusion network and an orthogonal feature ascending dimension network. First, the multi-scale orthogonal projection fusion network orthogonally projects features at multiple scales and uses a residual structure to fuse the features lying in the same plane separately, which makes the features easier to learn and reduces the feature loss caused by projection. The fused features are then fed into the orthogonal feature ascending dimension network, which reconstructs higher-level 3D features using trilinear interpolation and deconvolution to improve the expressiveness of the model. Finally, the reconstructed features are fed to the backbone network to supplement the high-dimensional features, and the network regresses the 3D human pose according to the different stages of the task. Experimental results show that, compared with the previous method, the Percentage of 3D Correct Parts improves on the Campus and Shelf datasets, the Mean Per Joint Position Error decreases on the CMU Panoptic dataset, and the average accuracy improves at smaller thresholds. When the number of input views to the trained model is reduced, the predictions are also better than those of the previous method. The proposed method therefore not only estimates the 3D human pose effectively, but also improves prediction accuracy and enhances the robustness of the network model.
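The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration of their commonly used definitions, not the authors' evaluation code: the function names `mpjpe` and `pcp3d`, the limb list, and the threshold `alpha = 0.5` are assumptions made here, and the exact protocol (person matching, joint sets, units) is not given in the abstract.

```python
import math


def mpjpe(pred, gt):
    """Mean Per Joint Position Error: the average Euclidean distance
    between each predicted joint and its ground-truth joint.
    The result is in the same unit as the inputs (often millimetres)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def pcp3d(pred, gt, limbs, alpha=0.5):
    """Percentage of 3D Correct Parts.

    A limb (a, b) counts as correct when the mean error of its two
    endpoints is at most alpha times the ground-truth limb length.
    alpha = 0.5 is the commonly used value and is an assumption here.
    """
    if not limbs:
        raise ValueError("limbs must not be empty")
    correct = 0
    for a, b in limbs:
        endpoint_error = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if endpoint_error <= alpha * math.dist(gt[a], gt[b]):
            correct += 1
    return 100.0 * correct / len(limbs)


# Hypothetical three-joint chain, for illustration only.
gt_joints = [(0, 0, 0), (0, 0, 100), (0, 0, 200)]
pred_joints = [(0, 0, 3), (0, 4, 100), (0, 120, 200)]
chain_limbs = [(0, 1), (1, 2)]

error = mpjpe(pred_joints, gt_joints)
score = pcp3d(pred_joints, gt_joints, chain_limbs)
```

A lower `mpjpe` value and a higher `pcp3d` value both indicate better predictions, which matches the direction of the improvements reported above.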

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
