We present Mask2Hand, a self-trainable method for predicting 3D hand pose and shape from a single 2D binary silhouette. Without additional manual annotations, our method uses differentiable rendering to project 3D estimations onto the 2D silhouette. A tailored loss function, applied between the rendered and input silhouettes, provides a self-guidance mechanism during end-to-end optimization, which constrains global mesh registration and hand pose estimation. Our experiments show that Mask2Hand, using only a binary mask input, achieves accuracy comparable to state-of-the-art methods requiring RGB or depth inputs on both unaligned and aligned datasets.
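As a rough illustration of the self-guidance mechanism, the sketch below compares a soft silhouette produced by a differentiable renderer against the input binary mask and returns a scalar loss through which gradients flow back to the 3D hand estimate. The function name `silhouette_loss` and the Dice-plus-cross-entropy combination are illustrative assumptions, not the paper's exact tailored loss.

```python
import torch
import torch.nn.functional as F

def silhouette_loss(rendered_sil, target_mask, dice_weight=1.0, bce_weight=1.0):
    """Compare a differentiably rendered soft silhouette with the input binary mask.

    rendered_sil: (B, H, W) float tensor in [0, 1] from a differentiable renderer.
    target_mask:  (B, H, W) float tensor with values in {0, 1}.
    Note: this Dice + BCE formulation is a stand-in for the paper's tailored loss.
    """
    eps = 1e-6
    # Soft Dice term penalizes global overlap mismatch between the two silhouettes,
    # constraining global mesh registration.
    inter = (rendered_sil * target_mask).sum(dim=(1, 2))
    union = rendered_sil.sum(dim=(1, 2)) + target_mask.sum(dim=(1, 2))
    dice = 1.0 - (2.0 * inter + eps) / (union + eps)
    # Per-pixel binary cross-entropy keeps the rendered boundary close to the mask.
    bce = F.binary_cross_entropy(
        rendered_sil.clamp(eps, 1.0 - eps), target_mask, reduction="none"
    ).mean(dim=(1, 2))
    return (dice_weight * dice + bce_weight * bce).mean()


# Toy usage: a random "rendered" silhouette stands in for the renderer output;
# gradients flow back through it, as they would to the predicted hand mesh.
pred = torch.rand(2, 128, 128, requires_grad=True)
target = (torch.rand(2, 128, 128) > 0.5).float()
loss = silhouette_loss(pred, target)
loss.backward()
```

In the actual pipeline, `rendered_sil` would come from a differentiable silhouette renderer applied to the predicted hand mesh, so minimizing this loss end-to-end adjusts both hand pose and shape without any manual annotations.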