MoSAR: Monocular Semi-Supervised Model For Avatar Reconstruction Using Differentiable Shading
(CVPR 2024)

1Ubisoft LaForge, 2York University, 3Ecole de Technologie Supérieure
*Equal contribution

MoSAR turns a portrait image into a relightable 3D avatar. It estimates detailed geometry and rich reflectance maps (diffuse, specular, normals, ambient occlusion, translucency) at 4K resolution.

Abstract

Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods.

Video (with audio)

MoSAR captures pore-level details

FFHQ-UV-Intrinsics dataset

We release a new dataset, named FFHQ-UV-Intrinsics, that contains intrinsic texture maps for 10K subjects from the publicly available FFHQ-UV dataset. The FFHQ-UV dataset is composed of texture maps at 1K resolution, for subjects sampled from the latent space of StyleGAN. These textures contain evenly illuminated face images. However, light, geometry and skin reflectance information are entangled in the same texture, which makes them less suitable for relighting.
To obtain the intrinsic face attributes, we first retarget the texture maps to our own topology and resize them to 512x512. Next, we apply the proposed light normalization and intrinsic texture map estimation steps. We then upscale the resulting texture maps to 1K resolution and retarget them back to their original topology.
The resulting dataset, FFHQ-UV-Intrinsics, is being publicly released for the research community under a Creative Commons Attribution-NonCommercial-NoDerivatives license. The dataset contains diffuse, specular, ambient occlusion, translucency and normal maps for 10K subjects. This is the first dataset that offers rich intrinsic face attributes at high resolution and at large scale, with the aim of advancing research in this field.
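The order of the processing steps described above can be sketched in Python as follows. This is only an illustration of the data flow, not the released code: the four stage callables (`retarget_to_own`, `normalize_light`, `estimate_intrinsics`, `retarget_back`) and the function names are hypothetical placeholders supplied by the caller, and the nearest-neighbour resize stands in for whatever resampling was actually used.

```python
import numpy as np


def resize_nearest(tex, size):
    """Nearest-neighbour resize of an (H, W, C) texture to (size, size, C).

    Stand-in for the actual resampling method, which the page does not specify.
    """
    h, w = tex.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return tex[rows[:, None], cols[None, :]]


def build_intrinsic_maps(texture, retarget_to_own, normalize_light,
                         estimate_intrinsics, retarget_back,
                         work_res=512, out_res=1024):
    """Run the steps in the order given in the text.

    All four callables are hypothetical placeholders:
      retarget_to_own     : texture in FFHQ-UV topology -> working topology
      normalize_light     : texture -> light-normalized texture
      estimate_intrinsics : texture -> dict of map name -> (H, W, C) array
      retarget_back       : map in working topology -> FFHQ-UV topology
    """
    tex = retarget_to_own(texture)        # 1. retarget to the working topology
    tex = resize_nearest(tex, work_res)   # 2. resize to 512x512
    tex = normalize_light(tex)            # 3. light normalization
    maps = estimate_intrinsics(tex)       # 4. intrinsic map estimation
    # 5. upscale each map to 1K and retarget back to the original topology
    return {name: retarget_back(resize_nearest(m, out_res))
            for name, m in maps.items()}
```

With identity functions for the retargeting and normalization stages and a dummy estimator returning the five map types listed above, the function returns one 1024x1024 array per map, which is a quick way to check the plumbing before real models are plugged in.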

Dataset samples
A sample from the proposed FFHQ-UV-Intrinsics dataset.

BibTeX

@InProceedings{Dib_2024_CVPR,
    author    = {Dib, Abdallah and Hafemann, Luiz Gustavo and Got, Emeline and Anderson, Trevor and Fadaeinejad, Amin and Cruz, Rafael M. O. and Carbonneau, Marc-Andr\'e},
    title     = {MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {1770-1780}
}