Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction
(CVPR 2026)

1Ubisoft La Forge, 2University of Toronto, 3ÉTS Montréal

TL;DR: Skullptor reconstructs high-fidelity 3D heads in seconds by predicting view-consistent normals from a sparse set of input images and camera poses.

Overview

Skullptor takes multi-view images captured in a light stage and encodes information from all cameras to predict view-consistent surface normals. Given a spherical mesh initialization and known camera parameters, it then iteratively refines the mesh into a high-resolution, detailed 3D surface.

Method

Skullptor method figure

  1. Multi-view normal prediction - A cross-view attention mechanism aggregates information across sparse input images to predict surface normals that are geometrically consistent across all views.
  2. Mesh optimization - The predicted surface normals serve as geometric priors in an inverse-rendering framework, which iteratively refines a spherical mesh to align with the input normals.
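
The two-stage idea above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the token layout, the single shared attention over all views, and the `normals_from_features` head are illustrative assumptions. It only shows the mechanism by which tokens from every view attend to tokens of all other views, so each view's normal prediction is informed by the full camera set.

```python
import numpy as np

def cross_view_attention(feats):
    """Toy cross-view attention (illustrative, not the paper's model).
    feats: (V, N, D) array - V views, N tokens per view, D channels.
    Tokens from all views are pooled into one sequence so every token
    can attend across views; returns fused features of the same shape."""
    V, N, D = feats.shape
    q = feats.reshape(V * N, D)               # all tokens from all views
    k = q                                     # shared keys/values (self-attention)
    attn = q @ k.T / np.sqrt(D)               # (V*N, V*N) similarities
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over all views' tokens
    fused = attn @ k
    return fused.reshape(V, N, D)

def normals_from_features(fused):
    """Hypothetical prediction head: read the first 3 channels of the
    fused features and normalize them into unit surface normals."""
    n = fused[..., :3]
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

Because the attention matrix spans `V * N` tokens rather than `N` tokens per view, information flows between cameras before the normals are decoded, which is what makes the per-view predictions mutually consistent.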

Results

Input Images (4 of 10 views shown)

Input image 1 Input image 2 Input image 3 Input image 4

Predicted Normal Maps

Normal map 1 Normal map 2 Normal map 3 Normal map 4

Dynamic Render of Reconstructed Meshes

Our approach generates faithful facial reconstructions with high-frequency geometric details. The above results are achieved using a sparse setup of only 10 input camera views.

Comparisons

Normal Prediction Comparison

Comparison of predicted normals with baselines

Compared to current state-of-the-art methods, Skullptor produces more faithful normal maps, significantly improving the reconstruction of fine-scale facial features. This yields stronger identity preservation and captures high-frequency surface details such as fine wrinkles.

Mesh Reconstruction Comparison

Comparison of final mesh reconstruction with baselines

Qualitative comparison of Skullptor with implicit reconstruction methods. Unlike 2DGS and SuGaR, which exhibit surface noise and loss of fine-scale detail, Skullptor maintains the geometric precision of traditional photogrammetry with significantly lower computational overhead (10× faster).
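
The role of predicted normals as geometric priors can be seen in a toy 1-view version of the refinement: recovering a heightfield whose slopes match the slopes implied by target unit normals. This is a sketch of the general principle only; the paper's optimizer refines a full triangle mesh under multiple calibrated views, and the finite-difference scheme and learning rate below are assumptions.

```python
import numpy as np

def integrate_normals(normals, iters=3000, lr=0.1):
    """Toy normal-guided refinement: gradient descent on a heightfield z
    so that its forward differences match the slopes (p, q) implied by
    target unit normals (nx, ny, nz), i.e. minimize ||grad z - (p, q)||^2.
    normals: (H, W, 3) unit vectors. Returns z of shape (H, W)."""
    H, W, _ = normals.shape
    p = -normals[..., 0] / normals[..., 2]   # target dz/dx
    q = -normals[..., 1] / normals[..., 2]   # target dz/dy
    z = np.zeros((H, W))
    for _ in range(iters):
        zx = np.diff(z, axis=1, append=z[:, -1:])   # forward differences in x
        zy = np.diff(z, axis=0, append=z[-1:, :])   # forward differences in y
        rx, ry = zx - p, zy - q                     # slope residuals
        # Backpropagate the residuals through the difference operators.
        g = np.zeros_like(z)
        g[:, :-1] -= rx[:, :-1]
        g[:, 1:]  += rx[:, :-1]
        g[:-1, :] -= ry[:-1, :]
        g[1:, :]  += ry[:-1, :]
        z -= lr * g
    return z - z.mean()   # fix the free constant offset
```

On a planar target (constant normals) the recovered heightfield's slopes converge to the slopes encoded in the normals, which is the same mechanism, lifted to meshes and many views, that lets accurate normal maps translate into geometrically precise surfaces.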

We analyze the impact of viewpoint sparsity on reconstruction quality. Compared to a traditional photogrammetry baseline (Meshroom), Skullptor is far more robust to low input view counts (e.g., 3-6 views).

BibTeX


  @InProceedings{artru2026skullptor,
      author    = {Artru, Noé and Hussain, Rukhshanda and Got, Emeline and Messier, Alexandre and Lindell, David and Dib, Abdallah},
      title     = {Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year      = {2026}
  }