Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction
(CVPR 2026)

1Ubisoft La Forge, 2University of Toronto, 3ÉTS Montréal

TL;DR: Skullptor reconstructs high-fidelity 3D heads in seconds by predicting view-consistent normals from a sparse set of input images and camera poses.

Overview

Skullptor takes multi-view images captured in a light stage and encodes information from all cameras to predict view-consistent surface normals. Given a spherical mesh initialization and known camera parameters, it then iteratively refines the mesh into a high-resolution, detailed 3D surface.

Method

Skullptor method figure

  1. Multi-view normal prediction - A cross-view attention mechanism aggregates information across sparse input images to predict surface normals that are geometrically consistent across all views.
  2. Mesh optimization - The predicted surface normals serve as geometric priors in an inverse-rendering framework, which iteratively refines a spherical mesh to align with the input normals.
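
The two-stage idea above can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the token layout, the single shared attention over all views, and the `normals_from_features` head are illustrative assumptions. It only shows the mechanism by which tokens from every view attend to tokens of all other views, so each view's normal prediction is informed by the full camera set.

```python
import numpy as np

def cross_view_attention(feats):
    """Toy cross-view attention (illustrative, not the paper's model).
    feats: (V, N, D) array - V views, N tokens per view, D channels.
    Tokens from all views are pooled into one sequence so every token
    can attend across views; returns fused features of the same shape."""
    V, N, D = feats.shape
    q = feats.reshape(V * N, D)               # all tokens from all views
    k = q                                     # shared keys/values (self-attention)
    attn = q @ k.T / np.sqrt(D)               # (V*N, V*N) similarities
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over all views' tokens
    fused = attn @ k
    return fused.reshape(V, N, D)

def normals_from_features(fused):
    """Hypothetical prediction head: read the first 3 channels of the
    fused features and normalize them into unit surface normals."""
    n = fused[..., :3]
    return n / np.linalg.norm(n, axis=-1, keepdims=True)
```

Because the attention matrix spans `V * N` tokens rather than `N` tokens per view, information flows between cameras before the normals are decoded, which is what makes the per-view predictions mutually consistent.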

Results

Input Images (4 of 10 views shown)

Input image 1 Input image 2 Input image 3 Input image 4

Predicted Normal Maps

Normal map 1 Normal map 2 Normal map 3 Normal map 4

Dynamic Render of Reconstructed Meshes

Our approach generates faithful facial reconstructions with high-frequency geometric details. The above results are achieved using a sparse setup of only 10 input camera views.

Comparisons

Normal Prediction Comparison

Comparison of predicted normals with baselines

Compared to current state-of-the-art methods, Skullptor produces more faithful normal maps, significantly improving the reconstruction of fine-scale facial features. This yields stronger identity preservation and captures high-frequency surface details such as fine wrinkles.

Mesh Reconstruction Comparison

Comparison of final mesh reconstruction with baselines

Qualitative comparison of Skullptor with implicit reconstruction methods. Unlike 2DGS and SuGaR, which exhibit surface noise and loss of fine-scale detail, Skullptor maintains the geometric precision of traditional photogrammetry with significantly lower computational overhead (10× faster).
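
The role of predicted normals as geometric priors can be seen in a toy 1-view version of the refinement: recovering a heightfield whose slopes match the slopes implied by target unit normals. This is a sketch of the general principle only; the paper's optimizer refines a full triangle mesh under multiple calibrated views, and the finite-difference scheme and learning rate below are assumptions.

```python
import numpy as np

def integrate_normals(normals, iters=3000, lr=0.1):
    """Toy normal-guided refinement: gradient descent on a heightfield z
    so that its forward differences match the slopes (p, q) implied by
    target unit normals (nx, ny, nz), i.e. minimize ||grad z - (p, q)||^2.
    normals: (H, W, 3) unit vectors. Returns z of shape (H, W)."""
    H, W, _ = normals.shape
    p = -normals[..., 0] / normals[..., 2]   # target dz/dx
    q = -normals[..., 1] / normals[..., 2]   # target dz/dy
    z = np.zeros((H, W))
    for _ in range(iters):
        zx = np.diff(z, axis=1, append=z[:, -1:])   # forward differences in x
        zy = np.diff(z, axis=0, append=z[-1:, :])   # forward differences in y
        rx, ry = zx - p, zy - q                     # slope residuals
        # Backpropagate the residuals through the difference operators.
        g = np.zeros_like(z)
        g[:, :-1] -= rx[:, :-1]
        g[:, 1:]  += rx[:, :-1]
        g[:-1, :] -= ry[:-1, :]
        g[1:, :]  += ry[:-1, :]
        z -= lr * g
    return z - z.mean()   # fix the free constant offset
```

On a planar target (constant normals) the recovered heightfield's slopes converge to the slopes encoded in the normals, which is the same mechanism, lifted to meshes and many views, that lets accurate normal maps translate into geometrically precise surfaces.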

We analyze the impact of viewpoint sparsity on reconstruction quality. Compared to a traditional photogrammetry baseline (Meshroom), Skullptor is far more robust to low input view counts (e.g., 3-6 views).

BibTeX


  @InProceedings{artru2026skullptor,
      author    = {Artru, Noé and Hussain, Rukhshanda and Got, Emeline and Messier, Alexandre and Lindell, David and Dib, Abdallah},
      title     = {Skullptor: High Fidelity 3D Head Reconstruction in Seconds with Multi-View Normal Prediction},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
      year      = {2026}
  }