Latent Code Evolution for 3D Neural Fields

2019

The Global Latent Era

A single low-dimensional vector conditions a shared MLP to represent an entire shape. Global latents proved the concept of neural fields, but one vector per shape limits capacity for complex geometry.

DeepSDF: Learning Continuous SDFs for Shape Representation

1901.05103 (CVPR 2019)

• Introduced auto-decoder paradigm: per-shape latent codes optimized alongside shared MLP decoder for SDF prediction

Auto-Decoder SDF Shape Completion Latent Interpolation Continuous Representation
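The auto-decoder idea can be sketched at inference time: the decoder is frozen and the latent code is found by gradient descent on observed SDF samples, with no encoder involved. This is a minimal sketch under loud assumptions: `toy_decoder` is a hypothetical stand-in (a sphere SDF whose radius plays the role of the latent), not DeepSDF's MLP, and `fit_latent` with its learning rate and step count is illustrative only.

```python
import math

def toy_decoder(z, p):
    """Hypothetical frozen decoder: SDF of an origin-centred sphere with radius z."""
    return math.sqrt(sum(c * c for c in p)) - z

def fit_latent(samples, z0=0.1, lr=0.1, steps=200):
    """Optimise the latent z so the frozen decoder matches observed (point, sdf) pairs."""
    z = z0
    for _ in range(steps):
        # Gradient of the mean squared error w.r.t. z.
        # For this toy decoder, d(decoder)/dz = -1.
        grad = sum(2.0 * (toy_decoder(z, p) - s) * -1.0 for p, s in samples) / len(samples)
        z -= lr * grad
    return z

# Observations taken from a sphere of radius 0.7 (the "unknown" shape).
points = [(1.0, 0.0, 0.0), (0.0, 0.5, 0.0), (0.3, 0.3, 0.3)]
samples = [(p, toy_decoder(0.7, p)) for p in points]
z_hat = fit_latent(samples)  # converges towards 0.7
```

During training, the same loop would also update the decoder weights, with one latent per training shape.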

Occupancy Networks: Learning 3D Reconstruction in Function Space

1812.03828 (CVPR 2019)

• Neural classifier predicting inside/outside shape; handles arbitrary topology without voxel memory explosion

Occupancy Network Implicit Function MISE Decision Boundary End-to-End Reconstruction

IM-Net: Learning Implicit Fields for Generative Shape Modeling

1812.02822 (CVPR 2019)

• Binary classifier on (point + latent) for occupancy; demonstrated high-quality generative modeling of 3D shapes

Implicit Field Binary Classifier Shape Generation Isosurface Extraction Single-View Reconstruction

Scene Representation Networks (SRN)

1906.01618 (NeurIPS 2019)

• Combined neural rendering with continuous scene representations; global latents for objects and full scenes

Ray-Marching Neural Scene Novel View Synthesis Few-Shot Reconstruction

2020

Regular & Multiscale Local Grids

Global latents lack the capacity for fine detail, so researchers distributed multiple latents spatially on grids. Regular and multiscale grids eased the capacity problem but introduced memory costs and a rigid, resolution-bound layout.

Convolutional Occupancy Networks (ConvONet)

2003.04618 (ECCV 2020)

• Replaced global latent with regular 3D feature grid; 3D CNN encoder + trilinear interpolation for occupancy decoding

Translation Equivariance Scene Reconstruction U-Net Decoder Structured Reasoning
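Trilinear interpolation of a regular feature grid can be sketched as follows. This is an illustrative sketch, not ConvONet's code: the function name `trilinear` is hypothetical, the grid holds scalars rather than feature vectors, and query points are assumed to be given in grid-index coordinates.

```python
import math

def trilinear(grid, p):
    """Interpolate grid[i][j][k] at a continuous index-space point p = (x, y, z)."""
    n = (len(grid), len(grid[0]), len(grid[0][0]))
    # Lower corner of the enclosing cell, clamped so the upper corner stays in bounds.
    i0 = [min(max(int(math.floor(c)), 0), n[a] - 2) for a, c in enumerate(p)]
    t = [c - i for c, i in zip(p, i0)]  # fractional offsets inside the cell
    out = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Weight of each corner is the product of per-axis linear weights.
                w = ((t[0] if dx else 1 - t[0]) *
                     (t[1] if dy else 1 - t[1]) *
                     (t[2] if dz else 1 - t[2]))
                out += w * grid[i0[0] + dx][i0[1] + dy][i0[2] + dz]
    return out

# A 2x2x2 grid whose corner values follow the linear function x + 2y + 4z.
grid = [[[x + 2 * y + 4 * z for z in (0, 1)] for y in (0, 1)] for x in (0, 1)]
value = trilinear(grid, (0.5, 0.25, 0.75))
```

In a full model, the interpolated feature vector would be passed, together with the query point, to a small decoder that predicts occupancy.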

Local Deep Implicit Functions (LDIF)

1912.06126 (CVPR 2020 Oral)

• Decomposed space into structured local implicit functions, each with own latent; better detail with fewer parameters

Structured Decomposition Gaussian Mixture Depth Completion Shape Auto-Encoding Generalization

IF-Net: Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

2003.01456 (CVPR 2020)

• Feature grids extracted at multiple resolutions and concatenated per query point; captures global structure and fine detail simultaneously

Multi-Scale Feature Feature-Space Query Detail Preservation Articulated Reconstruction

Deep Local Shapes (DeepLS)

2003.10983 (ECCV 2020)

• Independent latent codes in a grid encoding local shape neighborhoods; high compression and local completion

Grid Latent Codes Local Shape Completion Compression

2021

Early Hierarchical & Octree Ideas

Hierarchy and sparsity emerged as key themes. Octree and hash-based approaches combined classical multigrid ideas with modern neural implicits, enabling adaptive resolution and dramatic training speedups.

SIREN: Implicit Neural Representations with Periodic Activations

2006.09661 (NeurIPS 2020)

• Periodic activation functions (sine) in MLPs represent complex signals more efficiently; accurate gradient representation

Periodic Activation Sine Activation Gradient Representation
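A single sine layer can be sketched in a few lines. This is a rough sketch, assuming a frequency scale of `w0 = 30` and a first-layer weight range of [-1/n_in, 1/n_in]; the helper names `init_first_layer` and `sine_layer` are hypothetical, and the zero biases are a simplification.

```python
import math
import random

def init_first_layer(n_in, n_out, rng):
    """First-layer weights drawn uniformly from [-1/n_in, 1/n_in]."""
    bound = 1.0 / n_in
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out  # zero biases are a simplification for this sketch
    return weights, biases

def sine_layer(x, weights, biases, w0=30.0):
    """Compute sin(w0 * (W x + b)) for one input vector x."""
    return [math.sin(w0 * (sum(w * xi for w, xi in zip(row, x)) + b))
            for row, b in zip(weights, biases)]

rng = random.Random(0)
W, b = init_first_layer(3, 4, rng)
features = sine_layer((0.1, -0.2, 0.3), W, b)
```

Because the derivative of a sine is a shifted sine, gradients of such a network are themselves well-behaved periodic networks, which is what makes supervising on SDF gradients practical.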

MetaSDF: Meta-Learning Signed Distance Functions

2006.09662 (NeurIPS 2020)

• Meta-learning for implicit representations; learns initialization that quickly adapts to new shapes via gradient descent

Meta-Learning MAML Zero-Level Set Fast Inference Shape Prior

Multiresolution Deep Implicit Functions (MDIF)

2109.05591 (ICCV 2021)

• True hierarchy of latent grids with progressive decoding; latent grid dropout for shape completion

Hierarchical Latent Grids Residual Decoding Latent Dropout Progressive Decoding

Neural Geometric Level of Detail (NGLOD)

2101.10994 (CVPR 2021)

• Combined a sparse octree hierarchy with neural SDFs; latents stored at octree nodes give adaptive resolution and level of detail

Sparse Octree Level-of-Detail Real-Time Ray Tracing Neural SDF Feature Volume

Instant NGP: Neural Graphics Primitives with Multiresolution Hash Encoding

2201.05989 (SIGGRAPH 2022)

• Multiresolution hash encoding with trainable hash tables; dramatically accelerated neural field training and rendering

Hash Encoding Spatial Hash GPU Optimization Parametric Encoding
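The index computation behind a multiresolution spatial hash can be sketched as below. The per-axis primes and the geometric resolution schedule follow the formulas reported in the Instant NGP paper; the function names, the default level count, and the table size are assumptions for illustration, and the sketch covers only the lookup indices, not the trainable features or their interpolation.

```python
import math

PRIMES = (1, 2654435761, 805459861)  # per-axis primes as reported in the paper

def level_resolution(level, n_levels=16, n_min=16, n_max=512):
    """Grid resolution grows geometrically from n_min towards n_max."""
    b = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return int(math.floor(n_min * b ** level))

def hash_index(ix, iy, iz, table_size):
    """XOR the prime-scaled integer coordinates, then wrap into the table."""
    h = 0
    for c, prime in zip((ix, iy, iz), PRIMES):
        h ^= c * prime
    return h % table_size

def corner_indices(p, level, table_size=2 ** 14):
    """Table slots of the 8 voxel corners around p in [0, 1)^3 at one level."""
    res = level_resolution(level)
    base = [int(math.floor(c * res)) for c in p]
    return [hash_index(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

slots = corner_indices((0.3, 0.6, 0.9), level=3)
```

At fine levels there are far more grid corners than table slots, so distinct corners collide; the design leaves it to training to resolve which corner dominates a shared slot.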

2022

Irregular Latent Grids

Latents at explicit but adaptive positions, encoded with transformers, became a leading approach for balancing detail and efficiency. Farthest Point Sampling (FPS) places latents at surface-relevant locations, making representations sparse and transformer-compatible.

3DILG: Irregular Latent Grids for 3D Generative Modeling

2205.13914 (NeurIPS 2022)

• Pioneered irregular sparse latent grids: FPS places M latents at adaptive positions; kernel regression interpolates

Irregular Latent Grid Auto-Regressive Transformer Vector Quantization Farthest Point Sampling Sparse Adaptive Representation
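Farthest Point Sampling itself is a short greedy loop. The sketch below is a plain O(N·M) version with a hypothetical function name and a toy point cloud; it shows only how the M anchor positions are chosen, not how latents are attached to them.

```python
def farthest_point_sampling(points, m, start=0):
    """Greedily pick m indices so each new point is farthest from those already chosen."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    chosen = [start]
    # Squared distance from every point to its nearest chosen point so far.
    nearest = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        nearest = [min(d, sq_dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen

cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
         (0.0, 1.0, 0.0), (0.9, 0.1, 0.0)]
anchors = farthest_point_sampling(cloud, 3)  # -> [0, 2, 3]
```

Near-duplicate points (indices 1 and 4 here) are skipped in favour of well-separated ones, which is why FPS spreads latents evenly over a sampled surface.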

Point-Voxel Diffusion (PVD)

2104.03670 (ICCV 2021)

• Pioneered diffusion models for 3D point clouds; point-voxel CNN backbone for unconditional and conditional generation

Denoising Diffusion Point-Voxel CNN Shape Completion Probabilistic Generation

LION: Latent Point Diffusion Models for 3D Shape Generation

2210.06978 (NeurIPS 2022)

• Hierarchical VAE + diffusion in compressed latent space; higher quality and diversity than direct 3D space generation

Latent Diffusion Hierarchical VAE Point Cloud Diffusion Denoising Diffusion Surface Reconstruction

2023

Position-Free & Feature-Plane Latents

The field shifted from "where the latent lives" to "what abstract feature the latent encodes." Position-free sets and axis-aligned feature planes emerged as powerful representations, enabling transformer-native diffusion on structured 2D feature maps.

3DShape2VecSet: A Representation for Neural Fields and Generative Diffusion

2301.11445 (TOG / SIGGRAPH 2023)

• Removed explicit 3D positions entirely; shape as fixed set of vectors with learned cross-attention for query interpolation

VecSet Cross-Attention Generative Diffusion KL Regularization Shape Auto-Encoding
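Querying a position-free latent set by cross-attention can be sketched as a softmax-weighted average. This is a simplified sketch: the learned query, key, and value projections and the positional embedding of the query point are omitted, and the function name and toy inputs are assumptions.

```python
import math

def cross_attention(query, keys, values):
    """Softmax-weighted average of values, weighted by scaled dot products with the query."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Two latents with identical keys receive equal weight, so the output is their mean.
out = cross_attention([0.5, 0.5], [[1.0, 0.0], [1.0, 0.0]], [[0.0], [2.0]])
```

Compared with kernel regression over explicit positions, the interpolation weights here depend only on learned feature similarity, so the latents need not be tied to any location in space.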

3D Neural Field Generation Using Triplane Diffusion

2211.16677 (CVPR 2023)

• Applied a 2D diffusion backbone to triplane features (three axis-aligned feature planes) for efficient 3D neural field generation

Triplane Neural Field Diffusion 2D Diffusion Backbone Explicit Density Regularization
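A triplane lookup can be sketched as three 2D samples that are combined into one feature. The sketch assumes scalar features, coordinates normalized to [0, 1], and aggregation by summation; `bilinear` and `triplane_feature` are hypothetical names, and a real model would feed the result to a small decoder MLP.

```python
def bilinear(plane, u, v):
    """Bilinearly sample plane[row][col] at continuous coordinates (u, v) in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    x, y = u * (rows - 1), v * (cols - 1)
    # Clamp the lower corner so the upper corner stays inside the plane.
    i, j = min(int(x), rows - 2), min(int(y), cols - 2)
    a, b = x - i, y - j
    return ((1 - a) * (1 - b) * plane[i][j] + a * (1 - b) * plane[i + 1][j] +
            (1 - a) * b * plane[i][j + 1] + a * b * plane[i + 1][j + 1])

def triplane_feature(planes, p):
    """Project p = (x, y, z) onto the xy, xz and yz planes and sum the sampled features."""
    x, y, z = p
    return (bilinear(planes["xy"], x, y) +
            bilinear(planes["xz"], x, z) +
            bilinear(planes["yz"], y, z))

plane = [[0.0, 1.0], [2.0, 3.0]]
planes = {"xy": plane, "xz": plane, "yz": plane}
feature = triplane_feature(planes, (0.5, 0.5, 0.5))
```

Storage grows with the square of the resolution rather than the cube, and each plane is an ordinary multi-channel image, which is what lets 2D diffusion architectures be reused.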

NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion

2304.09787 (CVPR 2023)

• Extended the LDM paradigm to 3D scene generation; a scene autoencoder compresses scenes into a latent space, which hierarchical diffusion then models

Scene Autoencoder Hierarchical Diffusion Voxel Grid

2023–2025

Discrete & Tokenized Representations

Discretization changed the nature of latent codes. VQ-VAE compression and autoregressive tokenization enabled transformer-based generation over discrete codebooks, connecting 3D shape modeling with language-model architectures.

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

2212.04493 (CVPR 2023)

• Multimodal generation via VQ-VAE + latent diffusion; supports partial shapes, images, text with adjustable conditioning weights

SDF Latent Diffusion Multi-Modal Conditioning VQ-VAE Shape Texturing
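The quantization step of a VQ-VAE reduces to a nearest-neighbour lookup in a codebook. The sketch below uses a hypothetical three-entry codebook and toy 2D latents; it covers only the lookup that turns continuous latents into token indices, not codebook training or the straight-through gradient.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to vector in squared L2 distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
latents = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]
tokens = [quantize(z, codebook)[0] for z in latents]  # -> [1, 2, 0]
```

The resulting integer tokens are what an autoregressive transformer would model, while a latent diffusion model would instead operate on the compressed latent grid.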

Shap-E: Generating Conditional 3D Implicit Functions

2305.02463 (arXiv / OpenAI 2023)

• Generates implicit function parameters via conditional diffusion; outputs can be rendered as textured meshes or neural radiance fields

Implicit Neural Representation Conditional Diffusion NeRF Two-Stage Training Encoder-Decoder

Diffusion-3D: Large-Vocabulary 3D Diffusion Model with Transformer

2309.07920 (ICLR 2024)

• Triplane-based diffusion with a 3D-aware transformer, trained across a large vocabulary of object categories; demonstrates transformer scalability for 3D generation

Large Vocabulary Discrete Tokens Transformer

OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation

2504.09975 (2025)

• Serializes octree hierarchies into multiscale autoregressive model; SOTA on unconditional and conditional 3D generation

Octree Autoregressive Multiscale Tokenization RoPE3D VQ-VAE Parallel Token Generation

2023–2024

Explicit Primitives: Gaussians

Gaussian primitives emerged as an alternative to implicit neural fields. Explicit point-based representations with learnable covariance enabled real-time rendering and faster optimization, bridging classical graphics with neural generation.

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

2309.16653 (ICLR 2024)

• Combines Gaussian splatting with diffusion priors via SDS for efficient text-to-3D; mesh extraction for downstream use

Gaussian Splatting SDS UV Texture Refinement Mesh Extraction

GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models

2310.08529 (arXiv 2023)

• Hybrid 2D+3D diffusion; point cloud diffusion prior initializes Gaussians, then SDS refines

Gaussian Splatting 2D-3D Bridging SDS Noise Point Growth Real-Time Rendering

LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation

2402.05054 (ECCV 2024)

• Feed-forward 3D generation from text or a single image; fuses generated multi-view images into 3D Gaussians

Multi-View Gaussian Asymmetric UNet Gaussian Splatting Mesh Extraction Feed-Forward 3D

GSN: Generalisable Segmentation in Neural Radiance Field

2402.04632 (arXiv 2024)

• Distills 2D semantic features (such as DINO) into a generalizable NeRF feature field; segments novel scenes without per-scene optimization

Generalizable NeRF Semantic Feature Distillation DINO Features Neural Feature Field Cross-Scene Generalization

2024–2026

Adaptive Hierarchical Structures

The frontier returned to hierarchical representations with learned adaptivity. Geometry-aware local grids and unsupervised hierarchical transformers enable adaptive resolution that follows surface complexity, closing the loop from global codes back to structured hierarchies.

GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation

2410.10037 (ICLR 2025)

• Geometry-aware local adaptive grids exploiting surface sparsity and local geometric properties; cascaded generation for detail

Octree Forest Geometry-Adaptive Grid Cascaded Diffusion SDF Local Adaptive Encoding

HiT: Hierarchical Transformers for Unsupervised 3D Shape Abstraction

2510.27088 (3DV 2026)

• Unsupervised hierarchical abstractions using compressed codebooks across tree levels

Hierarchical Transformer Unsupervised Abstraction Convex Primitive Variable-Branch Tree Cross-Attention Codebook

Latent Code Structures for 3D Neural Fields