Latent Code Structures for 3D Neural Fields

Historical Review — 2019 to 2026

Key Themes
Neural Fields, Latent Diffusion, Score Distillation Sampling, Implicit Representations, Hierarchical Representations, Gaussian Splatting
2019

The Global Latent Era

A single low-dimensional vector conditions a shared MLP to represent an entire shape. Global latents proved the concept of neural fields but exposed capacity limits on complex geometry.

DeepSDF: Learning Continuous SDFs for Shape Representation
1901.05103 (CVPR 2019)
• Introduced auto-decoder paradigm: per-shape latent codes optimized alongside shared MLP decoder for SDF prediction
Auto-Decoder, SDF, Shape Completion, Latent Interpolation, Continuous Representation
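
The auto-decoder idea can be illustrated with a toy sketch: there is no encoder, and each shape's latent code is a free parameter that takes gradient steps together with the shared decoder. Everything here is an assumption made for illustration (spheres as shapes, a scalar latent, a linear decoder in place of the MLP, hand-written gradients); it is not DeepSDF's architecture or training setup.

```python
# Toy auto-decoder: one latent scalar per shape, trained jointly with a
# shared decoder. Assumption: a linear decoder stands in for the MLP.
radii = [0.5, 1.0, 1.5]                 # three spheres, one per shape
samples = [0.0, 0.5, 1.0, 1.5, 2.0]     # distances |x| of the query points

def target_sdf(dist, radius):
    return dist - radius                # exact SDF of an origin-centered sphere

def decode(dist, z, params):
    a, w, b = params
    return a * dist + w * z + b         # shared decoder f(|x|, z)

def mean_loss(latents, params):
    errs = [(decode(d, latents[i], params) - target_sdf(d, r)) ** 2
            for i, r in enumerate(radii) for d in samples]
    return sum(errs) / len(errs)

def train(steps=2000, lr=0.05):
    latents = [0.0 for _ in radii]      # per-shape codes, optimized directly
    params = [0.0, 1.0, 0.0]            # shared decoder weights a, w, b
    n = len(radii) * len(samples)
    for _ in range(steps):
        g_lat = [0.0 for _ in radii]
        g_par = [0.0, 0.0, 0.0]
        for i, r in enumerate(radii):
            for d in samples:
                err = decode(d, latents[i], params) - target_sdf(d, r)
                g_par[0] += 2 * err * d / n
                g_par[1] += 2 * err * latents[i] / n
                g_par[2] += 2 * err / n
                g_lat[i] += 2 * err * params[1] / n
        # No encoder: codes and decoder weights take the same gradient step.
        params = [p - lr * g for p, g in zip(params, g_par)]
        latents = [z - lr * g for z, g in zip(latents, g_lat)]
    return latents, params

initial_loss = mean_loss([0.0, 0.0, 0.0], [0.0, 1.0, 0.0])
latents, params = train()
final_loss = mean_loss(latents, params)
```

After training, each sphere ends up with its own code, and decoding a new shape would mean optimizing a fresh code against the frozen decoder.
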
Occupancy Networks: Learning 3D Reconstruction in Function Space
1812.03828 (CVPR 2019)
• Neural classifier predicting inside/outside shape; handles arbitrary topology without voxel memory explosion
Occupancy Network, Implicit Function, MISE, Decision Boundary, End-to-End Reconstruction
IM-Net: Learning Implicit Fields for Generative Shape Modeling
1812.02822 (CVPR 2019)
• Binary classifier on (point + latent) for occupancy; demonstrated high-quality generative modeling of 3D shapes
Implicit Field, Binary Classifier, Shape Generation, Isosurface Extraction, Single-View Reconstruction
Scene Representation Networks (SRN)
1906.01618 (NeurIPS 2019)
• Combined neural rendering with continuous scene representations; global latents for objects and full scenes
Ray-Marching, Neural Scene, Novel View Synthesis, Few-Shot Reconstruction
2020

Regular & Multiscale Local Grids

A single global latent cannot capture fine detail, so researchers distributed many latents spatially on regular and multiscale grids. This solved the capacity problem but introduced new ones: dense grids are memory-hungry, and their fixed layout is rigid regardless of where the surface actually lies.

Convolutional Occupancy Networks (ConvONet)
2003.04618 (ECCV 2020)
• Replaced global latent with regular 3D feature grid; 3D CNN encoder + trilinear interpolation for occupancy decoding
Translation Equivariance, Scene Reconstruction, U-Net Decoder, Structured Reasoning
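
Querying a regular feature grid at a continuous point comes down to trilinear interpolation of the eight surrounding vertex features. The sketch below shows only that lookup in plain Python; the grid layout (`grid[i][j][k]` as a list of floats) and the unit-cube convention are assumptions for illustration, and the 3D CNN encoder and occupancy decoder are omitted.

```python
import math

def trilinear_sample(grid, point):
    """Interpolate a feature vector at `point` in [0, 1]^3.

    grid[i][j][k] is the feature vector (list of floats) stored at vertex
    (i, j, k) of a regular R x R x R lattice, with R >= 2.
    """
    res = len(grid)
    # Map the unit cube onto vertex coordinates [0, res - 1].
    coords = [min(max(c, 0.0), 1.0) * (res - 1) for c in point]
    # Lower corner of the enclosing cell, clamped so corner + 1 stays valid.
    base = [min(int(math.floor(c)), res - 2) for c in coords]
    frac = [c - b for c, b in zip(coords, base)]

    dim = len(grid[0][0][0])
    out = [0.0] * dim
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Each corner weight is the product of per-axis weights.
                w = ((frac[0] if dx else 1.0 - frac[0])
                     * (frac[1] if dy else 1.0 - frac[1])
                     * (frac[2] if dz else 1.0 - frac[2]))
                feat = grid[base[0] + dx][base[1] + dy][base[2] + dz]
                for c in range(dim):
                    out[c] += w * feat[c]
    return out

# Toy 2 x 2 x 2 grid whose single feature channel equals i + 2j + 4k.
toy_grid = [[[[float(i + 2 * j + 4 * k)] for k in range(2)]
             for j in range(2)] for i in range(2)]
center_feature = trilinear_sample(toy_grid, (0.5, 0.5, 0.5))
corner_feature = trilinear_sample(toy_grid, (1.0, 0.0, 1.0))
```

Because the toy channel is linear in the vertex indices, interpolation reproduces it exactly, which makes the lookup easy to check by hand.
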
Local Deep Implicit Functions (LDIF)
1912.06126 (CVPR 2020 Oral)
• Decomposed space into structured local implicit functions, each with own latent; better detail with fewer parameters
Structured Decomposition, Gaussian Mixture, Depth Completion, Shape Auto-Encoding, Generalization
IF-Net: Implicit Feature Networks for 3D Shape Reconstruction
2003.01456 (CVPR 2020)
• Multiscale feature grids at multiple resolutions concatenated; captured global structure and fine details simultaneously
Multi-Scale Feature, Feature-Space Query, Detail Preservation, Articulated Reconstruction
Deep Local Shapes (DeepLS)
2003.10983 (ECCV 2020)
• Independent latent codes in a grid encoding local shape neighborhoods; high compression and local completion
Grid Latent Codes, Local Shape Completion, Compression
2021

Early Hierarchical & Octree Ideas

Hierarchy and sparsity emerged as key themes. Octree and hash-based approaches combined classical multigrid ideas with modern neural implicits, enabling adaptive resolution and dramatic training speedups.

SIREN: Implicit Neural Representations with Periodic Activations
2006.09661 (NeurIPS 2020)
• Periodic activation functions (sine) in MLPs represent complex signals more efficiently; accurate gradient representation
Periodic Activation, Sine Activation, Gradient Representation
MetaSDF: Meta-Learning Signed Distance Functions
2006.09662 (NeurIPS 2020)
• Meta-learning for implicit representations; learns initialization that quickly adapts to new shapes via gradient descent
Meta-Learning, MAML, Zero-Level Set, Fast Inference, Shape Prior
Multiresolution Deep Implicit Functions (MDIF)
2109.05591 (ICCV 2021)
• True hierarchy of latent grids with progressive decoding; latent grid dropout for shape completion
Hierarchical Latent Grids, Residual Decoding, Latent Dropout, Progressive Decoding
Neural Geometric Level of Detail (NGLOD)
2101.10994 (CVPR 2021)
• Combined octree hierarchical sparse structure with neural fields; adaptive resolution via octree-based latent storage
Sparse Octree, Level-of-Detail, Real-Time Ray Tracing, Neural SDF, Feature Volume
Instant NGP: Neural Graphics Primitives with Multiresolution Hash Encoding
2201.05989 (SIGGRAPH 2022)
• Multiresolution hash encoding with trainable hash table; dramatically accelerated neural field training and rendering
Hash Encoding, Spatial Hash, GPU Optimization, Parametric Encoding
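
The index computation behind a multiresolution hash encoding can be sketched in a few lines: choose a geometric series of grid resolutions, then hash the integer corners of the voxel containing the query point into a fixed-size table at each level. This is a simplified illustration, not the reference implementation: the prime constants are the ones commonly quoted for Instant NGP and should be checked against the paper, the function names are hypothetical, and the feature lookup, trilinear blending, and dense indexing of coarse levels are left out.

```python
import math

# Per-axis multipliers for the spatial hash (assumed values, see lead-in).
PRIMES = (1, 2654435761, 805459861)

def level_resolutions(n_min, n_max, num_levels):
    """Geometric progression of grid resolutions from n_min towards n_max."""
    if num_levels == 1:
        return [n_min]
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (num_levels - 1))
    return [int(math.floor(n_min * growth ** level))
            for level in range(num_levels)]

def hash_index(ix, iy, iz, table_size):
    """XOR the per-axis products, then wrap into the table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size

def corner_indices(point, resolution, table_size):
    """Table slots of the 8 voxel corners around `point` in [0, 1]^3."""
    base = [int(math.floor(c * resolution)) for c in point]
    return [hash_index(base[0] + dx, base[1] + dy, base[2] + dz, table_size)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)]

table_size = 2 ** 14
resolutions = level_resolutions(16, 512, 6)
slots = [corner_indices((0.3, 0.7, 0.1), r, table_size) for r in resolutions]
```

In a full encoder, each slot would hold a small trainable feature vector; the eight corner features are blended per level and the per-level results concatenated before the MLP.
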
2022

Irregular Latent Grids

Explicit but adaptive latent positions, combined with transformer encoding, became the dominant approach to balancing detail and efficiency. Farthest Point Sampling (FPS) placed latents at surface-relevant locations, making representations sparse and transformer-compatible.

3DILG: Irregular Latent Grids for 3D Generative Modeling
2205.13914 (NeurIPS 2022)
• Pioneered irregular sparse latent grids: FPS places M latents at adaptive positions; kernel regression interpolates
Irregular Latent Grid, Auto-Regressive Transformer, Vector Quantization, Farthest Point Sampling, Sparse Adaptive Representation
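
The two ingredients named above, FPS for latent placement and kernel regression for interpolation, can be sketched as follows. The Gaussian kernel, the `bandwidth` value, and the toy points are assumptions chosen for illustration; the paper's actual kernel and feature encoder may differ.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_sampling(points, m, start=0):
    """Greedily pick m indices, each farthest from the ones already picked."""
    chosen = [start]
    min_d = [sq_dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_d[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_d[i] = min(min_d[i], sq_dist(p, points[nxt]))
    return chosen

def kernel_regression(query, positions, features, bandwidth=0.1):
    """Nadaraya-Watson estimate: kernel-weighted average of latent features."""
    logits = [-sq_dist(query, p) / (2.0 * bandwidth ** 2) for p in positions]
    top = max(logits)                    # subtract the max for stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    dim = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features)) / total
            for c in range(dim)]

surface = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0),
           (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
picked = farthest_point_sampling(surface, 3)
positions = [surface[i] for i in picked]
features = [[0.0], [1.0], [2.0]]         # one toy latent per picked position
value = kernel_regression((1.0, 0.0, 0.0), positions, features)
```

With a small bandwidth, a query sitting on a latent position returns almost exactly that latent's feature, while queries between latents blend their neighbors.
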
Point-Voxel Diffusion (PVD)
2104.03670 (ICCV 2021)
• Pioneered diffusion models for 3D point clouds; point-voxel CNN backbone for unconditional and conditional generation
Denoising Diffusion, Point-Voxel CNN, Shape Completion, Probabilistic Generation
LION: Latent Point Diffusion Models for 3D Shape Generation
2210.06978 (NeurIPS 2022)
• Hierarchical VAE + diffusion in compressed latent space (a global shape latent plus point-structured latents); higher quality and diversity than direct 3D space generation
Latent Diffusion, Hierarchical VAE, Point Cloud Diffusion, Denoising Diffusion, Surface Reconstruction
2023

Position-Free & Feature-Plane Latents

The field shifted from "where the latent lives" to "what abstract feature the latent encodes." Position-free sets and axis-aligned feature planes emerged as powerful representations, enabling transformer-native diffusion on structured 2D feature maps.

3DShape2VecSet: A Representation for Neural Fields and Generative Diffusion
2301.11445 (TOG / SIGGRAPH 2023)
• Removed explicit 3D positions entirely; shape as fixed set of vectors with learned cross-attention for query interpolation
VecSet, Cross-Attention, Generative Diffusion, KL Regularization, Shape Auto-Encoding
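
Interpolating a query from a position-free latent set can be sketched as single-head cross-attention: the query embedding scores every latent, and the softmax weights blend the values. The sketch assumes the learned query, key, and value projections have already been applied (they are omitted here), and the toy vectors are made up for illustration.

```python
import math

def cross_attention(query, keys, values):
    """Single-head attention of one query vector over a set of latents."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)                    # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)]
    return out, weights

query = [1.0, 0.0]                       # embedding of a 3D query point
keys = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

out, weights = cross_attention(query, keys, values)
# Reordering the latent set must not change the result.
out_reversed, _ = cross_attention(query, keys[::-1], values[::-1])
```

The permutation check at the end reflects why a set of vectors works here: attention has no notion of latent order or position unless one is injected.
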
3D Neural Field Generation Using Triplane Diffusion
2211.16677 (CVPR 2023)
• Popularized triplane representations (three orthogonal feature planes) for efficient 3D neural field generation with latent diffusion
Triplane, Neural Field Diffusion, 2D Diffusion Backbone, Explicit Density Regularization
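
A triplane lookup projects the query point onto the three axis-aligned planes, samples each plane bilinearly, and aggregates the three feature vectors. The sketch below sums them, which is one common choice and an assumption here; the plane layout (`plane[i][j]` as a list of floats), the unit-cube convention, and the dictionary keys are likewise illustrative, and the decoder MLP is omitted.

```python
import math

def bilinear_sample(plane, u, v):
    """Interpolate plane[i][j] (a feature vector) at (u, v) in [0, 1]^2.

    Index i runs along the first coordinate and j along the second;
    the plane must be at least 2 x 2.
    """
    res = len(plane)
    x = min(max(u, 0.0), 1.0) * (res - 1)
    y = min(max(v, 0.0), 1.0) * (res - 1)
    x0 = min(int(math.floor(x)), res - 2)
    y0 = min(int(math.floor(y)), res - 2)
    fx, fy = x - x0, y - y0
    dim = len(plane[0][0])
    return [(1 - fx) * (1 - fy) * plane[x0][y0][c]
            + fx * (1 - fy) * plane[x0 + 1][y0][c]
            + (1 - fx) * fy * plane[x0][y0 + 1][c]
            + fx * fy * plane[x0 + 1][y0 + 1][c]
            for c in range(dim)]

def triplane_feature(planes, point):
    """Sum the features sampled from the xy, xz, and yz planes."""
    x, y, z = point
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]

# Toy 2 x 2 planes with one channel each.
planes = {
    "xy": [[[float(i)] for j in range(2)] for i in range(2)],  # equals x
    "xz": [[[float(j)] for j in range(2)] for i in range(2)],  # equals z
    "yz": [[[1.0] for j in range(2)] for i in range(2)],       # constant
}
feature = triplane_feature(planes, (0.25, 0.9, 0.5))
```

Since each plane is an ordinary 2D feature map, the three planes can be stacked as image channels, which is what lets a 2D diffusion backbone operate on them.
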
NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion
2304.09787 (CVPR 2023)
• Extended LDM paradigm to 3D scene generation; a scene autoencoder maps scenes to a latent space, on which a hierarchical diffusion model is trained
Scene Autoencoder, Hierarchical Diffusion, Voxel Grid
2023–2025

Discrete & Tokenized Representations

Discretization changed the nature of the latent code itself. VQ-VAE compression and autoregressive tokenization enabled transformer-based generation over discrete codebooks, bridging 3D shape understanding with language model architectures.
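
The quantization step that turns continuous latents into tokens can be sketched as a nearest-neighbor lookup in a codebook. This is a minimal illustration with a made-up three-entry codebook; codebook learning, the straight-through gradient, and the commitment loss used when training a VQ-VAE are omitted.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latent, codebook):
    """Return (index, code) of the codebook entry nearest to `latent`."""
    index = min(range(len(codebook)),
                key=lambda i: sq_dist(latent, codebook[i]))
    return index, codebook[index]

def tokenize(latents, codebook):
    """Map a sequence of continuous latents to discrete token ids."""
    return [quantize(z, codebook)[0] for z in latents]

codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]     # hypothetical entries
latents = [[0.9, 0.1], [0.1, 0.8], [0.2, 0.1]]
tokens = tokenize(latents, codebook)
reconstruction = [codebook[t] for t in tokens]
```

The resulting integer sequence is what an autoregressive transformer models, in the same way a language model treats word-piece ids.
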

SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
2212.04493 (CVPR 2023)
• Multimodal generation via VQ-VAE + latent diffusion; supports partial shapes, images, text with adjustable conditioning weights
SDF, Latent Diffusion, Multi-Modal Conditioning, VQ-VAE, Shape Texturing
Shap-E: Generating Conditional 3D Implicit Functions
2305.02463 (arXiv / OpenAI 2023)
• Directly generates implicit function parameters (SDF/NeRF) via conditional diffusion; textured meshes and neural radiance fields
Implicit Neural Representation, Conditional Diffusion, NeRF, Two-Stage Training, Encoder-Decoder
DiffTF: Large-Vocabulary 3D Diffusion Model with Transformer
2309.07920 (ICLR 2024)
• Large-vocabulary 3D diffusion via discrete tokenization; demonstrates transformer scalability for 3D generation
Large Vocabulary, Discrete Tokens, Transformer
OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation
2504.09975 (2025)
• Serializes octree hierarchies into multiscale autoregressive model; SOTA on unconditional and conditional 3D generation
Octree, Autoregressive, Multiscale Tokenization, RoPE3D, VQ-VAE, Parallel Token Generation
2023–2024

Explicit Primitives: Gaussians

Gaussian primitives emerged as an alternative to implicit neural fields. Explicit point-based representations with learnable covariance enabled real-time rendering and faster optimization, bridging classical graphics with neural generation.

DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
2309.16653 (ICLR 2024)
• Combines Gaussian splatting with diffusion priors via SDS for efficient text-to-3D; mesh extraction for downstream use
Gaussian Splatting, SDS, UV Texture Refinement, Mesh Extraction
GaussianDreamer: Fast Text-to-3D by Bridging 2D and 3D Diffusion
2310.08529 (arXiv 2023)
• Hybrid 2D+3D diffusion; point cloud diffusion prior initializes Gaussians, then SDS refines
Gaussian Splatting, 2D-3D Bridging, SDS, Noise Point Growth, Real-Time Rendering
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
2402.05054 (ECCV 2024)
• Gaussian-based 3D generation from single image; multi-view images to 3D Gaussians for faster rendering
Multi-View Gaussian, Asymmetric UNet, Gaussian Splatting, Mesh Extraction, Feed-Forward 3D
GSN: Generalisable Segmentation in Neural Radiance Field
2402.04632 (arXiv 2024)
• Distills semantic (DINO) features into a generalizable NeRF, yielding a neural feature field that supports segmentation across scenes without per-scene training
Generalizable NeRF, Semantic Feature Distillation, DINO Features, Neural Feature Field, Cross-Scene Generalization
2024–2026

Adaptive Hierarchical Structures

The frontier returned to hierarchical representations with learned adaptivity. Geometry-aware local grids and unsupervised hierarchical transformers enable adaptive resolution that follows surface complexity, closing the loop from global codes back to structured hierarchies.

GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation
2410.10037 (ICLR 2025)
• Geometry-aware local adaptive grids exploiting surface sparsity and local geometric properties; cascaded generation for detail
Octree Forest, Geometry-Adaptive Grid, Cascaded Diffusion, SDF, Local Adaptive Encoding
HiT: Hierarchical Transformers for Unsupervised 3D Shape Abstraction
2510.27088 (3DV 2026)
• Unsupervised hierarchical abstractions using compressed codebooks across tree levels
Hierarchical Transformer, Unsupervised Abstraction, Convex Primitive, Variable-Branch Tree, Cross-Attention, Codebook