PlenOctrees for Real-time Rendering of Neural Radiance Fields
Abstract
We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800×800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve view-dependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next-generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu.net/plenoctrees.
1 Introduction
Despite the progress of real-time graphics, interactive 3D content with truly photorealistic scenes and objects is still time-consuming and costly to produce due to the necessity of optimized 3D assets and dedicated shaders. Instead, many graphics applications opt for image-based solutions. E-commerce websites often use a fixed set of views to showcase their products; VR experiences often rely on 360° video recordings to avoid the costly production of real 3D scenes; and mapping services such as Google Street View stitch images into panoramic views limited to 3-DOF.
Recent advances in neural rendering, such as neural volumes [23] and neural radiance fields (NeRFs) [28], open a promising new avenue to model arbitrary objects and scenes in 3D from a set of calibrated images. NeRFs in particular can faithfully render detailed scenes and appearances with non-Lambertian effects from any view, while simultaneously offering a high degree of compression in terms of storage. Partly due to these exciting properties, there has recently been an explosion of research based on NeRF.
Nevertheless, for practical applications, runtime performance remains a critical limitation of NeRFs: due to the extreme sampling requirements and costly neural network queries, rendering a NeRF is agonizingly slow. For illustration, it takes roughly 30 seconds to render an 800×800 image from a NeRF using a high-performance GPU, making it impractical for real-time interactive applications.
In this work, we propose a method for rendering a NeRF in real time, achieved by distilling the NeRF into a hierarchical 3D volumetric representation. Our approach preserves NeRF's ability to synthesize arbitrarily complex geometry and view-dependent effects from any viewpoint and requires no additional supervision. In fact, our method matches and in many cases surpasses the quality of the original NeRF formulation, while providing significant acceleration. Our model allows us to render an 800×800 image at 167.68 FPS on an NVIDIA V100 GPU and does not rely on a deep neural network at test time. Moreover, our representation is amenable to modern web technologies, allowing interactive rendering in a browser on consumer laptops.
Naive NeRF rendering is slow because it requires dense sampling of the scene, where every sample requires inference through a deep MLP. Because these queries depend on the viewing direction as well as the spatial position, one cannot naively cache these color values for all viewing directions.
We overcome these challenges and enable real-time rendering by pre-sampling the NeRF into a tabulated view-dependent volume which we refer to as a PlenOctree, named after the plenoptic functions of Adelson and Bergen [1]. Specifically, we use a sparse voxel-based octree where every leaf of the tree stores the appearance and density values required to model the radiance at a point in the volume. In order to account for non-Lambertian materials that exhibit view-dependent effects, we propose to represent the RGB values at a location with spherical harmonics (SH), a standard basis for functions defined on the surface of the sphere. The spherical harmonics can be evaluated at arbitrary query viewing directions to recover the view-dependent color.
Although one could convert an existing NeRF into such a representation via projection onto the SH basis functions, we show that we can in fact modify a NeRF network to predict appearances explicitly in terms of spherical harmonics. Specifically, we train a network that produces coefficients for the SH functions instead of raw RGB values, so that the predicted values can later be directly stored within the leaves of the PlenOctree. We also introduce a sparsity prior during NeRF training to improve the memory efficiency of our octrees, consequently allowing us to render higher quality images. Furthermore, once the structure is created, the values stored in the PlenOctree can be optimized because the rendering procedure remains differentiable. This enables the PlenOctree to obtain similar or better image quality compared to NeRF. Our pipeline is illustrated in Fig. 2.
Additionally, we demonstrate how our proposed pipeline can be used to accelerate NeRF model training, making our solution more practical to train than the original NeRF approach. Specifically, we can stop training the NeRF model early to convert it into a PlenOctree, which can then be trained significantly faster as it no longer involves any neural networks.
Our experiments demonstrate that our approach can accelerate NeRF-based rendering by more than three orders of magnitude (over 3000 times) without loss in image quality. We compare our approach on standard benchmarks with scenes and objects captured from 360° views, and demonstrate state-of-the-art level performance for image quality and rendering speed.
Our interactive viewer can enable operations such as object insertion, visualizing radiance distributions, decomposing the SH components, and slicing the scene. We hope that these real-time operations can be useful to the community for visualizing and debugging NeRF-based representations.
To summarize, we make the following contributions:

The first method that achieves real-time rendering of NeRFs with similar or improved quality.

NeRF-SH: a modified NeRF that is trained to output appearance in terms of spherical basis functions.

PlenOctree: a data structure derived from NeRFs which enables highly efficient view-dependent rendering of complex scenes.

An accelerated NeRF training method using early training termination, followed by a direct fine-tuning process on PlenOctree values.
2 Related Work
Novel View Synthesis. The task of synthesizing novel views of a scene given a set of photographs is a well-studied problem with various approaches. All methods predict an underlying geometric or image-based 3D representation that allows rendering from novel viewpoints. Mesh-based methods represent the scene with surfaces, and have been used to model Lambertian (diffuse) [53] and non-Lambertian scenes [57, 5, 3].
Mesh-based representations are compact and easy to render; however, optimizing a mesh to fit a complex scene of arbitrary topology is challenging. Image-based rendering methods [18, 40, 57], on the other hand, enable easy capture as well as photorealistic and fast rendering, but they are often bounded in the viewing angle and do not allow easy editing of the underlying scene.
Volume rendering is a classical technique with a long history of research in the graphics community [7]. Volume-based representations such as voxel grids [39, 17, 23, 13, 52, 41] and multi-plane images (MPIs) [46, 33, 61, 45, 27] are a popular alternative to mesh representations due to their topology-free nature: gradient-based optimization is therefore straightforward, while rendering can still be real-time. However, such naive volumetric representations are often memory bound, limiting the maximum resolution that can be captured. Volumetric octrees are a popular approach for reducing memory and compute in such cases. We refer the reader to the survey [16] for a historical perspective on octree volume rendering. Octrees have been used in recent work to decrease the memory requirements during training for other 3D tasks [36, 11, 49, 54]. Concurrent with this work, NeX [56] extends MPIs to encode spherical basis functions that enable view-dependent rendering effects in real time. However, unlike our representation, their approach is limited in the viewing direction due to their use of MPIs. Also concurrently, Lombardi et al. [24] propose to model data using geometric primitives, which allows for fast rendering while conserving space; however, they require a coarse mesh to initialize the primitives.
Coordinate-Based Neural Networks. Recently, coordinate-based neural networks have emerged as a popular alternative to explicit volumetric representations, as they are not limited to a fixed voxel representation. These methods train a multilayer perceptron (MLP) whose input is a coordinate and whose output is some property of space corresponding to that location. These networks have been used to predict occupancy [26, 4, 32, 37, 29, 19], signed distance fields [30, 10, 58, 59], and radiance [28]. Coordinate-based neural networks have been used for view synthesis in Scene Representation Networks [42], NeRFs [28], and many NeRF extensions [25, 31, 38, 44]. These networks represent a continuous function that can be sampled at arbitrarily fine resolutions without increasing the memory footprint. Unfortunately, this compactness is achieved at the expense of computational efficiency, as each sample must be processed by a neural network. As a result, these representations are often slow and impractical for real-time rendering.
NeRF Accelerations. While NeRFs are able to produce high quality results, their computationally expensive rendering leads to slow training and inference. One way to speed up the process of fitting a NeRF to a new scene is to incorporate priors learned from a dataset of similar scenes. This can be accomplished by conditioning on predicted image features [50, 60, 55] or by meta-learning [48]. To improve inference speed, Neural Sparse Voxel Fields (NSVF) [22] learns a sparse voxel grid of features that are input into a NeRF-like model. The sparse voxel grid allows the renderer to skip over empty regions when tracing a ray, which improves the render time by about 10x. Decomposed Radiance Fields [35] spatially decomposes a scene into multiple smaller networks. This method focuses on forward-facing scenes. AutoInt [21] modifies the architecture of the NeRF so that inference requires fewer samples but produces lower quality results. None of these approaches achieve real-time performance. The concurrent work DONeRF adds a depth classifier to NeRF in order to drastically improve the efficiency of sampling, but requires ground-truth depth for training. Although not based on NeRF, recently Takikawa et al. [47] propose a method to accelerate neural SDF rendering with an octree. Note that this work does not model appearance properties. In contrast, we employ a volumetric representation that can capture photorealistic view-dependent appearances while achieving even higher frame rates.
3 Preliminaries
3.1 Neural Radiance Fields
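Neural radiance fields (NeRF) [28] are 3D representations that can be rendered from arbitrary novel viewpoints while capturing continuous geometry and view-dependent appearance. The radiance field is encoded into the weights of a multilayer perceptron (MLP) that can be queried at a position $\mathbf{x}$ from a viewing direction $\mathbf{d}$ to recover the corresponding density $\sigma$ and color $\mathbf{c}$. A pixel's predicted color $\hat{C}(\mathbf{r})$ is computed by casting a ray, $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$, into the volume and accumulating the color based on density along the ray. NeRF estimates the accumulated color by taking $N$ point samples along the ray to perform volume rendering:
$$\hat{C}(\mathbf{r}) = \sum_{i=1}^{N-1} T_i \left(1 - \exp(-\sigma_i \delta_i)\right) \mathbf{c}_i \tag{1}$$
$$T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right) \tag{2}$$
where $\delta_i = t_{i+1} - t_i$ are the distances between point samples. To train the NeRF network, the predicted colors $\hat{C}(\mathbf{r})$ for a batch of rays $\mathcal{R}$ corresponding to pixels in the training images are optimized using Adam [14] to match the target pixel colors $C(\mathbf{r})$:
$$\mathcal{L}_{RGB} = \sum_{\mathbf{r} \in \mathcal{R}} \left\| C(\mathbf{r}) - \hat{C}(\mathbf{r}) \right\|_2^2 \tag{3}$$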
To better represent high-frequency details in the scene, positional encoding is applied to the inputs, and two stages of sampling are performed. We refer the interested reader to the NeRF paper [28] for details.
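As an illustration of the volume rendering sum in Eqs. (1) and (2), the following NumPy sketch composites the samples of a single ray. The function name `composite_ray` and the toy inputs are assumptions made for this example; it is not taken from the NeRF code base.

```python
import numpy as np

def composite_ray(sigma, color, delta):
    """Composite the samples of one ray front to back.

    sigma: (N,) densities, color: (N, 3) RGB values, delta: (N,) segment lengths.
    Returns the accumulated RGB color and the per-sample weights.
    """
    sigma = np.asarray(sigma, dtype=np.float64)
    color = np.asarray(color, dtype=np.float64)
    delta = np.asarray(delta, dtype=np.float64)
    # Opacity of each segment: 1 - exp(-sigma_i * delta_i).
    alpha = 1.0 - np.exp(-sigma * delta)
    # Transmittance T_i = exp(-sum_{j<i} sigma_j * delta_j): light that survives up to sample i.
    optical_depth = np.concatenate([[0.0], np.cumsum(sigma * delta)[:-1]])
    transmittance = np.exp(-optical_depth)
    weights = transmittance * alpha
    return weights @ color, weights

# Toy ray: empty space, then a nearly opaque red sample, then a green sample behind it.
rgb, w = composite_ray(
    sigma=[0.0, 50.0, 50.0],
    color=[[0, 0, 1], [1, 0, 0], [0, 1, 0]],
    delta=[0.5, 0.5, 0.5],
)
```

In this toy case the empty first sample receives zero weight and the red sample hides the green one behind it, which is the occlusion behavior the transmittance term is meant to capture.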
Limitations.
One notable consequence of this architecture is that each sample along the ray must be fed to the MLP to obtain the corresponding density $\sigma_i$ and color $\mathbf{c}_i$. A total of 192 samples were taken for each ray in the examples presented in NeRF. This is inefficient, as most samples fall in free space and do not contribute to the integrated color. To render a single target image at 800×800 resolution, the network must be run on over 100 million inputs. Therefore it takes about 30 seconds to render a single frame using an NVIDIA V100 GPU, making it impractical for real-time applications. Our use of a sparse voxel octree avoids excess compute in regions without content. Additionally, we precompute the values for each voxel so that network queries are not performed during inference.
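The "over 100 million inputs" figure follows directly from the numbers quoted in the text. A short check, assuming the 800×800 resolution used for the timings elsewhere in the paper:

```python
# Number of MLP queries needed to render one frame with naive NeRF sampling.
width, height = 800, 800     # image resolution used for the reported timings
samples_per_ray = 192        # samples per ray reported for NeRF

queries_per_frame = width * height * samples_per_ray
print(queries_per_frame)     # 122880000, i.e. over 100 million network inputs
```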
4 Method
We propose a pipeline that enables real-time rendering of NeRFs. Given a trained NeRF, we can convert it into a PlenOctree, an efficient data structure that is able to represent non-Lambertian effects in a scene. Specifically, it is an octree which stores spherical harmonics (SH) coefficients at the leaves, encoding view-dependent radiance.
To make the conversion to PlenOctree more straightforward, we also propose NeRF-SH, a variant of the NeRF network which directly outputs the SH coefficients, thus eliminating the need for a view-direction input to the network. With this change, the conversion can then be performed by evaluating on a uniform grid followed by thresholding. We fine-tune the octree on the training images to further improve image quality. Please see Fig. 2 for a graphical illustration of our pipeline.
The conversion process leverages the continuous nature of NeRF to dynamically obtain the spatial structure of the octree. We show that even with a partially trained NeRF, our PlenOctree is capable of producing results competitive with the fully trained NeRF.
4.1 NeRF-SH: NeRF with Spherical Harmonics
SHs have been a popular low-dimensional representation for spherical functions and have been used to model Lambertian surfaces [34, 2] or even glossy surfaces [43]. Here we explore their use in a volumetric context. Specifically, we adapt the NeRF network $f$ to output spherical harmonics coefficients $\mathbf{k}$, rather than RGB values.
$$f(\mathbf{x}) = (\mathbf{k}, \sigma) \quad \text{where} \quad \mathbf{k} = (k_\ell^m)_{\ell:\, 0 \le \ell \le \ell_{\max}}^{m:\, -\ell \le m \le \ell} \tag{4}$$
Each $k_\ell^m \in \mathbb{R}^3$ is a set of 3 coefficients corresponding to the RGB components. In this setup, the view-dependent color $\mathbf{c}$ at a point $\mathbf{x}$ may be determined by querying the SH functions $Y_\ell^m$ at the desired viewing angle $\mathbf{d}$:
$$\mathbf{c}(\mathbf{d}; \mathbf{k}) = S\left(\sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} k_\ell^m Y_\ell^m(\mathbf{d})\right) \tag{5}$$
where $S: x \mapsto (1 + \exp(-x))^{-1}$ is the sigmoid function for normalizing the colors. In other words, we factorize the view-dependent appearance with the SH basis, eliminating the view-direction input to the network and removing the need to sample view directions at conversion time. Please see the appendix for more technical discussion of SHs. With a single evaluation of the network, we can now efficiently query colors from arbitrary viewing angles at inference time. In Fig. 7, it can be seen that NeRF-SH training speed is similar to, but slightly faster than, NeRF (by about 10%).
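To make the color query of Eq. (5) concrete, here is a minimal NumPy sketch that evaluates a real SH expansion followed by a sigmoid. It is truncated at degree 1 (4 basis functions) for brevity, whereas the experiments use 16 or 25 components. The function names and the sign convention of the degree-1 terms are assumptions of this example (one common convention among several).

```python
import numpy as np

# Constants of the real SH basis for degrees 0 and 1.
C0 = 0.28209479177387814   # 1 / (2 * sqrt(pi))
C1 = 0.4886025119029199    # sqrt(3 / (4 * pi))

def sh_basis_deg1(d):
    """Real SH basis values (4 components) for a unit direction d = (x, y, z)."""
    x, y, z = d
    return np.array([C0, -C1 * y, C1 * z, -C1 * x])

def sh_to_rgb(k, d):
    """Evaluate the view-dependent color.

    k: (4, 3) SH coefficients, one row per basis function and one column per channel.
    d: viewing direction (normalized inside the function).
    """
    d = np.asarray(d, dtype=np.float64)
    d = d / np.linalg.norm(d)
    raw = sh_basis_deg1(d) @ k           # weighted sum of basis functions, shape (3,)
    return 1.0 / (1.0 + np.exp(-raw))    # sigmoid keeps the colors in (0, 1)
```

If only the degree-0 row of `k` is non-zero, the color is the same from every direction (a diffuse point); non-zero higher-degree rows make the color vary with `d`.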
Note that we can also project a trained NeRF to SHs directly at each point by sampling NeRF at random directions and multiplying by the SH component values to form Monte Carlo estimates of the inner products. However, this sampling process takes several hours to achieve reasonable quality and imposes a quality loss of about 2 dB. (Footnote 1: With 10000 view-direction samples per point, taking about 2 hours, the PSNR is 29.21 vs. 31.02 for our main method prior to optimization.) Nevertheless, this alternative approach offers a pathway to convert existing NeRFs into PlenOctrees.
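The Monte Carlo projection described above can be sketched as follows. This is an illustrative example only: it uses a degree-1 real SH basis, a hypothetical `toy_radiance` function in place of a trained NeRF, and ignores the sigmoid of the main method.

```python
import numpy as np

C0 = 0.28209479177387814   # degree-0 real SH constant
C1 = 0.4886025119029199    # degree-1 real SH constant

def sh_basis_deg1(dirs):
    """Real SH basis (degrees 0 and 1) for unit directions of shape (M, 3) -> (M, 4)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    return np.stack([np.full_like(x, C0), -C1 * y, C1 * z, -C1 * x], axis=1)

def project_to_sh(radiance_fn, num_samples=100_000, seed=0):
    """Monte Carlo estimate of k_i = integral over the sphere of f(d) * Y_i(d)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(num_samples, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)    # uniform directions on the sphere
    values = radiance_fn(dirs)                              # (M, C) radiance per direction
    basis = sh_basis_deg1(dirs)                             # (M, 4)
    # Uniform sampling has density 1 / (4 * pi), so the estimate is 4 * pi * mean(f * Y).
    return 4.0 * np.pi * (basis.T @ values) / num_samples   # (4, C)

def toy_radiance(dirs):
    """Hypothetical radiance: brighter when viewed from +z, identical in all three channels."""
    return np.repeat(0.5 + 0.3 * dirs[:, 2:3], 3, axis=1)

coeffs = project_to_sh(toy_radiance)
```

For this toy function only the degree-0 coefficient and the coefficient of the z-aligned basis function should be significantly different from zero; the noise in the remaining entries shrinks as the number of direction samples grows, which is why the projection of a full NeRF needs many samples per point.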
Other than SHs, we also experiment with Spherical Gaussians (SG) [8], a learnable spherical basis which has been used to represent all-frequency lighting [51, 43, 20]. We find that SHs perform better in our use case and provide an ablation in the appendix.
Sparsity prior. Without any regularization, the model is free to generate arbitrary geometry in unobserved regions. While this does not directly worsen image quality, it would adversely impact our conversion process as the extra geometry occupies significant voxel space.
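To solve this problem, we introduce an additional sparsity prior during NeRF training. Intuitively, this prior encourages NeRF to choose empty space when both empty space and solid colors are possible solutions. Formally,
$$\mathcal{L}_{sparsity} = \frac{1}{K} \sum_{k=1}^{K} \left| 1 - \exp(-\lambda \sigma_k) \right| \tag{6}$$
Here, $\{\sigma_k\}_{k=1}^{K}$ are the evaluated density values at $K$ uniformly random points within the bounding box, and $\lambda$ is a hyperparameter. The final training loss is then $\beta_{sparsity} \mathcal{L}_{sparsity} + \mathcal{L}_{RGB}$, where $\beta_{sparsity}$ is a hyperparameter. Fig. 3 illustrates the effect of the prior.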
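A minimal sketch of the sparsity prior, assuming it takes the form of the mean of |1 - exp(-λσ_k)| over the sampled densities and is added to the RGB loss with a weight. The function names and any hyperparameter values passed to them are placeholders for illustration, not the settings used in the experiments.

```python
import numpy as np

def sparsity_loss(sigma, lam):
    """Mean of |1 - exp(-lam * sigma_k)| over K densities sampled in the bounding box."""
    sigma = np.asarray(sigma, dtype=np.float64)
    return np.mean(np.abs(1.0 - np.exp(-lam * sigma)))

def total_loss(rgb_loss, sigma, lam, beta_sparsity):
    """Weighted sum of the sparsity prior and the RGB reconstruction loss."""
    return beta_sparsity * sparsity_loss(sigma, lam) + rgb_loss
```

The prior is zero where the sampled density is zero and saturates towards one for very dense samples, so minimizing it pushes unobserved regions towards empty space without an unbounded penalty on genuinely solid regions.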
Table 1: Synthetic NeRF Dataset

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FPS ↑ |
|---|---|---|---|---|
| NeRF (original) | 31.01 | 0.947 | 0.081 | 0.023 |
| NeRF | 31.69 | 0.953 | 0.068 | 0.045 |
| SRN | 22.26 | 0.846 | 0.170 | 0.909 |
| Neural Volumes | 26.05 | 0.893 | 0.160 | 3.330 |
| NSVF | 31.75 | 0.953 | 0.047 | 0.815 |
| AutoInt (8 sections) | 25.55 | 0.911 | 0.170 | 0.380 |
| NeRF-SH | 31.57 | 0.952 | 0.063 | 0.051 |
| PlenOctree from NeRF-SH | 31.02 | 0.951 | 0.066 | 167.68 |
| PlenOctree after fine-tuning | 31.71 | 0.958 | 0.053 | 167.68 |

Table 2: Tanks and Temples Dataset

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | FPS ↑ |
|---|---|---|---|---|
| NeRF (original) | 25.78 | 0.864 | 0.198 | 0.007 |
| NeRF | 27.94 | 0.904 | 0.168 | 0.013 |
| SRN | 24.10 | 0.847 | 0.251 | 0.250 |
| Neural Volumes | 23.70 | 0.834 | 0.260 | 1.000 |
| NSVF | 28.40 | 0.900 | 0.153 | 0.163 |
| NeRF-SH | 27.82 | 0.902 | 0.167 | 0.015 |
| PlenOctree from NeRF-SH | 27.34 | 0.897 | 0.170 | 42.22 |
| PlenOctree after fine-tuning | 27.99 | 0.917 | 0.131 | 42.22 |

4.2 PlenOctree: Octreebased Radiance Fields
Once we have trained a NeRF-SH model, we can convert it into a sparse octree representation for real-time rendering. A PlenOctree stores density and SH coefficients modelling view-dependent appearance at each leaf. We describe the conversion and rendering processes below.
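To illustrate the kind of data structure involved, the sketch below implements a very small sparse octree over the unit cube with payloads stored only at the leaves. It is a didactic Python example with hypothetical names (`SparseOctree`, `insert`, `query`); it does not reflect the memory layout of the actual GPU implementation.

```python
import numpy as np

class SparseOctree:
    """Minimal sparse octree over the unit cube [0, 1)^3 with payloads at the leaves."""

    def __init__(self):
        self.children = [[-1] * 8]   # children[n][o] = node index of octant o, or -1 if empty
        self.payload = [None]        # payload[n] = leaf data, e.g. density and SH coefficients

    @staticmethod
    def _descend(p):
        """Pick the octant containing p and rescale p to that octant's local coordinates."""
        p = p * 2.0
        bits = np.minimum(p.astype(int), 1)      # which half along x, y, z
        return p - bits, int(bits[0] * 4 + bits[1] * 2 + bits[2])

    def insert(self, point, depth, data):
        """Store `data` in the leaf voxel at the given depth that contains `point`."""
        node, p = 0, np.array(point, dtype=np.float64)
        for _ in range(depth):
            p, octant = self._descend(p)
            if self.children[node][octant] < 0:
                self.children[node][octant] = len(self.children)
                self.children.append([-1] * 8)
                self.payload.append(None)
            node = self.children[node][octant]
        self.payload[node] = data

    def query(self, point):
        """Return the payload of the leaf containing `point`, or None for empty space."""
        node, p = 0, np.array(point, dtype=np.float64)
        while self.payload[node] is None:
            p, octant = self._descend(p)
            node = self.children[node][octant]
            if node < 0:
                return None
        return self.payload[node]

tree = SparseOctree()
tree.insert([0.1, 0.1, 0.1], depth=3, data={"sigma": 5.0})
```

Only the path to the occupied voxel is allocated, so empty regions cost nothing beyond a missing child pointer; this is the property that lets a renderer step over large empty regions at once.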
Rendering. To render the PlenOctree, for each ray, we first determine ray-voxel intersections in the octree structure. This produces a sequence of segments between voxel boundaries with lengths $\{\delta_i\}_{i=1}^{N}$, each of which has constant density and color. NeRF's volume rendering model (1) is then applied to assign a color to the ray. Note that compared to the uniform sampling employed by Neural Volumes [23], this approach is able to skip large voxels in one step while also not missing small voxels.
At test time, we further accelerate this rendering process by applying early stopping when the ray's accumulated transmittance falls below a small threshold.
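The rendering loop with early stopping can be sketched as below. The inputs are assumed to be the per-ray voxel segments already produced by the ray-voxel intersection step, the colors are assumed to be already evaluated for the ray's viewing direction, and `stop_threshold` is a free parameter of this example rather than the value used in the experiments.

```python
import numpy as np

def render_segments(deltas, sigmas, colors, stop_threshold):
    """Composite piecewise-constant voxel segments front to back with early stopping.

    Returns the ray color and the number of segments actually visited.
    """
    rgb = np.zeros(3)
    transmittance = 1.0
    visited = 0
    for delta, sigma, color in zip(deltas, sigmas, colors):
        if transmittance < stop_threshold:
            break                                  # remaining voxels are effectively occluded
        visited += 1
        if sigma <= 0.0:
            continue                               # empty voxel: skipped in a single step
        attenuation = np.exp(-sigma * delta)       # fraction of light passing through the voxel
        rgb += transmittance * (1.0 - attenuation) * np.asarray(color, dtype=np.float64)
        transmittance *= attenuation
    return rgb, visited

# Toy ray: one large empty voxel followed by three dense voxels.
rgb, visited = render_segments(
    deltas=[1.0, 0.25, 0.25, 0.25],
    sigmas=[0.0, 40.0, 40.0, 40.0],
    colors=[[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]],
    stop_threshold=0.01,
)
```

In the toy ray the first dense voxel absorbs almost all of the light, so the loop stops before touching the two voxels behind it.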
Conversion from NeRF-SH. The conversion process can be divided into three steps. At a high level, we evaluate the network on a grid, retaining only density values, then filter the voxels via thresholding. Finally we sample random points within each remaining voxel and average them to obtain SH coefficients to store in the octree leaves. More details are given below:
Evaluation. We first evaluate the NeRF-SH model to obtain $\sigma$ values on a uniformly spaced 3D grid. The grid is automatically scaled to tightly fit the scene content. (Footnote 2: By pre-evaluating $\sigma$ on a larger grid and finding the bounding box of all points whose density exceeds a threshold.)
Filtering. Next, we filter this grid to obtain a sparse set of voxels centered at the grid points sufficient for representing the scene. Specifically, we render alpha maps for all the training views using this voxel grid, keeping track of the maximum ray weight at each voxel. We then eliminate the voxels whose weights are lower than a threshold $\tau_w$. The octree is constructed to contain the remaining voxels as leaves at the deepest level while being empty elsewhere. Compared to naively thresholding by $\sigma$ at each point, this method can eliminate non-visible voxels.
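The filtering step can be illustrated with a small NumPy sketch. It assumes that, for every training ray, the indices of the voxels it traverses and the corresponding rendering weights have already been computed; the function name and the example values are hypothetical.

```python
import numpy as np

def visible_voxel_mask(num_voxels, ray_voxel_ids, ray_weights, weight_threshold):
    """Keep voxels whose maximum rendering weight over all rays reaches the threshold.

    ray_voxel_ids: list of integer arrays, the voxels hit by each ray.
    ray_weights:   list of float arrays, the rendering weight of each hit.
    """
    max_weight = np.zeros(num_voxels)
    for ids, weights in zip(ray_voxel_ids, ray_weights):
        # Unbuffered maximum so that repeated voxel indices are handled correctly.
        np.maximum.at(max_weight, ids, weights)
    return max_weight >= weight_threshold

# Four voxels, two rays. Voxel 2 is only seen with a tiny weight, voxel 3 is never seen.
mask = visible_voxel_mask(
    num_voxels=4,
    ray_voxel_ids=[np.array([0, 1]), np.array([1, 2])],
    ray_weights=[np.array([0.9, 0.05]), np.array([0.3, 0.001])],
    weight_threshold=0.01,
)
```

A voxel hidden behind opaque content receives a small weight on every ray even if its own density is high, which is why this criterion removes non-visible voxels that a pure density threshold would keep.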
Sampling. Finally, we sample a set of random points in each remaining voxel and set the associated leaf of the octree to the mean of these values to reduce aliasing. Each leaf now contains the density and a vector of spherical harmonics coefficients for each of the RGB color channels.
This full extraction process takes about 15 minutes. (Footnote 3: Sampling fewer points per voxel allows for faster extraction, with minimal loss in quality.)
4.3 PlenOctree Optimization
Since this volume rendering process is fully differentiable with respect to the tree values, we can directly fine-tune the resulting octree on the original training images using the NeRF loss (3) with SGD in order to improve the image quality. Note that the tree structure is fixed to that obtained from NeRF in this process. PlenOctree optimization processes on the order of millions of rays per second, far more than NeRF training, allowing us to optimize for many epochs in a relatively short time. The analytic derivatives for this process are implemented in custom CUDA kernels. We defer technical details to the appendix.
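As a simplified illustration of optimizing leaf values with analytic gradients, the sketch below runs per-ray SGD on leaf colors only, with the tree structure and the rendering weights held fixed. The actual optimization also updates densities and SH coefficients and runs in custom CUDA kernels; all names and the toy data here are assumptions of this example.

```python
import numpy as np

def rgb_loss(leaf_rgb, ray_leaf_ids, ray_weights, targets):
    """Sum of squared errors between rendered and target ray colors."""
    return sum(np.sum((t - w @ leaf_rgb[ids]) ** 2)
               for ids, w, t in zip(ray_leaf_ids, ray_weights, targets))

def finetune_leaf_colors(leaf_rgb, ray_leaf_ids, ray_weights, targets, lr=0.5, epochs=200):
    """Per-ray SGD on leaf colors.

    Each ray renders as sum_i w_i * leaf_rgb[id_i], so the gradient of its squared error
    with respect to leaf_rgb[id_i] is -2 * w_i * (target - rendered).
    """
    leaf_rgb = np.array(leaf_rgb, dtype=np.float64)
    for _ in range(epochs):
        for ids, w, target in zip(ray_leaf_ids, ray_weights, targets):
            residual = target - w @ leaf_rgb[ids]            # (3,)
            grad = -2.0 * w[:, None] * residual[None, :]     # (len(ids), 3)
            np.subtract.at(leaf_rgb, ids, lr * grad)         # one SGD step for this ray
    return leaf_rgb

# Toy problem: three leaves with known colors, observed by four rays.
true_rgb = np.eye(3)
ray_leaf_ids = [np.array([0, 1]), np.array([1, 2]), np.array([2]), np.array([0])]
ray_weights = [np.array([0.7, 0.3]), np.array([0.6, 0.4]), np.array([1.0]), np.array([0.9])]
targets = [w @ true_rgb[ids] for ids, w in zip(ray_leaf_ids, ray_weights)]

initial = np.full((3, 3), 0.5)
before = rgb_loss(initial, ray_leaf_ids, ray_weights, targets)
tuned = finetune_leaf_colors(initial, ray_leaf_ids, ray_weights, targets)
after = rgb_loss(tuned, ray_leaf_ids, ray_weights, targets)
```

Because no neural network is evaluated, each step only touches the few leaves a ray passes through, which is what makes this kind of optimization cheap per ray.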
| Model | Description | GB | PSNR | FPS |
|---|---|---|---|---|
| Ours-1.9G | Complete Model as in Table 1 | 1.93 | 31.71 | 168 |
| Ours-1.4G | Higher Threshold | 1.36 | 31.64 | 215 |
| Ours-0.4G | w/o Auto Bbox Scaling | 0.44 | 30.70 | 329 |
| Ours-0.3G | Grid Size 256 | 0.30 | 29.60 | 410 |

The fast octree optimization indirectly allows us to accelerate NeRF training, as seen in Fig. 7, since we can elect to stop the NeRF-SH training at an earlier time for constructing the PlenOctree, with only a slight degradation in quality.
5 Results
5.1 Experimental Setup
Datasets. For our experiments, we use the NeRF-synthetic [28] dataset and a subset of the Tanks and Temples dataset [15]. The NeRF-synthetic dataset consists of 8 scenes where each scene has a central object that is imaged with 100 inward-facing cameras distributed randomly on the upper hemisphere. The images are 800×800 with provided ground truth camera poses. The Tanks and Temples subset is from NSVF [22] and contains 5 scenes of real objects captured by an inward-facing camera that circles the scene. We use foreground masks provided by the NSVF authors. Each scene contains between 152 and 384 images of size 1920×1080.
Baselines. The principal baseline for our experiments is NeRF [28]; we report results for both the original NeRF implementation, denoted NeRF (original), and a reimplementation in Jax [6], denoted simply NeRF, which our NeRF-SH code is based on. Unless otherwise stated, all NeRF results and timings are from the latter implementation. We also compare to two recent papers introducing NeRF accelerations, neural sparse voxel fields (NSVF) [22] and AutoInt [21], as well as two older methods, scene representation networks (SRN) [42] and Neural Volumes [23].
5.2 Quality Evaluation
We evaluate our approach against prior works on the synthetic and real datasets mentioned above. The results are in Tables 1 and 2 respectively. Note that none of the baselines achieve real-time performance; nevertheless, our quality results are competitive in all cases and better in terms of some metrics.
In Figures 4 and 6, we show qualitative examples that demonstrate that our PlenOctree conversion does not perceptually worsen the rendered images compared to NeRF; rather, we observe that the PlenOctree optimization process enhances fine details such as text. Additionally, we note that our modification of NeRF to predict spherical function coefficients (NeRF-SH) does not significantly change the performance.
For the SH, we set $\ell_{\max} = 3$ (16 components) and $\ell_{\max} = 4$ (25 components) on the synthetic and Tanks & Temples datasets respectively. We use a $512^3$ grid size in either case. Please refer to the appendix for training details. The inference time performance is measured on a Tesla V100 for all methods. Across both datasets we find that PlenOctrees perform inference over 3000 times faster than NeRF and at least 30 times faster than all other compared methods. PlenOctree performs either best or second best for all image quality metrics.
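The speedup factors and SH component counts quoted above can be checked with a few lines of arithmetic. The FPS values are copied from the two results tables; the dictionary layout and function name are just for this example.

```python
# FPS values copied from the results tables (Neural Volumes is the fastest baseline in both).
fps = {
    "synthetic": {"NeRF": 0.045, "Neural Volumes": 3.330, "PlenOctree": 167.68},
    "tanks_and_temples": {"NeRF": 0.013, "Neural Volumes": 1.000, "PlenOctree": 42.22},
}

speedups = {
    name: {
        "vs NeRF": row["PlenOctree"] / row["NeRF"],
        "vs fastest baseline": row["PlenOctree"] / row["Neural Volumes"],
    }
    for name, row in fps.items()
}

def num_sh_components(l_max):
    """An SH expansion truncated at degree l_max has (l_max + 1)^2 basis functions."""
    return (l_max + 1) ** 2
```

On both datasets the ratio to NeRF is above 3000 and the ratio to the fastest baseline is above 30 (roughly 50 on the synthetic scenes and 42 on Tanks and Temples), consistent with the claims in the text.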
5.3 Speed Trade-off Analysis
5.4 Indirect Acceleration of NeRF Training
Since we can efficiently fine-tune the octree on the original training data, as briefly discussed in §4.3, we can choose to stop the NeRF-SH training at an earlier time before converting it to a PlenOctree. Indeed, we have found that the image quality improvements gained during fine-tuning can often be greater than those from continuing to train the NeRF-SH for an equivalent amount of time. Therefore it can be more time efficient to stop the NeRF-SH training before it has converged and transition to PlenOctree conversion and fine-tuning.
In Figure 7 we compare NeRF and NeRF-SH models trained for 2 million iterations each to a sequence of PlenOctree models extracted from NeRF-SH checkpoints. We find that given a time constraint, it is almost always preferable to stop the NeRF training and transition to PlenOctree optimization.
5.5 Real-time and In-browser Applications
Interactive demos. Within our desktop viewer, we are able to perform a variety of real-time scene operations on the PlenOctree representation. For example, it is possible to insert meshes while maintaining proper occlusion, slice the PlenOctree to visualize a cross-section, or render the depth map to verify the geometry. Other features include probing the radiance distribution at any point in space, and inspecting subsets of SH components. These examples are demonstrated in Figure 9. The ability to perform these actions in real time is beneficial both for interactive entertainment and debugging NeRF-related applications.
Web renderer. We have implemented a web-based renderer enabling interactive viewing of converted PlenOctrees in the browser. This is achieved by rewriting our CUDA-based PlenOctree renderer as a WebGL-compatible fragment shader. We apply compression to make serving the octrees more manageable. Please see the appendix for more information.
6 Discussion
We have introduced a new data representation for NeRFs using PlenOctrees, which enables real-time rendering capabilities for arbitrary objects and scenes. Not only can we accelerate the rendering performance of the original NeRF method by more than 3000 times, but we can also produce images of equal or better quality than NeRF thanks to our hierarchical data structure. As training time poses another hurdle for adopting NeRFs in practice (taking 1-2 days to fully converge), we also showed that our PlenOctrees can accelerate effective training time for our NeRF-SH. Finally, we have implemented an in-browser viewer based on WebGL to demonstrate real-time and 6-DOF rendering capabilities of NeRFs on consumer laptops. In the future, our approach may enable virtual online stores in VR, where any product with arbitrary complexity and materials can be visualized in real time while enabling 6-DOF viewing.
Limitations and Future Work.
While we achieve state-of-the-art rendering performance and frame rates, the octree representation is much larger than the compact representation of the original NeRF model and has a larger memory footprint. The average uncompressed octree size for the full model is 1.93 GB on the synthetic dataset and 3.53 GB on the Tanks and Temples dataset. For online delivery, we use lower-resolution compressed models which are about 30-120 MB; please see the appendix for details. Although already possible in some form (Fig. 8), applying our method optimally to unbounded and forward-facing scenes requires further work, as the data distribution is different for unbounded scenes. Forward-facing scenes inherently do not support 6-DOF viewing, and we suggest MPIs may be more appropriate in this case [56].
In the future, we plan to explore extensions of our method to enable real-time 6-DOF immersive viewing of large-scale scenes, as well as of dynamic scenes. We believe that real-time rendering of NeRFs has the potential to become a new standard for next-generation AR/VR technologies, as photorealistic 3D content can be digitized as easily as recording 2D videos.
References
[1] (1991) The plenoptic function and the elements of early vision. Vol. 2, Vision and Modeling Group, Media Laboratory, Massachusetts Institute of ….
[2] (2003) Lambertian reflectance and linear subspaces. IEEE transactions on pattern analysis and machine intelligence 25 (2), pp. 218–233.
[3] (2001) Unstructured lumigraph rendering. In Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pp. 425–432.
[4] (2019) Learning implicit fields for generative shape modeling. In IEEE Computer Vision and Pattern Recognition (CVPR).
[5] (1996) Modeling and rendering architecture from photographs: a hybrid geometry- and image-based approach. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pp. 11–20.
[6] JaxNeRF: an efficient JAX implementation of NeRF.
[7] (1988) Volume rendering. ACM Siggraph Computer Graphics 22 (4), pp. 65–74.
[8] (1953) Dispersion on a sphere. Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 217 (1130), pp. 295–305.
[9] Zlib.
[10] (2020) Implicit geometric regularization for learning shapes. ICML.
[11] (2017) Hierarchical surface prediction for 3d object reconstruction. In 2017 International Conference on 3D Vision (3DV), pp. 412–420.
[12] (1982) Color image quantization for frame buffer display. ACM SIGGRAPH Proceedings.
[13] (2017) Learning a multi-view stereo machine. arXiv preprint arXiv:1708.05375.
[14] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[15] (2017) Tanks and temples: benchmarking large-scale scene reconstruction. ACM Transactions on Graphics (ToG) 36 (4), pp. 1–13.
[16] (2006) A survey of octree volume rendering methods. GI, the Gesellschaft für Informatik, pp. 87.
[17] (2000) A theory of shape by space carving. International journal of computer vision 38 (3), pp. 199–218.
[18] (1996) Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pp. 31–42.
[19] (2020) Monocular real-time volumetric performance capture. In European Conference on Computer Vision, pp. 49–67.
[20] (2020) Inverse rendering for complex indoor scenes: shape, spatially-varying lighting and svbrdf from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2475–2484.
[21] (2020) AutoInt: automatic integration for fast neural volume rendering. arXiv preprint arXiv:2012.01714.
[22] (2020) Neural sparse voxel fields. NeurIPS.
[23] (2019) Neural volumes: learning dynamic renderable volumes from images. ACM Trans. Graph. 38 (4), pp. 65:1–65:14.
[24] (2021) Mixture of volumetric primitives for efficient neural rendering. arXiv preprint arXiv:2103.01954.
[25] (2021) NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In CVPR.
[26] (2019) Occupancy networks: learning 3d reconstruction in function space. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
[27] (2019) Local light field fusion: practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG) 38 (4), pp. 1–14.
[28] (2020) NeRF: representing scenes as neural radiance fields for view synthesis. ECCV.
[29] (2020) Differentiable volumetric rendering: learning implicit 3d representations without 3d supervision. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR).
[30] (2019) DeepSDF: learning continuous signed distance functions for shape representation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] (2020) Deformable neural radiance fields. arXiv preprint arXiv:2011.12948.
[32] (2020) Convolutional occupancy networks. In European Conference on Computer Vision (ECCV).
[33] (2017) Soft 3d reconstruction for view synthesis. ACM Transactions on Graphics (TOG) 36 (6), pp. 1–11.
[34] (2001) On the relationship between radiance and irradiance: determining the illumination from images of a convex lambertian object. JOSA A 18 (10), pp. 2448–2459.
[35] (2020) DeRF: decomposed radiance fields. arXiv preprint arXiv:2011.12490.
[36] (2017) OctNet: learning deep 3d representations at high resolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[37] (2019) PIFu: pixel-aligned implicit function for high-resolution clothed human digitization. In The IEEE International Conference on Computer Vision (ICCV).
[38] (2020) Graf: generative radiance fields for 3d-aware image synthesis. arXiv preprint arXiv:2007.02442.
[39] (1999) Photorealistic scene reconstruction by voxel coloring. International Journal of Computer Vision 35 (2), pp. 151–173.
[40] (1998) Layered depth images. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pp. 231–242.
 [41] (2019) Deepvoxels: learning persistent 3d feature embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2437–2446. Cited by: §2.
 [42] (2019) Scene representation networks: continuous 3dstructureaware neural scene representations. arXiv preprint arXiv:1906.01618. Cited by: §A.1, §2, §5.1.
 [43] (2002) Precomputed radiance transfer for realtime rendering in dynamic, lowfrequency lighting environments. In Proceedings of the 29th annual conference on Computer graphics and interactive techniques, pp. 527–536. Cited by: §B.1, §4.1, §4.1.
 [44] (2020) NeRV: neural reflectance and visibility fields for relighting and view synthesis. arXiv preprint arXiv:2012.03927. Cited by: §2.
 [45] (2019) Pushing the boundaries of view extrapolation with multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 175–184. Cited by: §2.
 [46] (1998) Stereo matching with transparency and matting. In Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271), pp. 517–524. Cited by: §2.
 [47] (2021) Neural geometric level of detail: realtime rendering with implicit 3D shapes. arXiv preprint arXiv:2101.10994. Cited by: §2.
 [48] (2021) Learned initializations for optimizing coordinatebased neural representations. In CVPR, Cited by: §2.
 [49] (2017) Octree generating networks: efficient convolutional architectures for highresolution 3d outputs. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2088–2096. Cited by: §2.
 [50] (2020) GRF: learning a general radiance field for 3d scene representation and rendering. arXiv preprint arXiv:2010.04595. Cited by: §2.
 [51] (2006) Allfrequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation. ACM Transactions on graphics (TOG) 25 (3), pp. 967–976. Cited by: §B.1, §4.1.
 [52] (2017) Multiview supervision for singleview reconstruction via differentiable ray consistency. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2626–2634. Cited by: §2.
 [53] (2014) Let there be color! largescale texturing of 3d reconstructions. In ECCV, pp. 836–850. Cited by: §2.
 [54] (2017) Ocnn: octreebased convolutional neural networks for 3d shape analysis. ACM Transactions on Graphics (TOG) 36 (4), pp. 1–11. Cited by: §2.
 [55] (2021) IBRNet: learning multiview imagebased rendering. arXiv preprint arXiv:2102.13090. Cited by: §2.
 [56] (2021) NeX: realtime view synthesis with neural basis expansion. External Links: 2103.05606 Cited by: §2, §6.
 [57] (2000) Surface light fields for 3d photography. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pp. 287–296. Cited by: §2, §2.
 [58] (2019) DISN: deep implicit surface network for highquality singleview 3d reconstruction. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d\textquotesingleAlchéBuc, E. Fox, and R. Garnett (Eds.), pp. 492–502. External Links: Link Cited by: §2.
 [59] (2020) Multiview neural surface reconstruction by disentangling geometry and appearance. Advances in Neural Information Processing Systems 33. Cited by: §2.
 [60] (2021) PixelNeRF: neural radiance fields from one or few images. In CVPR, Cited by: §2.
 [61] (2018) Stereo magnification: learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817. Cited by: §2.
Appendix
Appendix A Additional Results
A.1 Detailed comparisons
Here we provide further qualitative comparisons with the baselines SRN [42], Neural Volumes [23], and NSVF [22] in Figure 10. We show more qualitative results of our method in Figures 11 and 12. We also report a per-scene breakdown of the quantitative metrics against all approaches in Tables 5, 6, 7, and 8.
A.2 Spherical Basis Function Ablation
We also provide ablation studies on the choice of spherical basis functions. We first ablate the number of spherical harmonic basis functions, then explore the use of learnable spherical basis functions. All experiments are conducted on the NeRF-synthetic dataset, and we report the average metrics both directly after training NeRF with spherical basis functions and after converting it to a PlenOctree with fine-tuning.
Number of SH basis functions
First, we ablate the number of basis functions used for spherical harmonics. Average metrics across the NeRF-synthetic dataset are reported both for the modified NeRF model and the corresponding PlenOctree. We found that switching between $\ell_{\max} = 3$ (SH16) and $\ell_{\max} = 4$ (SH25) makes very little difference in terms of metrics or visual quality.
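As a rough illustration of why the number of basis functions matters for storage, the sketch below counts SH basis functions up to a maximum degree and the resulting number of values per octree leaf. The count $(\ell_{\max}+1)^2$ is the standard one; the per-leaf layout (one density plus one SH coefficient vector per RGB channel) and the function names are assumptions made for this example.

```python
def num_sh_basis(l_max):
    """Number of SH basis functions with degree 0..l_max."""
    return (l_max + 1) ** 2


def values_per_leaf(l_max, channels=3):
    """Assumed leaf layout: one density value plus SH coefficients per channel."""
    return 1 + channels * num_sh_basis(l_max)


for name, l_max in (("SH9", 2), ("SH16", 3), ("SH25", 4)):
    print(name, num_sh_basis(l_max), values_per_leaf(l_max))
```

Under this assumed layout, moving from SH9 to SH25 increases the per-leaf payload from 28 to 76 values, which is in line with the growth in file size reported in the ablation table.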
Spherical Gaussians
Furthermore, we also experimented with spherical Gaussians (SGs) [8], another form of spherical basis function similar to spherical harmonics, but with learnable Gaussian kernels. Please see §B.1 for a brief introduction to SHs and SGs. SG25 denotes our model using 25 SG components instead of SH, all with learnable lobe axis $\mu$ and bandwidth $\lambda$. However, while this model has marginally better PSNR before conversion (31.74 vs. 31.57 for SH16), the advantage disappears following PlenOctree conversion and fine-tuning (31.63 vs. 31.71).
Basis  NeRF-SH/SG PSNR  NeRF-SH/SG SSIM  NeRF-SH/SG LPIPS  PlenOctree PSNR  PlenOctree SSIM  PlenOctree LPIPS  GB  FPS
SH9  31.44  0.951  0.065  31.45  0.956  0.056  1.00  262
SH16  31.57  0.952  0.063  31.71  0.958  0.053  1.93  168
SH25  31.56  0.951  0.063  31.69  0.958  0.052  2.68  128
SG25  31.74  0.953  0.062  31.63  0.958  0.052  2.26  151
Appendix B Technical Details
B.1 Spherical Basis Functions: SH and SG
In the main paper, we used the SH functions without defining their exact form. Here, we provide a brief technical discussion of both spherical harmonics (SH) and spherical Gaussians (SG) for completeness.
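Spherical Harmonics.
The spherical harmonics (SH) form a complete basis of functions $\mathbb{S}^2 \to \mathbb{C}$. For $\ell \in \mathbb{N}$ and $m \in \{-\ell, \dots, \ell\}$, the SH function of degree $\ell$ and order $m$ is defined as:
$$Y_\ell^m(\theta, \phi) = \sqrt{\frac{2\ell+1}{4\pi}\,\frac{(\ell-m)!}{(\ell+m)!}}\; P_\ell^m(\cos\theta)\, e^{im\phi} \tag{7}$$
where $P_\ell^m$ are the associated Legendre polynomials. A real basis of SH $Y_{\ell m}$ can be defined in terms of its complex analogue $Y_\ell^m$ by setting
$$Y_{\ell m}(\theta, \phi) = \begin{cases} \sqrt{2}\,(-1)^m\, \operatorname{Im}\big(Y_\ell^{|m|}(\theta, \phi)\big) & \text{if } m < 0 \\ Y_\ell^0(\theta, \phi) & \text{if } m = 0 \\ \sqrt{2}\,(-1)^m\, \operatorname{Re}\big(Y_\ell^m(\theta, \phi)\big) & \text{if } m > 0 \end{cases} \tag{8}$$
Any real spherical function $f: \mathbb{S}^2 \to \mathbb{R}$ may then be expressed in the SH basis:
$$f(\theta, \phi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} k_\ell^m\, Y_{\ell m}(\theta, \phi) \tag{9}$$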
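To make the real SH basis concrete, the following is a minimal sketch that evaluates the nine real SH functions of degree at most 2 (the SH9 setting) in Cartesian form and uses them to expand an RGB function. It assumes the common sign convention in which the Condon-Shortley phase cancels, so all constants are positive; the function names and the per-channel coefficient layout are choices made for this example, not the released implementation.

```python
import math

# Closed-form normalization constants of the real SH basis, degrees 0..2.
C0 = 0.5 * math.sqrt(1.0 / math.pi)
C1 = math.sqrt(3.0 / (4.0 * math.pi))
C2_CROSS = 0.5 * math.sqrt(15.0 / math.pi)   # xy, yz, xz terms
C2_ZZ = 0.25 * math.sqrt(5.0 / math.pi)      # 2z^2 - x^2 - y^2 term
C2_XXYY = 0.25 * math.sqrt(15.0 / math.pi)   # x^2 - y^2 term


def sh9_basis(d):
    """Real SH basis values for a unit direction d = (x, y, z), degrees 0..2."""
    x, y, z = d
    return [
        C0,
        C1 * y, C1 * z, C1 * x,
        C2_CROSS * x * y,
        C2_CROSS * y * z,
        C2_ZZ * (2.0 * z * z - x * x - y * y),
        C2_CROSS * x * z,
        C2_XXYY * (x * x - y * y),
    ]


def eval_sh_color(coeffs, d):
    """coeffs holds one list of 9 SH coefficients per color channel."""
    basis = sh9_basis(d)
    return [sum(k * b for k, b in zip(channel, basis)) for channel in coeffs]
```

Because the basis values depend only on the viewing direction, they can be computed once per ray and reused for every sample along it.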
Spherical Gaussians.
Spherical Gaussians (SGs), also known as von Mises-Fisher distributions [8], are another form of spherical basis function that has been widely adopted to approximate spherical functions. Unlike SHs, SGs are a learnable basis. A normalized SG is defined as:
$$G(\mathbf{d}; \mu, \lambda) = e^{\lambda(\mu \cdot \mathbf{d} - 1)} \tag{10}$$
where $\mathbf{d} \in \mathbb{S}^2$ is the query direction, $\mu \in \mathbb{S}^2$ is the lobe axis, and $\lambda \in \mathbb{R}$ is the bandwidth (sharpness) of the Gaussian kernel. Due to the varying bandwidths supported by SGs, they are suitable for representing all-frequency signals such as lighting [51, 43, 20]. A spherical function represented using $N$ SGs is formulated as:
$$f(\mathbf{d}) = \sum_{i=1}^{N} \mathbf{k}_i\, G(\mathbf{d}; \mu_i, \lambda_i) \tag{11}$$
where $\mathbf{k}_i \in \mathbb{R}^3$ are the RGB coefficients of each SG.
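A spherical Gaussian mixture of this kind can be evaluated in a few lines. The sketch below assumes the usual form of a normalized SG lobe, exp(bandwidth * (axis . d - 1)), which peaks at 1 along the lobe axis; the function names and the (coefficients, axis, bandwidth) tuple layout are hypothetical.

```python
import math


def sg_lobe(d, axis, bandwidth):
    """Normalized spherical Gaussian: equals 1 when d coincides with the lobe axis."""
    dot = sum(a * b for a, b in zip(d, axis))
    return math.exp(bandwidth * (dot - 1.0))


def eval_sg_color(d, lobes):
    """lobes is a list of (rgb_coefficients, axis, bandwidth) tuples."""
    rgb = [0.0, 0.0, 0.0]
    for coeffs, axis, bandwidth in lobes:
        g = sg_lobe(d, axis, bandwidth)
        for ch in range(3):
            rgb[ch] += coeffs[ch] * g
    return rgb
```

Larger bandwidths give narrower lobes, which is what lets an SG mixture capture sharper view-dependent effects than a low-degree SH expansion with a similar number of components.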
B.2 PlenOctree Compression
The uncompressed PlenOctree file would be unpleasantly time-consuming for users to download for in-browser rendering. Thus, to minimize the size of PlenOctrees for viewing in the browser, we use SH9 instead of SH16 or SH25 and apply a looser bounding box, which reduces the number of occupied voxels. On top of this, we compress the PlenOctrees directly in the following ways:

1. We quantize the SH coefficients in the tree using the popular median-cut algorithm [12]. More specifically, the density values $\sigma$ are kept as is; for each SH basis function, we quantize the RGB coefficients into colors. Afterwards, separately for each SH basis function, we store a codebook (as float16) along with pointers from each tree leaf to a position in the codebook (as int16).

2. We compress the entire tree, including pointers, using the standard DEFLATE algorithm from ZLIB [9].
This process reduces the file size by as much as  times. The tree is fully decompressed before it is displayed in the web renderer. We will also release this code.
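The two compression steps above can be sketched as follows for a single SH basis function. This is a simplified illustration, not the released code: the median-cut variant, the tiny byte layout (a two-integer header, a float16 codebook, unsigned 16-bit indices), and all function names are assumptions made for this example.

```python
import struct
import zlib


def median_cut(points, n_colors):
    """Quantize RGB triples into at most n_colors codebook entries."""
    boxes = [list(range(len(points)))]
    while len(boxes) < n_colors:
        best = None  # (extent, box index, axis) of the widest box side
        for b, idx in enumerate(boxes):
            if len(idx) < 2:
                continue
            for axis in range(3):
                vals = [points[i][axis] for i in idx]
                extent = max(vals) - min(vals)
                if best is None or extent > best[0]:
                    best = (extent, b, axis)
        if best is None or best[0] == 0.0:
            break  # nothing left to split
        _, b, axis = best
        idx = sorted(boxes[b], key=lambda i: points[i][axis])
        mid = len(idx) // 2  # split at the median
        boxes[b:b + 1] = [idx[:mid], idx[mid:]]
    codebook, indices = [], [0] * len(points)
    for code, idx in enumerate(boxes):
        n = len(idx)
        codebook.append(tuple(sum(points[i][a] for i in idx) / n for a in range(3)))
        for i in idx:
            indices[i] = code
    return codebook, indices


def pack(codebook, indices):
    """Store the codebook as float16 and the pointers as 16-bit ints, then DEFLATE."""
    flat = [v for color in codebook for v in color]
    raw = struct.pack("<II", len(codebook), len(indices))
    raw += struct.pack(f"<{len(flat)}e", *flat)
    raw += struct.pack(f"<{len(indices)}H", *indices)
    return zlib.compress(raw, 9)


def unpack(blob):
    raw = zlib.decompress(blob)
    n_codes, n_idx = struct.unpack_from("<II", raw, 0)
    flat = struct.unpack_from(f"<{3 * n_codes}e", raw, 8)
    indices = list(struct.unpack_from(f"<{n_idx}H", raw, 8 + 6 * n_codes))
    return [tuple(flat[3 * i:3 * i + 3]) for i in range(n_codes)], indices
```

Each leaf then needs only a 2-byte pointer per SH basis function instead of three floating-point coefficients, and DEFLATE removes further redundancy in the pointer stream.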
B.3 Analytic Derivatives of PlenOctree Rendering
In this section, we derive the analytic derivatives of the NeRF piecewise constant volume rendering model for optimizing PlenOctrees directly. Throughout this section we will consider a fixed ray with a given origin and direction.
B.3.1 Definitions
For precision, we define the quantities used in NeRF volume rendering. The NeRF rendering model considers a ray divided into $N$ consecutive segments with endpoints $t_0 < t_1 < \dots < t_N$, where $t_0$ and $t_N$ are the near and far bounds. The segments have constant densities $\sigma_0, \dots, \sigma_{N-1}$, where each $\sigma_i \ge 0$. If we shine a light of intensity 1 at $t_i$, then at the camera position $t_0$, the light intensity is given by
$$T_i = \exp\Big(-\sum_{j=0}^{i-1} \sigma_j \delta_j\Big) \tag{12}$$
where $\delta_i = t_{i+1} - t_i$ are the segment lengths, as in §3 of the main paper. Note that $T_i$ is also known as the accumulated transmittance from $t_0$ to $t_i$, and is the same as the definition in (1). It can be shown that this precisely models the absorption within each segment in the piecewise-constant setting.
Let $c_0, \dots, c_{N-1}$ be the colors associated with segments $0, \dots, N-1$, and let $c_N$ be the background light intensity; each $c_i$ is an RGB color. We are interested in the derivative of the rendered color $\hat{C}$ with respect to $\sigma_i$ and $c_i$. Note that $c_N$ (background) is typically considered a hyperparameter.
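B.3.2 Derivation of the Derivatives
From the original NeRF rendering equation (1), we can express the rendered ray color $\hat{C}$ as:
$$\hat{C} = T_N c_N + \sum_{i=0}^{N-1} T_i \big(1 - \exp(-\sigma_i \delta_i)\big)\, c_i \tag{13}$$
$$\phantom{\hat{C}} = \sum_{i=0}^{N} w_i c_i \tag{14}$$
where $w_i = T_i - T_{i+1}$ are the segment weights (using $T_{i+1} = T_i \exp(-\sigma_i \delta_i)$), and $T_{N+1} = 0$, so that $w_N = T_N$ is the weight of the background. [Footnote 4: Note that the background color was omitted in equation (1) of the main paper for simplicity.]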
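The piecewise-constant rendering model can be written down directly. The sketch below computes the transmittances, the segment weights, and the composited color for one ray; it assumes the background is composited with the transmittance remaining after the last segment, and the function name and argument layout are hypothetical.

```python
import math


def render_ray(sigmas, deltas, colors, background):
    """Composite per-segment colors along one ray.

    sigmas, deltas: per-segment densities and lengths (same length N)
    colors: N RGB triples; background: one RGB triple
    Returns (rgb, weights), where weights has N + 1 entries (last = background).
    """
    n = len(sigmas)
    trans = [1.0]  # trans[i] = accumulated transmittance up to segment i
    for sigma, delta in zip(sigmas, deltas):
        trans.append(trans[-1] * math.exp(-sigma * delta))
    # Each segment absorbs the transmittance it removes; the rest hits the background.
    weights = [trans[i] - trans[i + 1] for i in range(n)] + [trans[n]]
    all_colors = list(colors) + [background]
    rgb = [sum(w * c[ch] for w, c in zip(weights, all_colors)) for ch in range(3)]
    return rgb, weights
```

The weights are non-negative and sum to one, so the rendered color is a convex combination of the segment colors and the background.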
Color derivative.
Since the rendered color $\hat{C}$ is a convex combination of the segment colors, it is immediately clear that
$$\frac{\partial \hat{C}}{\partial c_i} = w_i \tag{15}$$
Handling spherical harmonic colors is straightforward by applying the chain rule, noting that the SH basis function values are constant along the ray.
Density derivative.
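This is slightly trickier. Writing the segment weights as $w_j = T_j - T_{j+1}$ (with $T_{N+1} = 0$), the derivative with respect to $\sigma_i$ is
$$\frac{\partial \hat{C}}{\partial \sigma_i} = \sum_{j=0}^{N} c_j \left( \frac{\partial T_j}{\partial \sigma_i} - \frac{\partial T_{j+1}}{\partial \sigma_i} \right) \tag{16}$$
where the derivative of the intensity $T_j$ is
$$\frac{\partial T_j}{\partial \sigma_i} = \frac{\partial}{\partial \sigma_i} \exp\Big(-\sum_{k=0}^{j-1} \sigma_k \delta_k\Big) \tag{17}$$
$$\phantom{\frac{\partial T_j}{\partial \sigma_i}} = -T_j\, \frac{\partial}{\partial \sigma_i} \sum_{k=0}^{j-1} \sigma_k \delta_k \tag{18}$$
$$\phantom{\frac{\partial T_j}{\partial \sigma_i}} = -\delta_i\, T_j\, \mathbb{1}[i < j] \tag{19}$$
Here $\mathbb{1}[i < j]$ denotes an indicator function whose value is 1 if $i < j$ and 0 otherwise. Basically, we can delete any $T_j$ for $j \le i$ from the original expression, then multiply by $-\delta_i$. Therefore we can simplify (16) as follows
$$\frac{\partial \hat{C}}{\partial \sigma_i} = \delta_i \Big[ T_{i+1}\, c_i - \sum_{j=i+1}^{N} w_j c_j \Big] \tag{20}$$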
Remark.
Within the PlenOctree renderer, this gradient can be computed in two rendering passes; the second pass is needed because of the dependency on "future" weights and colors not yet seen by the ray marching process. The first pass stores the total $\sum_{j=0}^{N} w_j c_j$; the second pass then subtracts a running prefix sum from it to obtain the remaining sum over $j > i$. The overhead is still relatively small, and auxiliary memory use is constant.
If there are multiple colors, we simply add the density derivatives over all of them. In practice, usually the network outputs $\tilde{\sigma}_i \in \mathbb{R}$ and we set $\sigma_i = \max(\tilde{\sigma}_i, 0)$, so we also need to take care of setting the gradient to 0 if $\tilde{\sigma}_i \le 0$.
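The two-pass scheme described in the remark can be sketched in plain Python. The example assumes the density derivative takes the form delta_i * (T_{i+1} * c_i - sum of w_j * c_j over later segments and the background), which follows from differentiating the piecewise-constant rendering model; the function names are hypothetical, and this is an illustration rather than the CUDA renderer itself.

```python
import math


def render(sigmas, deltas, colors, background):
    """Forward pass: composite segment colors, then the background."""
    trans, out = 1.0, [0.0, 0.0, 0.0]
    for sigma, delta, c in zip(sigmas, deltas, colors):
        trans_next = trans * math.exp(-sigma * delta)
        w = trans - trans_next
        out = [o + w * ch for o, ch in zip(out, c)]
        trans = trans_next
    return [o + trans * b for o, b in zip(out, background)]


def density_grads(sigmas, deltas, colors, background):
    """Per-channel derivative of the rendered color w.r.t. each density."""
    total = render(sigmas, deltas, colors, background)  # pass 1: total color
    trans, prefix, grads = 1.0, [0.0, 0.0, 0.0], []
    for sigma, delta, c in zip(sigmas, deltas, colors):  # pass 2: subtract prefix
        trans_next = trans * math.exp(-sigma * delta)
        w = trans - trans_next
        prefix = [p + w * ch for p, ch in zip(prefix, c)]
        suffix = [t - p for t, p in zip(total, prefix)]  # contribution after segment i
        grads.append([delta * (trans_next * ch - s) for ch, s in zip(c, suffix)])
        trans = trans_next
    return grads
```

Only the running transmittance and one prefix accumulator are kept during the second pass, so the auxiliary memory per ray is constant. Comparing the result against central finite differences of `render` is a quick way to check the derivation.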
B.4 NeRF-SH Training Details
Our NeRF-SH model is built upon a JAX reimplementation of NeRF [6]. In our experiments, we use a batch size of 1024 rays, each with 64 sampled points in the coarse volume and 128 additional sampled points in the fine volume. The model is optimized with the Adam optimizer [14] using a learning rate that starts at and decays exponentially to over the training process. All of our models are trained for 2M iterations under the same protocol. Training takes around 50 hours to converge for each model on a single NVIDIA V100 GPU.
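An exponentially decaying learning rate of the kind described above is commonly implemented as log-linear interpolation between an initial and a final value. The sketch below shows one such schedule; the function name is hypothetical, and the rates used in the usage line are placeholders chosen for illustration, not the values used in the experiments.

```python
def exp_decay_lr(step, max_steps, lr_init, lr_final):
    """Interpolate from lr_init to lr_final geometrically over max_steps."""
    t = min(max(step / max_steps, 0.0), 1.0)  # training progress in [0, 1]
    return lr_init * (lr_final / lr_init) ** t


# Placeholder rates for illustration only.
schedule = [exp_decay_lr(s, 2_000_000, 1e-3, 1e-5) for s in (0, 1_000_000, 2_000_000)]
```

With this form the rate shrinks by the same factor over every equal-length stretch of training, so halfway through it sits at the geometric mean of the two endpoints.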
B.5 PlenOctree Optimization Details
After converting the NeRF-SH model into a PlenOctree, we further optimize the PlenOctree on the training set with SGD using the NeRF loss; note that we no longer apply the sparsity prior here since the octree is already sparse. For the NeRF-synthetic dataset, we use a constant learning rate and optimize for a maximum of 80 epochs. For the Tanks&Temples dataset, we set the learning rate to and the maximum epochs to 40. We apply early stopping to the optimization process by monitoring the PSNR on the validation set. [Footnote 5: For the Tanks&Temples dataset, we hold out 10% of the training set as a validation set, used only for PlenOctree optimization.] On average it takes around 10 minutes to finish the PlenOctree optimization for each scene on a single NVIDIA V100 GPU. The entire optimization process is performed in float32 for stability, but afterwards we store the PlenOctree in float16 to reduce the model size.
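Early stopping on validation PSNR can be sketched as below. The PSNR formula assumes pixel values in [0, 1]; the patience-based stopping rule and the function names are assumptions for illustration, since the exact stopping criterion is not specified in the text.

```python
import math


def psnr_from_mse(mse):
    """PSNR in dB for images with values in [0, 1]."""
    return -10.0 * math.log10(mse)


def early_stop_epoch(val_psnrs, patience=2):
    """Return (best_epoch, best_psnr), stopping once the validation PSNR
    has not improved for `patience` consecutive epochs."""
    best, best_epoch = -math.inf, -1
    for epoch, value in enumerate(val_psnrs):
        if value > best:
            best, best_epoch = value, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best
```

In a training loop, the octree parameters saved at `best_epoch` would be the ones kept and later converted to float16 for storage.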
PSNR  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRF (original)  33.00  25.01  30.13  36.18  32.54  29.62  32.91  28.65  31.01 
NeRF  34.08  25.03  30.43  36.92  33.28  29.91  34.53  29.36  31.69 
SRN  26.96  17.18  20.73  26.81  20.85  18.09  26.85  20.60  22.26 
Neural Volumes  28.33  22.58  24.79  30.71  26.08  24.22  27.78  23.93  26.05 
NSVF  33.19  25.18  31.23  37.14  32.29  32.68  34.27  27.93  31.75 
NeRFSH  33.98  25.17  30.72  36.75  32.77  29.95  34.04  29.21  31.57 
PlenOctree from NeRFSH  33.19  25.01  30.56  36.15  32.12  29.56  33.01  28.58  31.02 
PlenOctree after finetuning  34.66  25.31  30.79  36.79  32.95  29.76  33.97  29.42  31.71 
SSIM  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRF (original)  0.967  0.925  0.964  0.974  0.961  0.949  0.980  0.856  0.947 
NeRF  0.975  0.925  0.967  0.979  0.968  0.952  0.987  0.868  0.953 
SRN  0.910  0.766  0.849  0.923  0.809  0.808  0.947  0.757  0.846 
Neural Volumes  0.916  0.873  0.910  0.944  0.880  0.888  0.946  0.784  0.893 
NSVF  0.968  0.931  0.973  0.980  0.960  0.973  0.987  0.854  0.953 
NeRFSH  0.974  0.927  0.968  0.978  0.966  0.951  0.985  0.866  0.952 
PlenOctree from NeRFSH  0.970  0.927  0.968  0.977  0.965  0.953  0.983  0.863  0.951 
PlenOctree after finetuning  0.981  0.933  0.970  0.982  0.971  0.955  0.987  0.884  0.958 
LPIPS  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRF (original)  0.046  0.091  0.044  0.121  0.050  0.063  0.028  0.206  0.081 
NeRF  0.035  0.085  0.038  0.079  0.040  0.060  0.019  0.185  0.068 
SRN  0.106  0.267  0.149  0.100  0.200  0.174  0.063  0.299  0.170 
Neural Volumes  0.109  0.214  0.162  0.109  0.175  0.130  0.107  0.276  0.160 
NSVF  0.043  0.069  0.017  0.025  0.029  0.021  0.010  0.162  0.047 
NeRFSH  0.037  0.087  0.039  0.041  0.041  0.060  0.021  0.177  0.063 
PlenOctree from NeRFSH  0.039  0.088  0.038  0.044  0.046  0.063  0.023  0.189  0.066 
PlenOctree after finetuning  0.022  0.076  0.038  0.032  0.034  0.059  0.017  0.144  0.053 
FPS  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRF (original)  0.023  0.023  0.023  0.023  0.023  0.023  0.023  0.023  0.023 
NeRF  0.045  0.045  0.045  0.045  0.045  0.045  0.045  0.045  0.045 
SRN  0.909  0.909  0.909  0.909  0.909  0.909  0.909  0.909  0.909 
Neural Volumes  3.330  3.330  3.330  3.330  3.330  3.330  3.330  3.330  3.330 
NSVF  1.044  0.735  0.597  0.660  0.633  0.517  1.972  0.362  0.815 
NeRFSH  0.051  0.051  0.051  0.051  0.051  0.051  0.051  0.051  0.051 
PlenOctree  352.4  175.9  85.6  95.5  186.8  64.2  324.9  56.0  167.7 
PSNR  

Barn  Caterpillar  Family  Ignatius  Truck  Mean  
NeRF (original)  24.05  23.75  30.29  25.43  25.36  25.78 
NeRF  27.39  25.24  32.47  27.95  26.66  27.94 
SRN  22.44  21.14  27.57  26.70  22.62  24.09 
Neural Volumes  20.82  20.71  28.72  26.54  21.71  23.70 
NSVF  27.16  26.44  33.58  27.91  26.92  28.40 
NeRFSH  27.05  25.06  32.28  28.06  26.66  27.82 
PlenOctree from NeRFSH  25.78  24.80  32.04  27.92  26.15  27.34 
PlenOctree after finetuning  26.80  25.29  32.85  28.19  26.83  27.99 
SSIM  

Barn  Caterpillar  Family  Ignatius  Truck  Mean  
NeRF (original)  0.750  0.860  0.932  0.920  0.860  0.864 
NeRF  0.842  0.892  0.951  0.940  0.896  0.904 
SRN  0.741  0.834  0.908  0.920  0.832  0.847 
Neural Volumes  0.721  0.819  0.916  0.922  0.793  0.834 
NSVF  0.823  0.900  0.954  0.930  0.895  0.900 
NeRFSH  0.838  0.891  0.949  0.940  0.895  0.902 
PlenOctree from NeRFSH  0.820  0.889  0.948  0.940  0.889  0.897 
PlenOctree after finetuning  0.856  0.907  0.962  0.948  0.914  0.917 
LPIPS  

Barn  Caterpillar  Family  Ignatius  Truck  Mean  
NeRF (original)  0.395  0.196  0.098  0.111  0.192  0.198 
NeRF  0.286  0.189  0.092  0.102  0.173  0.168 
SRN  0.448  0.278  0.134  0.128  0.266  0.251 
Neural Volumes  0.479  0.280  0.111  0.117  0.312  0.260 
NSVF  0.307  0.141  0.063  0.106  0.148  0.153 
NeRFSH  0.291  0.185  0.091  0.091  0.175  0.167 
PlenOctree from NeRFSH  0.296  0.188  0.094  0.092  0.180  0.170 
PlenOctree after finetuning  0.226  0.148  0.069  0.080  0.130  0.131 
FPS  

Barn  Caterpillar  Family  Ignatius  Truck  Mean  
NeRF (original)  0.007  0.007  0.007  0.007  0.007  0.007 
NeRF  0.013  0.013  0.013  0.013  0.013  0.013 
SRN  0.250  0.250  0.250  0.250  0.250  0.250 
Neural Volumes  1.000  1.000  1.000  1.000  1.000  1.000 
NSVF  10.74  5.415  2.625  6.062  5.886  6.146 
NeRFSH  0.015  0.015  0.015  0.015  0.015  0.015 
PlenOctree (ours)  46.94  54.00  32.33  15.67  62.16  42.22 
PSNR  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
Ours1.9G  34.66  25.31  30.79  36.79  32.95  29.76  33.97  29.42  31.71 
Ours1.4G  34.66  25.30  30.82  36.36  32.96  29.75  33.98  29.29  31.64 
Ours0.4G  32.92  24.82  30.07  36.06  31.61  28.89  32.19  29.04  30.70 
Ours0.3G  32.03  24.10  29.42  34.46  30.25  28.44  30.78  27.36  29.60 
GB  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
Ours1.9G  0.830  1.240  1.791  2.674  2.067  3.682  0.442  2.689  1.93 
Ours1.4G  0.671  0.852  0.943  1.495  1.421  3.060  0.569  1.881  1.36 
Ours0.4G  0.176  0.350  0.287  0.419  0.499  0.295  0.327  1.195  0.44 
Ours0.3G  0.131  0.183  0.286  0.403  0.340  0.503  0.159  0.381  0.30 
FPS  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
Ours1.9G  352.4  175.9  85.6  95.5  186.8  64.2  324.9  56.0  167.7 
Ours1.4G  399.7  222.2  147.3  163.5  247.9  68.0  393.8  75.4  214.7 
Ours0.4G  639.6  290.0  208.7  273.5  339.0  268.0  522.6  86.7  328.5 
Ours0.3G  767.6  424.1  203.8  271.7  443.6  189.1  796.4  181.1  409.7 
PSNR  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRFSH9  33.88  25.24  30.69  36.68  32.73  29.53  33.68  29.11  31.44 
NeRFSH16  33.98  25.17  30.72  36.75  32.77  29.95  34.04  29.21  31.57 
NeRFSH25  34.01  25.10  30.52  36.83  32.76  30.06  34.08  29.11  31.56 
NeRFSG25  34.08  25.40  31.21  36.92  32.93  29.77  34.31  29.28  31.74 
PlenOctreeSH9  34.38  25.34  30.72  36.68  32.79  29.16  33.23  29.28  31.45 
PlenOctreeSH16  34.66  25.31  30.79  36.79  32.95  29.76  33.97  29.42  31.71 
PlenOctreeSH25  34.72  25.32  30.68  36.96  32.85  29.79  33.90  29.29  31.69 
PlenOctreeSG25  34.37  25.52  31.16  36.67  32.98  29.41  33.63  29.32  31.63 
SSIM  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRFSH9  0.973  0.928  0.968  0.978  0.966  0.948  0.984  0.864  0.951 
NeRFSH16  0.974  0.927  0.968  0.978  0.966  0.951  0.985  0.866  0.952 
NeRFSH25  0.973  0.926  0.967  0.978  0.966  0.952  0.985  0.864  0.951 
NeRFSG25  0.974  0.930  0.971  0.978  0.967  0.951  0.986  0.867  0.953 
PlenOctreeSH9  0.980  0.934  0.970  0.982  0.970  0.950  0.984  0.881  0.956 
PlenOctreeSH16  0.981  0.933  0.970  0.982  0.971  0.955  0.987  0.884  0.958 
PlenOctreeSH25  0.981  0.935  0.971  0.983  0.971  0.955  0.987  0.883  0.958 
PlenOctreeSG25  0.980  0.937  0.973  0.982  0.972  0.953  0.986  0.883  0.958 
LPIPS  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
NeRFSH9  0.037  0.086  0.043  0.044  0.042  0.063  0.023  0.180  0.065 
NeRFSH16  0.037  0.087  0.039  0.041  0.041  0.060  0.021  0.177  0.063 
NeRFSH25  0.038  0.087  0.039  0.040  0.041  0.061  0.021  0.179  0.063 
NeRFSG25  0.036  0.083  0.034  0.042  0.041  0.060  0.020  0.176  0.062 
PlenOctreeSH9  0.023  0.075  0.041  0.034  0.036  0.068  0.025  0.146  0.056 
PlenOctreeSH16  0.022  0.076  0.038  0.032  0.034  0.059  0.017  0.144  0.053 
PlenOctreeSH25  0.023  0.072  0.036  0.031  0.034  0.060  0.017  0.145  0.052 
PlenOctreeSG25  0.023  0.069  0.034  0.033  0.033  0.064  0.019  0.144  0.052 
GB  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
PlenOctreeSH9  0.45  0.67  1.15  1.27  1.16  1.48  0.16  1.67  1.00 
PlenOctreeSH16  0.83  1.24  1.79  2.67  2.07  3.68  0.44  2.69  1.93 
PlenOctreeSH25  1.30  1.97  2.57  3.80  3.61  4.04  0.55  3.61  2.68 
PlenOctreeSG25  1.03  1.68  2.43  2.66  2.66  4.44  0.49  2.71  2.26 
FPS  

Chair  Drums  Ficus  Hotdog  Lego  Materials  Mic  Ship  Mean  
PlenOctreeSH9  521.1  255.6  116.7  183.0  275.1  132.3  519.4  90.6  261.7 
PlenOctreeSH16  352.4  175.9  85.6  95.5  186.8  64.2  324.9  56.0  167.7 
PlenOctreeSH25  269.2  126.7  67.0  66.4  127.1  48.9  279.2  41.3  128.2 
PlenOctreeSG25  306.6  151.9  74.1  104.3  153.3  51.0  294.2  69.6  150.6 