The problem with a z-prepass is that it requires submitting the entire scene an extra time to the GPU. This brings the z-prepass into question as a performance optimization. Specifically:
Is the doubled cost of transformation, tessellation, and rasterizer setup less than the cost of the overshading that the prepass avoids?
I performed a quick test by simply removing the z-prepass from the system in forward+ mode. This means that the first rendering pass is a G-buffer pass that writes to multiple render targets simultaneously, forgoing the depth-only z-prepass. I tested on Windows 7 64-bit with a low-end GeForce 650M (in a 2012 MacBook Pro) for four scenes at 720p + 64-pixel guard band: Crytek Sponza, the ATCS Quake3 map from Tremulous, a Minecraft model, and the G3D smoke test that uses all G3D model subclasses and many draw calls. All had ambient occlusion off and few skinned characters, to test the worst case for disabling z-prepass. ATCS and the Minecraft model have relatively high depth complexity (and thus stand to benefit the most from a z-prepass), but they also have a lot of materials and thus many draw calls. Sponza and the smoke test have low depth complexity. I measured performance by looking at the full-frame rendering time ("1/fps") with vsync off.
For each of these scenes, there was no significant performance difference with or without a z-prepass. Sponza and ATCS rendered in 16 ms (61 fps). Minecraft was 38 ms with a prepass and 37 ms without it. The smoke test took 26 ms in both cases.
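To put those numbers in perspective, here is a small sketch that converts between frame time and frame rate and reports the relative change from dropping the prepass. The function names are my own for illustration, and the timings are simply the rounded figures quoted above, so treat the output as back-of-the-envelope arithmetic rather than a new measurement.

```python
def fps_from_ms(frame_ms: float) -> float:
    """Frame rate implied by a full-frame time in milliseconds."""
    return 1000.0 / frame_ms


def relative_change(with_prepass_ms: float, without_prepass_ms: float) -> float:
    """Fractional change in frame time when the z-prepass is removed.

    Negative means the frame got faster without the prepass.
    """
    return (without_prepass_ms - with_prepass_ms) / with_prepass_ms


# (with prepass, without prepass) in ms, as reported in the text above.
timings = {
    "Minecraft": (38.0, 37.0),
    "Smoke test": (26.0, 26.0),
}

for scene, (with_ms, without_ms) in timings.items():
    change = relative_change(with_ms, without_ms)
    print(f"{scene}: {with_ms:.0f} ms -> {without_ms:.0f} ms ({change:+.1%})")
```

A change of one millisecond on a 38 ms frame is under three percent, which is within the noise I would expect from timing whole frames this way.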
This left me with the conclusion that the complexity of the z-prepass in my system was not justified: the minor amount of overshading it eliminated seemed to be nearly cancelled out by the time spent on the extra pass. In other words, the z-prepass may be irrelevant in modern rendering systems that submit many draw calls for well-sorted objects, and is potentially harmful as tessellation (and thus rasterizer setup) and skinning workloads increase. For a renderer that doesn't perform particularly good front-to-back sorting (because it uses large meshes, for example), has a lot of alpha-testing, or in which the front half of the pipeline is relatively lightweight, z-prepass may still be important.
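The tradeoff can be illustrated with a deliberately crude toy model of a single pixel covered by several surfaces. Everything here is an assumption made for illustration: the cost units are arbitrary, `front_end_cost` stands in for transformation, tessellation, and rasterizer setup per surface, `shade_cost` stands in for the full G-buffer fragment shader, and the depth-only fragment work of the prepass is treated as free. It is not a model of any real GPU.

```python
def shaded_fragments(depths):
    """Count fragments that pass a less-than depth test, in submission order."""
    nearest = float("inf")
    shaded = 0
    for z in depths:
        if z < nearest:  # the fragment is visible so far, so it gets shaded
            nearest = z
            shaded += 1
    return shaded


def cost_without_prepass(depths, front_end_cost, shade_cost):
    """One submission; every fragment that survives the depth test is shaded."""
    return len(depths) * front_end_cost + shaded_fragments(depths) * shade_cost


def cost_with_prepass(depths, front_end_cost, shade_cost):
    """Two submissions; only the single nearest fragment is shaded."""
    visible = 1 if depths else 0
    return 2 * len(depths) * front_end_cost + visible * shade_cost


front_to_back = [1.0, 2.0, 3.0, 4.0]  # well-sorted submission order
back_to_front = [4.0, 3.0, 2.0, 1.0]  # worst-case submission order

for name, order in (("front-to-back", front_to_back), ("back-to-front", back_to_front)):
    without = cost_without_prepass(order, front_end_cost=1.0, shade_cost=3.0)
    with_pre = cost_with_prepass(order, front_end_cost=1.0, shade_cost=3.0)
    print(f"{name}: without prepass = {without}, with prepass = {with_pre}")
```

In this toy model the prepass only pays for itself when the submission order is poor or when shading is expensive relative to the front end, which is the same qualitative conclusion as above.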
One caveat is that G3D actually uses two guard bands: a 64-pixel one for depth (used to provide samples for SAO) and a 16-pixel one for color (for screen-space refraction, motion blur, and depth of field), as shown below:
When rendering the forward+ G-buffer without a prepass, it would be wasteful to compute per-pixel properties other than depth in the purple "trim band." So, I made the G-buffer shaders return immediately (but not discard, so that depth is still written) if the fragment coordinate is inside that band. This substantially reduces bandwidth in those regions and slightly reduces the total amount of computation. The test results in this post were gathered before this optimization, but I expect that running without the z-prepass is now actually faster.
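To give a feel for how much of the framebuffer this affects, here is a sketch of the trim band test on the CPU. It assumes that each guard band extends the image by its thickness on every side and that the trim band is the region inside the depth guard band but outside the color guard band (so 64 - 16 = 48 pixels thick here). The function names are hypothetical and this is not the actual G3D shader code.

```python
def in_trim_band(x, y, width, height, depth_guard=64, color_guard=16):
    """True if pixel (x, y) of the full depth-guard-banded framebuffer needs only depth.

    width and height are the full framebuffer size, guard bands included.
    """
    trim = depth_guard - color_guard
    return x < trim or y < trim or x >= width - trim or y >= height - trim


def trim_band_pixels(visible_w, visible_h, depth_guard=64, color_guard=16):
    """Return (pixels in the trim band, pixels in the full framebuffer)."""
    full_w = visible_w + 2 * depth_guard
    full_h = visible_h + 2 * depth_guard
    color_w = visible_w + 2 * color_guard
    color_h = visible_h + 2 * color_guard
    total = full_w * full_h
    return total - color_w * color_h, total


# 720p plus guard bands, matching the test configuration described above.
trim, total = trim_band_pixels(1280, 720)
print(f"trim band: {trim} of {total} pixels ({trim / total:.1%})")
```

Under these assumptions the trim band covers roughly a sixth of the pixels at 720p, so skipping everything but depth there is a meaningful saving in G-buffer bandwidth.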
Morgan McGuire is a professor of Computer Science at Williams College and a professional game developer. He is the author of The Graphics Codex, an essential reference for computer graphics that runs on iPhone, iPad, and iPod Touch.