I'm Morgan McGuire (@CasualEffects). I've been working on computer graphics and games for 20 years at great places including NVIDIA, University of Waterloo, Williams College, Brown University, Roblox, Unity, and Activision.


Monday, April 14, 2014

Fast Terrain Rendering with Continuous Detail on a Modern GPU

Terrain rendering is challenging. It requires both high detail close to the camera and a large extent. There must be less detail per square meter in the distance to render this efficiently (and ideally, the amount of detail per pixel would be about the same), but the transitions from high to low detail should be imperceptible.

There's been a lot of research and development on terrain rendering systems. The best systems today can render worlds in which the viewer can pull back continuously from individual stones and flowers to observing entire planets from space, and when looking at the flowers can still see mountains in the distance. The underlying perception, art, and geometric issues of course remain unchanged over time, but hardware architectures and available resources change dramatically. So, terrain rendering methods that were preferred even a few years ago (e.g., ROAM, geomipmapping, projective grid) may be obsolete today. (Today's methods will likely be obsolete in a few more years.) Fortunately, the last few hardware generations have moved in a direction where the currently preferred implementations are simpler to implement than the previously preferred ones.

In this post I describe a terrain renderer that I built with some current best practices. It draws on a lot of other blog posts and research articles that I'll mention in passing. My implementation renders in 3.4 ms at 1680x1050 resolution on an NVIDIA GeForce 650M under Windows 7 (i.e., on a 2012 MacBook Pro under Boot Camp), and supports both forward and deferred shading. It is written in OpenGL using the G3D Innovation Engine 10.0 beta (SVN revision 4283) and I provide the full source code. I don't expect anyone to compile and run that code directly, however. I'm releasing it without support (i.e., I'm always happy to discuss algorithms, but please don't ask me for help dealing with your compiler/library/installation) to help others with their own implementations and as a case study in GPU optimization of large mesh rendering.


The underlying geometry is a static mesh submitted in a single draw call. This mesh is a version of a geo clipmap (thesis, SIGGRAPH paper, GPU Gems) that does not require explicit updates. It has high resolution near the camera, and then becomes coarser farther from the camera in the same way that textures are sampled from increasingly coarse MIP-maps farther from the camera. I generated the static mesh indices in order from the highest to the lowest resolution. This allows hierarchical and early depth culling to efficiently avoid overdraw when the camera is close to a large feature.

The vertex shader performs two major transformations on the static mesh:
  • translate the mesh, to always keep it centered around the camera
  • alter the elevations to conform to the terrain height (which is stored in a texture map)
If the mesh translated smoothly with the camera, then the individual vertices would ripple over the heightmap and not look like solid terrain (this might be acceptable for water, however). So, the vertex shader rounds off the translation along the horizontal axes to the nearest grid vertex position. This makes vertices appear to be static in world space, even though they are in fact holding still and then jumping between frames to their neighbors' previous positions.
Because the grid has varying resolution, the roundoff varies throughout the grid. I store the resolution of the mesh at each location in the vertical coordinate (which will be overwritten by the sampled elevation during vertex transformation). The vertex shader then uses the resolution to compute the appropriate rounding factor.
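The snapping can be sketched on the CPU as follows. This is a minimal Python illustration rather than the GLSL from the source release; the names snapped_offset, place_vertex, level, and base_cell are hypothetical, and it assumes that the resolution flag is an integer level where each coarser level doubles the vertex spacing.

```python
import math

def snapped_offset(camera_xz, cell_size):
    """Round the camera's horizontal position to the nearest multiple of cell_size."""
    return tuple(math.floor(c / cell_size + 0.5) * cell_size for c in camera_xz)

def place_vertex(local_xz, level, camera_xz, base_cell=1.0):
    """Translate one static-mesh vertex so the mesh stays centered on the camera.

    level is the (assumed) resolution flag stored in the vertical coordinate:
    0 is the finest region and each following level doubles the spacing.
    """
    cell_size = base_cell * (2 ** level)
    offset_x, offset_z = snapped_offset(camera_xz, cell_size)
    return (local_xz[0] + offset_x, local_xz[1] + offset_z)
```

Because the offset only changes when the camera crosses half a cell at that level, a vertex in a coarse region holds still for longer than one in a fine region, and when it does move it lands on a position that a neighbor previously occupied.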

I tessellated the grid as regular squares that are themselves subdivided into a fan of eight triangles. While this does not give optimal conformance for a regular structure like an isometric heightfield, it is much better than slashed diagonals while also being easier than the isometric grid to stitch together where the mesh resolution drops.
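As an illustration of the fan layout, here is a small sketch that emits the index triples for one square. It assumes a row-major vertex grid in which each square spans a 3x3 block of vertices; fan_indices and its parameters are hypothetical names, and the stitching between resolutions is not shown.

```python
def fan_indices(x, z, stride):
    """Indices for one square subdivided into a fan of eight triangles.

    (x, z) is the square's corner vertex in a row-major grid with `stride`
    vertices per row. The square covers a 3x3 block of vertices: one center
    vertex plus eight perimeter vertices.
    """
    def index(i, j):
        return (z + j) * stride + (x + i)

    center = index(1, 1)
    # Perimeter vertices visited in one consistent direction around the center.
    ring = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    triangles = []
    for k in range(8):
        a = ring[k]
        b = ring[(k + 1) % 8]
        triangles.append((center, index(*a), index(*b)))
    return triangles
```

Every triangle shares the center vertex, so the diagonals alternate direction around the square instead of all slanting the same way.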

Many terrain implementations divide the static mesh into quadrants or octants that are then frustum culled to avoid transforming geometry that is behind the camera. I performed some quick tests and determined that on my target GPU the performance gain for doing so was minimal (around 0.2 ms). Frustum culling the terrain adds complexity to the code that I felt was unjustified for that performance gain, so I simply submit the entire mesh each time.

Distant terrain undersamples the heightfield texture. This aliasing would appear as vertical "noise" if a simple nearest-neighbor minification scheme were applied, so I compute a MIP chain for the heightfield. Default hardware MIP-mapping reduces each 2x2 block of four pixels to a single pixel by averaging. This has several undesirable characteristics, including a large bias in filtering based on the location of a height texel within the image. I use a 3x3 downsampling filter and explicitly adjust texture coordinates during sampling to match the difference between this and the default MIP layout.
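A sketch of one such downsampling step is below. The post does not give the filter weights, so this assumes a separable 1-2-1 tent filter centered on every second texel with clamped borders; downsample_3x3 and texel_center_in_level0 are hypothetical names.

```python
def downsample_3x3(fine):
    """Halve a heightfield (a list of rows) with a 3x3 tent filter.

    Coarse texel (i, j) is centered on fine texel (2i, 2j), not on the middle
    of a 2x2 block as in default box-filtered MIP generation.
    """
    height, width = len(fine), len(fine[0])
    kernel = (1, 2, 1)  # assumed weights; the 2D filter is their outer product
    coarse = []
    for j in range(height // 2):
        row = []
        for i in range(width // 2):
            total = 0.0
            for dj in (-1, 0, 1):
                for di in (-1, 0, 1):
                    y = min(max(2 * j + dj, 0), height - 1)  # clamp at borders
                    x = min(max(2 * i + di, 0), width - 1)
                    total += kernel[dj + 1] * kernel[di + 1] * fine[y][x]
            row.append(total / 16.0)
        coarse.append(row)
    return coarse

def texel_center_in_level0(i, level):
    """Level-0 coordinate (in texels) of the center of texel i at `level`
    under the layout above. The default layout would give (i + 0.5) * 2**level."""
    return i * (2 ** level) + 0.5
```

The second function shows why the texture coordinates need adjusting: the two layouts place the same coarse texel at different positions, and the gap grows with the MIP level.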

To eliminate discontinuities in elevation between grid resolutions, I use an explicit form of "trilinear" (MIP-map) interpolation. Vertices interpolate vertically towards the next lower-detail MIP level so that they exactly match the next resolution at the edge of a grid patch.
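One way to express that blend is sketched below. The shape of the transition is an assumption: it uses the square (Chebyshev) distance from the camera and starts blending at a hypothetical fraction blend_start of the way to the patch edge. Neither the function names nor the constants come from the source release.

```python
def morph_factor(vertex_xz, camera_xz, ring_outer, blend_start=0.75):
    """Blend weight toward the next coarser MIP level for one vertex.

    Returns 0 in the interior of a grid patch and rises linearly to 1 at the
    outer edge, which lies at square distance ring_outer from the camera.
    """
    distance = max(abs(vertex_xz[0] - camera_xz[0]),
                   abs(vertex_xz[1] - camera_xz[1]))
    t = (distance / ring_outer - blend_start) / (1.0 - blend_start)
    return min(max(t, 0.0), 1.0)

def morphed_height(h_fine, h_coarse, alpha):
    """Linear interpolation between the heights sampled at two MIP levels."""
    return h_fine + (h_coarse - h_fine) * alpha
```

At the patch edge the factor is exactly 1, so the vertex takes the coarser level's height and lines up with the neighboring lower-resolution patch.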

To conceal the vertical aliasing of the 8-bit input heightfield, at the highest detail mesh I apply quadratic interpolation instead of linear interpolation between heightfield vertices and add a small amount of value noise to break up large flat plateaus.

The mesh in this implementation is an explicit indexed triangle list using a full 32-bit precision vertex array. Reducing the mesh to 16-bit 2D horizontal vertices and 8-bit resolution flags (in a second attribute stream) would reduce the vertex bandwidth cost. It would be most elegant to generate it procedurally in the vertex shader, or using a tessellation shader. Procedural approaches would completely eliminate the input bandwidth cost. However, because attribute stream bandwidth to the vertex shader was not a significant bottleneck and the stitching logic would be much more complicated to implement under procedural mesh generation, I did not optimize this part of the process.
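For a rough sense of the saving, the per-vertex sizes of the two layouts can be compared directly. This is only arithmetic on the formats named above; it ignores any padding or alignment a driver might add, and the constant names are hypothetical.

```python
import struct

# Current layout: three 32-bit floats per vertex (x, resolution flag, z).
FULL_BYTES = struct.calcsize("<3f")

# Proposed layout: two 16-bit horizontal coordinates plus an 8-bit
# resolution flag held in a second attribute stream.
COMPACT_BYTES = struct.calcsize("<2H") + struct.calcsize("<B")

def vertex_buffer_bytes(vertex_count, bytes_per_vertex):
    """Total attribute bytes read by the vertex shader for the whole mesh."""
    return vertex_count * bytes_per_vertex
```

Under these assumptions the compact layout reads 5 bytes per vertex instead of 12.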


The ambient occlusion term, ignoring the environment map
I precompute self-shadowing (of the sun) and ambient occlusion (of the sky) on the terrain and store this information in a texture map. The RGB channels of this texture are the irradiance from the sky and the A channel is 1 where the sun is visible and 0 where it is occluded by terrain. I blur this texture to minimize the appearance of tessellation artifacts.
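A sketch of how such a texel might be consumed during shading follows. The texel layout matches the description above, but the way the terms are combined is an assumption (a plain diffuse term with no specular), and shade and its parameters are hypothetical names.

```python
def shade(albedo, light_texel, sun_radiance, n_dot_l):
    """Combine precomputed sky irradiance and sun visibility with a diffuse term.

    light_texel is (sky_r, sky_g, sky_b, sun_visible): RGB holds the sky
    irradiance and A is 1 where the sun is visible, 0 where terrain blocks it.
    """
    sky = light_texel[:3]
    sun_visible = light_texel[3]
    direct = sun_visible * max(n_dot_l, 0.0)
    return tuple(a * (s + direct * e)
                 for a, s, e in zip(albedo, sky, sun_radiance))
```

Because the texture is blurred, sun_visible takes fractional values near shadow edges, which softens the boundary.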

White terrain with the precomputed lighting applied
I apply a simple Phong shading model in the images shown on this page. Obviously, the terrain could benefit from more sophisticated shading and normal or parallax-mapping near the camera, but that is not terrain-specific, so I didn't extend it further.


The implementation supports a set of five elevation zones (specified in a data file). Within each zone, there is one material for steep surfaces (e.g., dirt) and one for flat surfaces (e.g., grass). Steep vs. flat is determined by the shading surface normal, which is fetched from the elevation texture. The surface normal rarely matches the actual mesh geometry, so triplanar mapping doesn't work directly. Instead, I hardcoded the texture rate for sloped surfaces and then optimized the code down to a handful of operations.
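The zone and slope selection can be sketched like this. The zone boundaries, the steepness threshold, and the layer numbering (two consecutive layers per zone) are all hypothetical; the real values live in the data file and shader of the source release.

```python
import bisect

# Hypothetical normalized elevations at which one zone ends and the next
# begins. Four boundaries give five zones.
ZONE_TOPS = [0.1, 0.3, 0.55, 0.8]

def material_layers(elevation, normal_y, steep_threshold=0.7, softness=0.1):
    """Pick the flat and steep material layers for a surface point.

    normal_y is the vertical component of the unit shading normal: 1 for flat
    ground, approaching 0 for a vertical cliff. Returns
    (flat_layer, steep_layer, steep_weight) with steep_weight in [0, 1].
    """
    zone = bisect.bisect_right(ZONE_TOPS, elevation)
    flat_layer = 2 * zone
    steep_layer = 2 * zone + 1
    # Linear ramp centered on the threshold so the two materials cross-fade.
    t = (steep_threshold + softness - normal_y) / (2.0 * softness)
    steep_weight = min(max(t, 0.0), 1.0)
    return flat_layer, steep_layer, steep_weight
```

The returned layer numbers are meant to be used as indices into a single array of material textures.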

All materials are stored in a single GL_TEXTURE_2D_ARRAY, which acts like a 3D texture with no filtering between array elements. This allows a fetch without a hardcoded texture binding. One alternative is a set of branches to select the appropriate sampler, which is slow (I tried that first); another alternative is packing all of the textures into a large atlas, but then one can't use hardware texture tiling. After computing texture coordinates and weights, I branch around texture fetches with low weights to reduce bandwidth. Because texture coordinate derivatives (for selecting MIP levels) are computed by finite differences between adjacent pixels, branching breaks automatic MIP-level selection. So, the code computes explicit gradients outside of the branches. I intended to use DXT1 texture compression to reduce bandwidth, but the OpenGL API doesn't support copying individually-compressed textures into a 2D array. The vestiges of the setup are left commented out in the code. The next step would be to explicitly read back the compressed texture pixel transfer buffers to the CPU, merge them, and then upload them into the 2D array at once.
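The interaction between branching and MIP selection can be sketched on the CPU. The level-of-detail formula below is the usual isotropic one (log2 of the longer gradient measured in texels), offered as an approximation of what the hardware does rather than the shader's actual code; mip_level, blend_layers, fetch, and threshold are hypothetical names.

```python
import math

def mip_level(duv_dx, duv_dy, texture_size):
    """MIP level from explicit texture-coordinate gradients.

    duv_dx and duv_dy are the changes in (u, v) between horizontally and
    vertically adjacent pixels. They are computed once, before any branch.
    """
    sx = (duv_dx[0] * texture_size, duv_dx[1] * texture_size)
    sy = (duv_dy[0] * texture_size, duv_dy[1] * texture_size)
    rho_squared = max(sx[0] ** 2 + sx[1] ** 2, sy[0] ** 2 + sy[1] ** 2)
    if rho_squared <= 1.0:
        return 0.0  # magnified or one-to-one: use the base level
    return 0.5 * math.log2(rho_squared)

def blend_layers(weights, fetch, lod, threshold=0.01):
    """Weighted blend of array layers that skips fetches with low weights.

    fetch(layer, lod) stands in for a texture fetch with an explicit level,
    so skipping a layer for one pixel cannot disturb MIP selection for its
    neighbors.
    """
    total = 0.0
    weight_sum = 0.0
    for layer, weight in enumerate(weights):
        if weight > threshold:
            total += weight * fetch(layer, lod)
            weight_sum += weight
    return total / weight_sum if weight_sum > 0.0 else 0.0
```

In GLSL the corresponding fetch would take the gradients directly (for example through textureGrad) instead of a precomputed level.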

I also intended to modify the blending around contiguous features. However, that detail wasn't needed for my immediate application in a flight simulator, so I left it using a typical (and slightly muddy) lerp.

Direct texture mapping only works well in the foreground. Medium-distant surfaces would reveal the texture tiling and far-distant surfaces would always hit the lowest-resolution MIP level and exhibit a solid color. To reduce bandwidth, I precomputed the blended materials for surfaces beyond the highest-resolution part of the mesh into a single lookup table indexed as (elevation, slope). This texture I did compress with DXT1, because it accounts for a majority of the texture fetches on screen.
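A sketch of building and sampling such a table is below. It assumes elevation and slope are both normalized to [0, 1] and uses nearest-entry lookup for brevity (a real texture fetch would filter). build_lut, lookup, and color_at are hypothetical names, with color_at standing in for whatever per-material blend produces the distant color.

```python
def build_lut(size, color_at):
    """Precompute a size x size table of blended material colors.

    color_at(elevation, slope) returns the color for normalized inputs in
    [0, 1]. Entry [i][j] stores the color at elevation i/(size-1) and
    slope j/(size-1).
    """
    last = size - 1
    return [[color_at(i / last, j / last) for j in range(size)]
            for i in range(size)]

def lookup(lut, elevation, slope):
    """Nearest-entry lookup with clamping to the table bounds."""
    last = len(lut) - 1
    i = min(max(int(elevation * last + 0.5), 0), last)
    j = min(max(int(slope * last + 0.5), 0), last)
    return lut[i][j]
```

With the table in place, a distant pixel needs one fetch instead of one per contributing material.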

To increase variation, I add noise at several scales in several ways. I vary the slope and elevation used for selecting textures according to a procedural and hand-painted noise map (I'm just re-using the rock texture as the noise map in the scene shown here). This creates the appearance of medium details such as individual rocks, patches of grass, and patches of snow, and breaks up material striations on large cliffs. I also modulate the intensity of the diffuse texture by a different combination of the noise texture and functions at a very low frequency. This hides texture tiling and creates visual detail.

Finishing Touches

Raw terrain solves a technical challenge but not an artistic one. It benefits from the addition of effects that increase the sense of scale:
I added fog and depth of field in some of these screenshots, and started prototyping water but did not implement the other effects. For a PC game, I think that weighted, blended transparency would work particularly well for blending the grass, trees, and clouds based on the video result below:

Finally, below is a video with the water prototype. It is not included with the source release, but is built on G3D's open source screen-space ray tracing, refraction, and Fresnel helper functions.

Thanks to Padraic Hennessy for introducing me to many resources on this topic.

Morgan McGuire (@morgan3d) is a professor of Computer Science at Williams College, visiting professor at NVIDIA Research, and a professional game developer. He is the author of the Graphics Codex, an essential reference for computer graphics now available in iOS and Web Editions.