Pretty good reflections computed by our screen-space ray tracer in ripply water
Games march 3D rays across the height field defined by a depth buffer to create very approximate screen-space reflections. You can see this technique in action in Crysis 3 and Just Cause 2. When the point where the 3D ray hits the 3D surface in camera space is discovered, the game projects that point back to 2D and uses the color of the pixel for the color of the reflection. This is obviously limited to reflections of objects that are on the screen, so most games fall back to an environment cube map when no hit point is found in the depth buffer because the ray went off screen or behind an object (see Bart Wronski's detailed explanation of failure cases).
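In shader terms, the conventional approach looks something like the sketch below. This is a minimal illustration of the loop just described, not code from any shipping title; the uniforms and the projectToPixel helper are assumptions.

// Sketch of the conventional 3D ray march (illustrative only; the uniforms
// and helper below are assumptions, not taken from any shipping game).
uniform sampler2D   csZBuffer;      // linear camera-space Z (negative values)
uniform sampler2D   colorBuffer;    // the rendered frame to reflect
uniform samplerCube envCubeMap;     // fallback for rays that miss
uniform vec2        csZBufferSize;  // dimensions of csZBuffer in pixels
uniform mat4        projToPixel;    // camera space -> pixel coordinates
uniform mat3        cameraToWorld;  // for looking up the cube map

// Project a camera-space point to pixel coordinates
vec2 projectToPixel(vec3 csPos) {
    vec4 h = projToPixel * vec4(csPos, 1.0);
    return h.xy / h.w;
}

vec3 marchReflection(vec3 csOrig, vec3 csDir, float stepSize, int numSteps) {
    vec3 csPos = csOrig;
    for (int i = 0; i < numSteps; ++i) {
        csPos += csDir * stepSize;              // uniform step along the 3D ray...
        vec2 pixel = projectToPixel(csPos);     // ...is an irregular step in 2D
        if (any(lessThan(pixel, vec2(0.0))) ||
            any(greaterThanEqual(pixel, csZBufferSize))) {
            break;                              // ray left the screen
        }
        float sceneZ = texelFetch(csZBuffer, ivec2(pixel), 0).r;
        if (csPos.z <= sceneZ) {
            // Ray passed behind the depth-buffer surface: call it a hit
            return texelFetch(colorBuffer, ivec2(pixel), 0).rgb;
        }
    }
    // No hit found on screen: fall back to the environment cube map
    return texture(envCubeMap, cameraToWorld * csDir).rgb;
}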
Although it is primarily used for reflections today, this technique can do much more. Many computer graphics algorithms are based on casting rays. For example, we experimented with screen-space rays for approximate reflection, refraction, glossy reflection, ambient occlusion, and even global illumination (much more efficient approximations are available for the last two).
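All of these effects share the same 2D trace; only the ray setup differs. A hedged sketch of that per-effect seeding (the G-buffer inputs and the hemisphereSample helper are assumptions, not code from our renderer):

// Seeding the same screen-space tracer for different effects (a sketch).
in vec3 csPosition;   // camera-space position from the G-buffer
in vec3 csNormal;     // camera-space shading normal
uniform float roughness;

// Assumed helper: returns a (pseudo-random) direction in the hemisphere
// about n; any standard cosine-weighted sampler can stand in here.
vec3 hemisphereSample(vec3 n);

void seedRays() {
    vec3 csView = normalize(csPosition);  // camera sits at the camera-space origin

    vec3 csMirrorDir  = reflect(csView, csNormal);            // mirror reflection
    vec3 csRefractDir = refract(csView, csNormal, 1.0/1.33);  // refraction into water
    vec3 csGlossyDir  = normalize(mix(csMirrorDir,            // glossy: perturb the
        hemisphereSample(csMirrorDir), roughness));           // mirror direction
    vec3 csAODir      = hemisphereSample(csNormal);           // ambient occlusion / GI
    // Each direction is then fed to the same 2D trace shown below.
}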
Different effects computed by our screen-space ray tracer. The rightmost two are not real-time.
The Problem with 3D Marching
In current games, the ray march is usually limited to a small number of iterations to ensure real-time performance. Due to perspective, marching in regular steps along a 3D ray doesn't produce regular steps across the 2D screen. This means that the limited budget of samples is distributed poorly, especially in the case of water reflections like the ones shown in the dragon picture above. For that kind of camera and reflective surface, the ray skips away from the surface too quickly, missing nearby objects, and then bunches the remaining samples onto a few pixels, limiting how far away reflected objects can be while wasting computation sampling the same pixels repeatedly. The diagram below visualizes the problem: the over-sampled pixels are colored red for the 3D ray march, and the under-sampled pixels are the obvious gaps.
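The non-uniformity is easy to quantify with an illustrative pinhole model (not taken from the article): a camera-space ray P(t) = (x_0 + t d_x, y_0 + t d_y, z_0 + t d_z) under a projection with focal length f lands at screen position

$$ s(t) = f\,\frac{x_0 + t\,d_x}{z_0 + t\,d_z}, \qquad \frac{ds}{dt} = f\,\frac{d_x z_0 - x_0 d_z}{(z_0 + t\,d_z)^2} $$

so the screen-space spacing between uniform steps in t falls off as 1/z^2: the first samples leap across many pixels (the gaps) while the later ones pile onto the same few distant pixels (the red region in the diagram).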
This limitation of the ray march technique is why the water reflections of distant mountains fade out in my terrain prototype video:
The 2D Marching Solution
Fast, exact thin-line and conservative 2D rasterization algorithms have been known for over 20 years. We built our ray caster by trivially extending the "Digital Differential Analyzer" (DDA) line algorithm with perspective-correct interpolation (i.e., the way hardware rasterization handles interpolators) and then optimizing the implementation for the particular concerns of GPU programming, such as minimizing divergence, memory bandwidth, and peak register count. The optimized implementation is below. Our upcoming paper explains the optimizations in more detail and includes an extension to deep G-buffers. It also includes code for operating directly on a depth buffer instead of requiring a linear-z buffer, although we advise against that for performance reasons. This kind of approach has already proven reliable in production: the Assassin's Creed 4 (see page 80) and Killzone 4 (page 90) presentations both describe it.

// By Morgan McGuire and Michael Mara at Williams College 2014
// Released as open source under the BSD 2-Clause License
// http://opensource.org/licenses/BSD-2-Clause
#define point2 vec2
#define point3 vec3

float distanceSquared(vec2 a, vec2 b) { a -= b; return dot(a, a); }

// Returns true if the ray hit something
bool traceScreenSpaceRay1(
    // Camera-space ray origin, which must be within the view volume
    point3 csOrig,

    // Unit-length camera-space ray direction
    vec3 csDir,

    // A projection matrix that maps to pixel coordinates (not [-1, +1]
    // normalized device coordinates)
    mat4x4 proj,

    // The camera-space Z buffer (all negative values)
    sampler2D csZBuffer,

    // Dimensions of csZBuffer
    vec2 csZBufferSize,

    // Camera-space thickness to ascribe to each pixel in the depth buffer
    float zThickness,

    // (Negative number)
    float nearPlaneZ,

    // Step in horizontal or vertical pixels between samples. This is a float
    // because integer math is slow on GPUs, but should be set to an integer >= 1
    float stride,

    // Number between 0 and 1 for how far to bump the ray in stride units
    // to conceal banding artifacts
    float jitter,

    // Maximum number of iterations. Higher gives better images but may be slow
    const float maxSteps,

    // Maximum camera-space distance to trace before returning a miss
    float maxDistance,

    // Pixel coordinates of the first intersection with the scene
    out point2 hitPixel,

    // Camera-space location of the ray hit
    out point3 hitPoint) {

    // Clip to the near plane
    float rayLength = ((csOrig.z + csDir.z * maxDistance) > nearPlaneZ) ?
        (nearPlaneZ - csOrig.z) / csDir.z : maxDistance;
    point3 csEndPoint = csOrig + csDir * rayLength;

    // Project into homogeneous clip space
    vec4 H0 = proj * vec4(csOrig, 1.0);
    vec4 H1 = proj * vec4(csEndPoint, 1.0);
    float k0 = 1.0 / H0.w, k1 = 1.0 / H1.w;

    // The interpolated homogeneous version of the camera-space points
    point3 Q0 = csOrig * k0, Q1 = csEndPoint * k1;

    // Screen-space endpoints
    point2 P0 = H0.xy * k0, P1 = H1.xy * k1;

    // If the line is degenerate, make it cover at least one pixel
    // to avoid handling zero-pixel extent as a special case later
    P1 += vec2((distanceSquared(P0, P1) < 0.0001) ? 0.01 : 0.0);
    vec2 delta = P1 - P0;

    // Permute so that the primary iteration is in x to collapse
    // all quadrant-specific DDA cases later
    bool permute = false;
    if (abs(delta.x) < abs(delta.y)) {
        // This is a more-vertical line
        permute = true; delta = delta.yx; P0 = P0.yx; P1 = P1.yx;
    }

    float stepDir = sign(delta.x);
    float invdx = stepDir / delta.x;

    // Track the derivatives of Q and k
    vec3  dQ = (Q1 - Q0) * invdx;
    float dk = (k1 - k0) * invdx;
    vec2  dP = vec2(stepDir, delta.y * invdx);

    // Scale derivatives by the desired pixel stride and then
    // offset the starting values by the jitter fraction
    dP *= stride; dQ *= stride; dk *= stride;
    P0 += dP * jitter; Q0 += dQ * jitter; k0 += dk * jitter;

    // Slide P from P0 to P1, (now-homogeneous) Q from Q0 to Q1, k from k0 to k1
    point3 Q = Q0;

    // Adjust end condition for iteration direction
    float end = P1.x * stepDir;

    float k = k0, stepCount = 0.0, prevZMaxEstimate = csOrig.z;
    float rayZMin = prevZMaxEstimate, rayZMax = prevZMaxEstimate;
    float sceneZMax = rayZMax + 100;
    for (point2 P = P0;
         ((P.x * stepDir) <= end) && (stepCount < maxSteps) &&
         ((rayZMax < sceneZMax - zThickness) || (rayZMin > sceneZMax)) &&
         (sceneZMax != 0);
         P += dP, Q.z += dQ.z, k += dk, ++stepCount) {

        rayZMin = prevZMaxEstimate;
        rayZMax = (dQ.z * 0.5 + Q.z) / (dk * 0.5 + k);
        prevZMaxEstimate = rayZMax;
        if (rayZMin > rayZMax) {
            float t = rayZMin; rayZMin = rayZMax; rayZMax = t;
        }

        hitPixel = permute ? P.yx : P;
        // You may need hitPixel.y = csZBufferSize.y - hitPixel.y; here if your
        // vertical axis is different than ours in screen space
        sceneZMax = texelFetch(csZBuffer, ivec2(hitPixel), 0).r;
    }

    // Advance Q based on the number of steps
    Q.xy += dQ.xy * stepCount;
    hitPoint = Q * (1.0 / k);
    return (rayZMax >= sceneZMax - zThickness) && (rayZMin < sceneZMax);
}
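To make the interface concrete, here is a sketch of a deferred reflection pass calling the tracer. The uniforms and the tuning constants are illustrative assumptions, not values from our renderer.

// Sketch of a caller for water reflection (illustrative assumptions only)
uniform mat4        projToPixelMatrix;  // camera space -> pixel coordinates
uniform sampler2D   csZBuffer;          // linear camera-space Z
uniform vec2        csZBufferSize;
uniform sampler2D   colorBuffer;        // previously rendered frame
uniform samplerCube envCubeMap;         // fallback when the trace misses
uniform mat3        cameraToWorld;

vec3 shadeReflection(point3 csPosition, vec3 csNormal) {
    vec3 csDir = reflect(normalize(csPosition), csNormal);

    // Alternate the jitter in a checkerboard so banding dissolves into noise
    float jitter = mod(gl_FragCoord.x + gl_FragCoord.y, 2.0) * 0.5;

    point2 hitPixel;
    point3 hitPoint;
    bool hit = traceScreenSpaceRay1(csPosition, csDir, projToPixelMatrix,
        csZBuffer, csZBufferSize,
        1.0,    // zThickness: treat each depth sample as ~1 unit thick
        -0.1,   // nearPlaneZ (negative by our convention)
        2.0,    // stride: sample every other pixel
        jitter,
        64.0,   // maxSteps
        100.0,  // maxDistance in camera-space units
        hitPixel, hitPoint);

    // Use the on-screen color at the hit; otherwise fall back to the cube map
    return hit ? texelFetch(colorBuffer, ivec2(hitPixel), 0).rgb
               : texture(envCubeMap, cameraToWorld * csDir).rgb;
}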
All of the images on this page used the new method, except for the terrain video that I used to demonstrate the old ray marching method.
Reflecting teapots in water
You can eliminate the division from the inner loop by performing the depth comparison in homogeneous clip space. The problem with doing so is that it becomes impractical to ascribe a finite thickness to each depth-buffer sample, since a constant interval in homogeneous clip space grows hyperbolically in camera space away from the near plane. We believe that developers who use this approach simply make the interval infinite, i.e., treat the depth buffer as a continuous sheet. The drawback is that it makes objects that should have finite z extent appear infinite; an example is below.
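A sketch of that variant of the loop body, as we understand the approach (it reuses Q, k, and hitPixel from the listing above, and assumes a standard projection where w = -z, so k = 1/w is positive for points in front of the camera):

// Division-free depth test (sketch of the approach described above).
// With w = -z > 0 in front of the camera, k = 1/w > 0, so
//     (Q.z / k) < sceneZ   <=>   Q.z < sceneZ * k.
// Note there is no zThickness term: the depth buffer is treated as a
// continuous sheet, which is what makes finite objects look infinitely deep.
float sceneZ = texelFetch(csZBuffer, ivec2(hitPixel), 0).r;
bool rayBehindSurface = (Q.z < sceneZ * k);   // was: ((Q.z / k) < sceneZ)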
A lot of game engines are moving to temporal filtering to reduce the cost of computing effects. This kind of filtering is hard to apply to reflection and refraction because they are view dependent. Mike and I briefly experimented with the filters in Unreal Engine 4 for screen-space reflections and did not like the quality, so we don't yet have a recommendation on how to do this well. However, we implemented some nice temporal filters for our deep G-buffer paper and include the source on that page to help others explore the problem.
Morgan McGuire (@morgan3d) is a professor of Computer Science at Williams College, visiting professor at NVIDIA Research, and a professional game developer. His most recent games are Rocket Golfing and work on the Skylanders series. He is the author of the Graphics Codex, an essential reference for computer graphics now available in iOS and Web Editions.