Thursday, March 26, 2015

Implementing Weighted, Blended Order-Independent Transparency

Why Transparency?

Result from the Weighted, Blended OIT method described
in this article. Everything gray in the top inset image has some
level of transmission or partial coverage transparency.
See also the new colored transmission method in my next article!

Partially transparent surfaces are important for computer graphics. Realistic materials such as fog, glass, and water as well as imaginary ones such as force-fields and magical spells appear frequently in video games and modeling programs. These all transmit light through their surfaces because of their chemical properties.

Even opaque materials can produce partially transparent surfaces within a computer graphics system. For example, when a fence is viewed from a great distance, an individual pixel may contain both fence posts and the holes between them. In that case, the "surface" of the opaque fence with holes is equivalent to a homogeneous, partly-transparent surface within the pixel. A similar situation arises at the edge of any opaque surface, where the silhouette cuts partly across a pixel. This is the classic partial coverage situation first described for graphics by Porter and Duff in 1984 and modeled with "alpha".
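For readers who want that algebra concrete, here is Porter and Duff's "over" operator for premultiplied colors as a tiny standalone C++ sketch (the Color4 struct and the 50% coverage values are illustrative only). Note how swapping the operand order changes the answer, which foreshadows the ordering problem discussed below:

#include <cstdio>

struct Color4 { float r, g, b, a; };

// Porter-Duff "over" for premultiplied colors: out = src + (1 - src.a) * dst.
Color4 over(Color4 src, Color4 dst) {
    float k = 1.0f - src.a;
    return { src.r + k * dst.r, src.g + k * dst.g,
             src.b + k * dst.b, src.a + k * dst.a };
}

int main() {
    // 50%-coverage red over 50%-coverage green vs. the reverse:
    Color4 red   = { 0.5f, 0.0f, 0.0f, 0.5f };  // premultiplied
    Color4 green = { 0.0f, 0.5f, 0.0f, 0.5f };
    Color4 ab = over(red, green), ba = over(green, red);
    printf("red over green: %.2f %.2f %.2f\n", ab.r, ab.g, ab.b);  // 0.50 0.25 0.00
    printf("green over red: %.2f %.2f %.2f\n", ba.r, ba.g, ba.b);  // 0.25 0.50 0.00
    return 0;
}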

There are some interesting physics and technical details that I'm simplifying in this overview. To dig deeper, I recommend the discussion of the sources and relation between coverage and transmission for non-refractive transparency in the Colored Stochastic Shadow Maps paper that Eric Enderton and I wrote. I extended that discussion in the transparency section of Computer Graphics: Principles and Practice.



The Challenge

Generating real-time images of scenes with partial transparency is challenging. That's because multiple surfaces can contribute to the final value of a pixel containing partial transparency, and the order in which they are composited over each other affects the result. This is one reason why hair, smoke, and glass often look unrealistic in video games, especially where they come close to opaque surfaces.

One reason that transparency is challenging is that ordering surfaces is hard. There are many algorithms for ordering elements in a data structure, but they all have a cost in both time and space that is unacceptable for real-time rendering on current computer graphics hardware. If every pixel can store ten partially transparent surfaces, then a rendering system requires ten times as much memory to encode and sort those values. (I'm sure that 100 GB GPUs will exist in a few years, but they don't today, and when they do, we might not want to spend all of that memory on transparency.) Nor is it always possible to sort the surfaces themselves, because there is not necessarily any order in which mutually overlapping surfaces composite correctly. As few as three mutually overlapping triangles can thwart the sorting approach.

Recently, a number of efficient algorithms for order-independent transparency (OIT) were introduced. These approximate the result of compositing multiple layers without the ordering constraint or unbounded intermediate storage. This can yield two benefits. The first is that the worst cases of incorrectly composited transparency can be avoided. No more bright edges on a tree in shadow or characters standing out from the fog they were hiding in. The second benefit is that multiple transparent primitives can be combined in a single draw call. That gives a significant increase in performance for scenes with lots of foliage or special effects.

All OIT methods make approximations that affect quality. A common assumption is that partially-transparent surfaces have no refraction and do not color the objects behind them. For example, in this model "green" glass will not tint what is seen through it green; it merely darkens the distant surfaces and adds green over the top. A distant red object will therefore appear brown (dark red + green), not black as it would in the real world.
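To make that approximation concrete, here is a small standalone C++ computation (the 50% coverage value is made up for illustration):

#include <cstdio>

int main() {
    // Distant red object seen through "green" glass with 50% coverage.
    float dst[3]   = { 1.0f, 0.0f, 0.0f };  // red background
    float glass[3] = { 0.0f, 1.0f, 0.0f };  // green glass color
    float a        = 0.5f;                  // coverage/opacity

    // Monochrome-transmission model: the background is only darkened by the
    // scalar (1 - a), then the glass's own color is added on top.
    float model[3], real[3];
    for (int i = 0; i < 3; ++i) {
        model[i] = glass[i] * a + dst[i] * (1.0f - a);
        // Colored transmission (reality): the background is filtered
        // componentwise by the glass color, so no red gets through.
        real[i]  = glass[i] * a + dst[i] * glass[i] * (1.0f - a);
    }
    printf("model:   %.2f %.2f %.2f\n", model[0], model[1], model[2]); // 0.50 0.50 0.00 ("brown")
    printf("reality: %.2f %.2f %.2f\n", real[0],  real[1],  real[2]);  // 0.00 0.50 0.00 (red object goes black)
    return 0;
}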

Weighted, Blended Order-Independent Transparency is a computer graphics algorithm that I developed with Louis Bavoil at NVIDIA and the support of the rendering team at Vicarious Visions. Compared to other OIT methods, it has the advantages that it uses very little memory, is very fast, and works on all major consoles and PCs released in the past few years. The primary drawbacks are that it produces less distinction between layers close together in depth and must be tuned once for the desired depth range and precision of the application. Our I3D presentation slides explain these tradeoffs in more detail.


A glass chess set rendered with our technique.

Since publishing and presenting the research paper, I've worked with several companies to integrate our transparency method into their games and content-creation applications. This article shares my current best explanation of how to implement it, as informed by that process. I'll give the description in a PC-centric way. See the original paper for notes on platforms that do not support the precisions or blending modes assumed in this guide.

Algorithm Overview

These OIT methods all make the following render passes:
  1. 3D opaque surfaces to a primary framebuffer
  2. 3D transparency accumulation to an off-screen framebuffer
  3. 2D compositing transparency over the primary framebuffer
During the transparency pass the original depth buffer is maintained for testing but not written to. The compositing pass is a simple 2D image processing operation.
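Here is that frame structure as a C++ sketch against raw OpenGL. The framebuffer handles and the bindFramebuffer/draw helpers are hypothetical placeholders for your engine, not part of the technique:

// Hypothetical engine helpers, declared for the sketch.
void bindFramebuffer(GLuint fbo);
void drawOpaqueScene();
void drawTransparentScene();
void drawCompositeTriangle();
extern GLuint primaryFramebuffer, oitFramebuffer;

void renderFrame() {
    // 1. Opaque surfaces: normal depth test and depth write.
    bindFramebuffer(primaryFramebuffer);
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);
    drawOpaqueScene();

    // 2. Transparency accumulation: the opaque depth buffer stays bound for
    //    testing, but is neither written nor cleared.
    bindFramebuffer(oitFramebuffer);  // accum + revealage, sharing that depth buffer
    glDepthMask(GL_FALSE);
    glEnable(GL_BLEND);
    drawTransparentScene();           // any order, any number of draw calls

    // 3. 2D composite of the transparency buffers over the opaque image.
    bindFramebuffer(primaryFramebuffer);
    glDisable(GL_DEPTH_TEST);
    drawCompositeTriangle();

    // Restore default state for the next frame.
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_TRUE);
    glDisable(GL_BLEND);
}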

3D Transparency Pass

This is a 3D pass that submits transparent surfaces in any order. Bind the following two render targets in addition to the depth buffer. Test against the depth buffer, but do not write to it or clear it. The transparent pass shaders should be almost identical to the opaque pass ones. Instead of writing a final color of (r, g, b, 1), they write to each of the render targets (using the default ADD blend equation):

Render Target | Format  | Clear        | Src Blend | Dst Blend           | Write ("Src")
accum         | RGBA16F | (0, 0, 0, 0) | ONE       | ONE                 | (r*a, g*a, b*a, a) * w
revealage     | R8      | (1, 0, 0, 0) | ZERO      | ONE_MINUS_SRC_COLOR | a
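One possible host-side setup for this pass, in C++ against raw OpenGL, is sketched below. It is a hedged sketch rather than the canonical implementation: it assumes a loader (GLAD/GLEW) has populated the function pointers, that the opaque pass's depth texture is passed in for sharing, and that you have GL 4.0 for the per-target blend functions glBlendFunci (on GL 3.3 you need the ARB_draw_buffers_blend extension):

void setupTransparencyPass(int width, int height, GLuint opaqueDepthTex) {
    GLuint accumTex, revealageTex, fbo;

    glGenTextures(1, &accumTex);
    glBindTexture(GL_TEXTURE_2D, accumTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_HALF_FLOAT, nullptr);

    glGenTextures(1, &revealageTex);
    glBindTexture(GL_TEXTURE_2D, revealageTex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    // GL_R16F eases tuning; GL_R8 halves this texture's bandwidth and footprint.
    glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0, GL_RED, GL_FLOAT, nullptr);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, accumTex, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, revealageTex, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, opaqueDepthTex, 0);

    const GLenum buffers[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
    glDrawBuffers(2, buffers);

    // Clear only the color targets; the shared depth buffer is left intact.
    const float clearAccum[]     = { 0, 0, 0, 0 };
    const float clearRevealage[] = { 1, 0, 0, 0 };
    glClearBufferfv(GL_COLOR, 0, clearAccum);
    glClearBufferfv(GL_COLOR, 1, clearRevealage);

    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);              // test against opaque depth, don't write it
    glEnable(GL_BLEND);
    glBlendEquation(GL_FUNC_ADD);
    glBlendFunci(0, GL_ONE, GL_ONE);                   // accum
    glBlendFunci(1, GL_ZERO, GL_ONE_MINUS_SRC_COLOR);  // revealage
}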

The w value is a weight computed from depth. The paper and presentation describe several alternatives that are best for different kinds of content. The general-purpose one used for the images in this article is:

w = clamp(pow(min(1.0, premultipliedReflect.a * 10.0) + 0.01, 3.0) * 1e8 * pow(1.0 - gl_FragCoord.z * 0.9, 3.0), 1e-2, 3e3);

where gl_FragCoord.z is OpenGL's depth buffer value, which ranges from 0 at the near plane to 1 at the far plane. This function downweights the color contribution of very low-coverage surfaces (e.g., ones that are about to fade out) and of distant surfaces.

Note that the compositing uses pre-multiplied color. This allows expressing emissive (glowing) values by writing the net color along each channel instead of explicitly storing the product r*a, etc. For example, a blue lightning bolt can be written to accum as (0, 10, 15, 0.1) rather than creating an artificial unmultiplied r value that must be very large to compensate for the very low coverage.
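In host-code terms (an illustrative C++ fragment; the struct and helper are hypothetical, not from any particular engine):

struct Color4 { float r, g, b, a; };

// Conventional premultiply of an unmultiplied color by its coverage.
Color4 premultiply(Color4 c) { return { c.r * c.a, c.g * c.a, c.b * c.a, c.a }; }

// An emitter can skip the helper and author the net radiance directly:
// very low coverage, but channel values far above 1 in an HDR pipeline.
const Color4 blueBolt = { 0.0f, 10.0f, 15.0f, 0.1f };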

Using R16F for the revealage render target will give slightly better precision and make it easier to tune the algorithm, but a 2x savings on bandwidth and memory footprint for that texture may make it worth compressing into R8 format.

Sample GLSL shader code is below:
#version 330

// Render target 0 is accum (RGBA16F); render target 1 is revealage (R8).
layout(location = 0) out vec4  _accum;
layout(location = 1) out float _revealage;

void writePixel(vec4 premultipliedReflect, vec3 transmit, float csZ) { 
    /* Modulate the net coverage for composition by the transmission. This does not affect the color channels of the
       transparent surface because the caller's BSDF model should already account for whether transmission modulates
       reflection. This model doesn't handle colored transmission, so it averages the color channels. See

          McGuire and Enderton, Colored Stochastic Shadow Maps, ACM I3D, February 2011
          http://graphics.cs.williams.edu/papers/CSSM/

       for a full explanation and derivation.*/

    premultipliedReflect.a *= 1.0 - clamp((transmit.r + transmit.g + transmit.b) * (1.0 / 3.0), 0.0, 1.0);

    /* You may need to adjust the w function if you have a very large or very small view volume; see the paper and
       presentation slides at http://jcgt.org/published/0002/02/09/ */
    // Intermediate terms to be cubed
    float a = min(1.0, premultipliedReflect.a) * 8.0 + 0.01;
    float b = -gl_FragCoord.z * 0.95 + 1.0;

    /* If your scene has a lot of content very close to the far plane,
       then include this line (one rsqrt instruction):
       b /= sqrt(1e4 * abs(csZ)); */
    float w    = clamp(a * a * a * 1e8 * b * b * b, 1e-2, 3e2);
    _accum     = premultipliedReflect * w;
    _revealage = premultipliedReflect.a;
}

void main() {
    vec4  color;     // premultiplied reflection color from the surface's BSDF
    vec3  transmit;  // transmission color from the surface's BSDF
    float csZ;       // camera-space depth, for the optional far-plane correction
    ...
    writePixel(color, transmit, csZ);
}

2D Compositing Pass

The compositing pass can blend the result over the opaque surface frame buffer (as described here), or explicitly read both buffers and write the result to a third.
Render Target | Src Blend | Dst Blend           | Write ("Src")
screen        | SRC_ALPHA | ONE_MINUS_SRC_ALPHA | (accum.rgb / max(accum.a, epsilon), 1 - revealage)

I use epsilon = 0.00001 to avoid overflow in the division. It is easy to notice when you're overflowing or underflowing the 16-bit float precision: you'll see either fully-saturated "8-bit" colors (red, green, blue, yellow, cyan, magenta, white), or black dots from floating point specials (Infinity, NaN). If the computation produces floating point specials, they will typically also expand into large black squares under any postprocessed bloom or depth of field filters.
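On the host side, this pass is a single standard blend mode over the primary framebuffer. A minimal C++/OpenGL sketch (drawFullScreenTriangle and the texture-unit assignments are assumptions about your engine, not part of the technique):

void drawFullScreenTriangle();  // hypothetical helper that runs the shader below

void compositeTransparency(GLuint accumTex, GLuint revealageTex) {
    // Blend over the primary (opaque) framebuffer; depth is irrelevant here.
    glDisable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    // Assumes the accumTexture and revealageTexture uniforms were bound
    // to texture units 0 and 1 when the shader was set up.
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, accumTex);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, revealageTex);

    drawFullScreenTriangle();
}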

Sample GLSL shader code is below:
#version 330

/* sum(rgb * a, a) */
uniform sampler2D accumTexture;

/* prod(1 - a) */
uniform sampler2D revealageTexture;

float maxComponent(vec4 c) {
    return max(max(c.r, c.g), max(c.b, c.a));
}

out vec4 fragColor;

void main() {
    ivec2 C = ivec2(gl_FragCoord.xy);
    float revealage = texelFetch(revealageTexture, C, 0).r;
    if (revealage == 1.0) {
        // Save the blending and color texture fetch cost
        discard;
    }

    vec4 accum = texelFetch(accumTexture, C, 0);
    // Suppress overflow
    if (isinf(maxComponent(abs(accum)))) {
        accum.rgb = vec3(accum.a);
    }
    vec3 averageColor = accum.rgb / max(accum.a, 0.00001);

    // dst' = (accum.rgb / accum.a) * (1 - revealage) + dst * revealage
    fragColor = vec4(averageColor, 1.0 - revealage);
}


Examples

I integrated the implementation described in this article into the full open source G3D Innovation Engine renderer (version 10.1); see the engine's source for the specific files that implement the technique.


[Nicolas Rougier also contributed a Python-OpenGL implementation, with nice comments and reference images. I'm hosting it at http://dept.cs.williams.edu/~morgan/code/python/python-oit.zip.]

All of the following images of the San Miguel scene by Guillermo M. Leal Llaguno were rendered using G3D's implementation. To show how it integrates, these include a full screen-space and post-processing pipeline: ambient occlusion, motion blur, depth of field, FXAA, color grading, and bloom.

The inset images visualize the accum and revealage buffers. Note the combination of glass and partial coverage surfaces.





Here are examples on other kinds of content:




Note that this reference image fixes a typo from the one in the I3D presentation:
the alpha values are 0.75 (not 0.25 as originally reported!) and are given before computing the premultiplied values. So, the blue square is (0, 0, 0.75, 0.75) in pre-multiplied alpha and (0, 0, 1, 0.75) with unmultiplied color.
I'm using a different weighting function from the I3D result as well.




Morgan McGuire (@morgan3d) is a professor of Computer Science at Williams College, a researcher at NVIDIA, and a professional game developer. His most recent games are Rocket Golfing and work on the Skylanders series. He is the author of the Graphics Codex, an essential reference for computer graphics now available in iOS and Web Editions.