Per-vertex visibility cubemaps generation

Image-based lighting (IBL) with the triple-product integral technique factors the rendering equation into three independently sampled functions: lighting, BRDF, and visibility. The rendering equation is evaluated per vertex, that is, by computing the triple-product integral of these three functions at each vertex position. Usually a single lighting function, represented as an environment map, is sampled at run time and applied to all shading vertices. By using precomputed wavelet-rotation matrices (see my previous post, or the original paper), we need only one copy of the BRDF, defined in the local frame, for each material. Only visibility has to be sampled individually at each vertex position.

Graphics hardware can help generate these per-vertex visibility cubemaps. The traditional way to generate a reflection environment map at a given position works here too. To build the visibility cubemap for a vertex, we place the camera at that vertex's position and render the scene 6 times, facing 6 different directions (+x, -x, +y, -y, +z, -z). Newer GPU and OpenGL capabilities let us create these per-vertex visibility cubemaps more efficiently. First, we can use layered rendering to render all 6 cubemap faces in one pass: with a cubemap texture attached to the FBO, the geometry shader can emit each primitive to any of the cubemap faces. Here is a simple geometry shader:


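#version 150 core

layout(triangles) in;
// 6 faces x 3 vertices per face
layout(triangle_strip, max_vertices = 18) out;

// One view-projection matrix per cubemap face (+X, -X, +Y, -Y, +Z, -Z);
// the vertex shader is assumed to pass world-space positions through.
uniform mat4 u_mvps[6];

void main()
{
    for (int face = 0; face < 6; ++face)
    {
        for (int v = 0; v < 3; ++v)
        {
            // Output variables are undefined after EmitVertex(), so gl_Layer
            // is written again for every vertex.
            gl_Layer    = face;
            gl_Position = u_mvps[face] * gl_in[v].gl_Position;
            EmitVertex();
        }
        EndPrimitive();
    }
}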
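The u_mvps array used in the shader above holds one view-projection matrix per cubemap face, but the post does not show how those matrices are built. Below is a minimal host-side sketch in C++. It assumes the vertex shader passes world-space positions through unchanged, so there is no model transform. It also assumes a 90-degree square frustum per face and the commonly used OpenGL face order and up vectors (+X, -X, +Y, -Y, +Z, -Z as layers 0 to 5). The names Mat4, cubeFaceViewProj, zNear and zFar are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Column-major 4x4 matrix (element [row][col] is m[col * 4 + row]), the layout
// glUniformMatrix4fv expects when transpose is GL_FALSE.
struct Mat4 { std::array<float, 16> m{}; };

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r;  // zero-initialized
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r.m[col * 4 + row] += a.m[k * 4 + row] * b.m[col * 4 + k];
    return r;
}

// Clip-space position of a world-space point (before the perspective divide).
std::array<float, 4> transform(const Mat4& a, Vec3 p) {
    const float v[4] = {p.x, p.y, p.z, 1.0f};
    std::array<float, 4> out{};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k) out[row] += a.m[k * 4 + row] * v[k];
    return out;
}

// gluLookAt-style view matrix.
Mat4 lookAt(Vec3 eye, Vec3 center, Vec3 up) {
    const Vec3 f = normalize(sub(center, eye));
    const Vec3 s = normalize(cross(f, up));
    const Vec3 u = cross(s, f);
    Mat4 r;
    r.m = {{s.x, u.x, -f.x, 0.0f,
            s.y, u.y, -f.y, 0.0f,
            s.z, u.z, -f.z, 0.0f,
            -dot(s, eye), -dot(u, eye), dot(f, eye), 1.0f}};
    return r;
}

// gluPerspective-style projection with a 90-degree field of view and aspect 1,
// so the six frusta together cover every direction around the vertex.
Mat4 perspective90(float zNear, float zFar) {
    assert(zNear > 0.0f && zFar > zNear);
    Mat4 r;
    r.m[0] = 1.0f;  // 1 / tan(45 degrees)
    r.m[5] = 1.0f;
    r.m[10] = (zFar + zNear) / (zNear - zFar);
    r.m[11] = -1.0f;
    r.m[14] = 2.0f * zFar * zNear / (zNear - zFar);
    return r;
}

// Face i corresponds to GL_TEXTURE_CUBE_MAP_POSITIVE_X + i, i.e. gl_Layer == i.
std::array<Mat4, 6> cubeFaceViewProj(Vec3 eye, float zNear, float zFar) {
    const Vec3 dirs[6] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    const Vec3 ups[6]  = {{0, -1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}, {0, -1, 0}, {0, -1, 0}};
    const Mat4 proj = perspective90(zNear, zFar);
    std::array<Mat4, 6> out;
    for (int i = 0; i < 6; ++i) out[i] = mul(proj, lookAt(eye, add(eye, dirs[i]), ups[i]));
    return out;
}
```

Because the matrices are column-major, they could be passed to glUniformMatrix4fv with transpose set to GL_FALSE. If the visibility cubemaps are later sampled with a different face orientation convention, the up-vector table would need to change to match.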

The built-in output variable gl_Layer controls which layer (or cubemap face, if the render target is a cubemap texture) the primitive is sent to. We could potentially improve performance further with geometry shader instancing (GLSL 4.00, or the ARB_gpu_shader5 extension), which executes the geometry shader multiple times on the same input primitive. Here is a sample:


#version 400 core

// Invoke the geometry shader 6 times per input triangle, once per cubemap face.
layout(triangles, invocations = 6) in;
layout(triangle_strip, max_vertices = 3) out;

// ...

void main()
{
    // gl_InvocationID (0..5) selects both the target face and its matrix.
    for (int v = 0; v < 3; ++v)
    {
        // Output variables are undefined after EmitVertex(), so gl_Layer is
        // written again for every vertex.
        gl_Layer    = gl_InvocationID;
        gl_Position = u_mvps[gl_InvocationID] * gl_in[v].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}

The input layout qualifier specifies that the geometry shader is invoked 6 times for each primitive, and the invocation index (0 to 5) is available in gl_InvocationID. I say "potentially improve performance" because instancing does not seem to help much when there are far more input primitives than shader cores. I still need to run a timing test to see whether there is a difference.

Finally, all the rendering parameters for per-vertex visibility cubemap generation are known beforehand. We can therefore store them in a shader storage buffer or texture buffer and use instanced rendering to eliminate redundant API calls. For a scene containing 150000 vertices, a naive rendering loop requires 150000 draw calls, whereas instanced rendering would ideally need just one. However, each vertex's visibility map must be retained, so we have to switch FBOs or FBO attachments between rendering iterations, and allocating 150000 textures is not a good idea either. A resolution of 6x32x32 is usually enough for a vertex's visibility cubemap. We can therefore alleviate this problem by using large textures, such as 6x8192x8192, and grouping many visibility fields into each one. We then do instanced rendering with these large textures as render targets, so the number of instanced draw calls equals the number of large textures.
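The atlas bookkeeping described above is simple integer arithmetic. Below is a minimal sketch in C++. It assumes each vertex owns the same tileSize x tileSize square in all six faces of a large atlas cubemap and that vertices fill the tiles in row-major order. AtlasLayout, TileLocation, locateTile and the row-major order are illustrative choices, not taken from the post.

```cpp
#include <cassert>

// Hypothetical layout: every face of a large atlasSize x atlasSize cubemap is cut
// into tileSize x tileSize tiles, one tile (in all six faces) per vertex.
struct AtlasLayout {
    int atlasSize;  // e.g. 8192
    int tileSize;   // e.g. 32

    int tilesPerRow() const { return atlasSize / tileSize; }
    int tilesPerAtlas() const { return tilesPerRow() * tilesPerRow(); }

    // Number of large cubemaps needed, i.e. the number of instanced draw calls.
    int atlasCount(int vertexCount) const {
        return (vertexCount + tilesPerAtlas() - 1) / tilesPerAtlas();
    }
};

struct TileLocation {
    int atlas;  // which large cubemap (which instanced draw call)
    int x, y;   // lower-left pixel of the vertex's tile, identical in every face
};

TileLocation locateTile(const AtlasLayout& L, int vertexIndex) {
    assert(L.atlasSize % L.tileSize == 0);
    const int perAtlas = L.tilesPerAtlas();
    const int local = vertexIndex % perAtlas;  // index of the instance within its draw call
    return {vertexIndex / perAtlas,
            (local % L.tilesPerRow()) * L.tileSize,
            (local / L.tilesPerRow()) * L.tileSize};
}
```

With the post's figures (32x32 tiles in an 8192x8192 atlas), each large cubemap holds 256 x 256 = 65536 vertices. The 150000-vertex scene then needs 3 atlases, and hence 3 instanced draw calls instead of 150000 individual ones. The 175000-vertex test scene also fits in 3. The post does not describe how each instance's geometry is confined to its own tile, so that part is left out of this sketch.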

In my test scene, which has 175000 vertices, visibility cubemap generation takes only a few seconds on a machine with an NVIDIA GTX 670; I haven't measured it precisely, though.

[Figure: Visibility field for a vertex]

1 thought on "Per-vertex visibility cubemaps generation"

  1. Small performance hint: Attaching multiple textures to an FBO and switching between them is faster than switching FBOs. Of course, you can only attach 8, but at least it reduces the overhead.

    Using one of the new APIs (D3D12, Mantle, nv-cmdlists), you would be able to submit all your draw calls with a single function call from your code. That would likely make this a lot faster (possibly even real-time?), assuming you are CPU-bound… which would make this a very interesting technique indeed.
