The triple product wavelet integral IBL proposed by Ng et al. enables changing both lighting and viewing directions by separately precomputing the BRDF and visibility functions. In their implementation, the BRDF, visibility, and lighting functions are all sampled in the same global frame. Because BRDFs are usually defined in the local frame, Ng et al. precompute BRDFs for a set of orientations defined in the global frame; that is, each material's BRDF is stored in multiple copies, one per orientation. At runtime, when shading a vertex, the surface normal is used to choose and interpolate among the BRDF copies with the closest orientations. This approach requires a huge amount of memory to store the precomputed BRDF set.
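Written out in basis form (the symbol names below are my own shorthand: l_i, v_j, and ρ_k denote the wavelet coefficients of lighting, visibility, and BRDF over an orthonormal basis {Ψ_i}), the per-vertex shade is a sum weighted by precomputable tripling coefficients:

```latex
B = \int_{S^2} L(\omega)\, V(\omega)\, \rho(\omega)\, d\omega
  = \sum_{i,j,k} C_{ijk}\, l_i\, v_j\, \rho_k,
\qquad
C_{ijk} = \int_{S^2} \Psi_i(\omega)\, \Psi_j(\omega)\, \Psi_k(\omega)\, d\omega
```

For the Haar basis most of the C_{ijk} vanish, which is what makes the triple product tractable at runtime.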
Wang et al. propose a computational approach to efficient wavelet rotation. First, they choose the octahedral map to parameterize the spherical domain, because it is fairly uniform and reduces wavelet rotation to a 2D translation.
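A minimal sketch of the octahedral parameterization (my own encode/decode pair, not Wang et al.'s exact code): a unit direction is projected onto the octahedron |x|+|y|+|z|=1, and the lower hemisphere is unfolded into the corners of the unit square.

```python
import numpy as np

def oct_encode(d):
    """Unit direction -> point in [-1,1]^2 via octahedral mapping."""
    d = d / np.sum(np.abs(d))            # project onto the octahedron
    x, y, z = d
    if z < 0:                            # unfold lower hemisphere into the corners
        x, y = (1 - abs(y)) * np.sign(x), (1 - abs(x)) * np.sign(y)
    return np.array([x, y])

def oct_decode(p):
    """Point in [-1,1]^2 -> unit direction (inverse of oct_encode)."""
    x, y = p
    z = 1 - abs(x) - abs(y)
    if z < 0:                            # fold the corners back down
        x, y = (1 - abs(y)) * np.sign(x), (1 - abs(x)) * np.sign(y)
    d = np.array([x, y, z])
    return d / np.linalg.norm(d)
```

Because the map is a square image, rotating the sphere about the normal grid's orientations becomes (approximately) shifting texels in 2D.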
Second, they precompute a wavelet rotation matrix R per orientation, yielding one matrix per predefined normal direction. Each rotation matrix is very sparse because of the compact local support of the wavelet basis, and quantization can further reduce the number of non-zero elements. With 32×32 predefined normal orientations, that amounts to 1024 rotation matrices.
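To make this concrete, here is a toy sketch (hypothetical 8×8 resolution, hand-built Haar transform; not Wang et al.'s code) showing that a wavelet-domain "rotation" matrix R = W P Wᵀ, with P the 2D translation induced by the octahedral map and W the orthonormal 2D Haar transform, is sparse and acts directly on coefficients:

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal 1D Haar transform matrix (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    return np.vstack([np.kron(h, [1.0, 1.0]),                # scaling rows
                      np.kron(np.eye(n // 2), [1.0, -1.0])]  # wavelet rows
                     ) / np.sqrt(2.0)

n = 8
H = haar_matrix(n)
W = np.kron(H, H)                # 2D Haar transform on flattened n x n images

# Translation by (dy, dx) texels on the octahedral map = a permutation P.
dy, dx = 2, 4
idx = np.arange(n * n).reshape(n, n)
P = np.eye(n * n)[np.roll(idx, (dy, dx), axis=(0, 1)).ravel()]

R = W @ P @ W.T                  # wavelet-domain rotation matrix

# Applying R to coefficients matches translating the spatial-domain signal.
f = np.random.default_rng(1).random(n * n)
assert np.allclose(R @ (W @ f), W @ (P @ f))

sparsity = np.mean(np.abs(R) > 1e-9)   # fraction of non-zero entries
```

Since the Haar basis functions have compact support, most entries of R come out exactly zero, which is what makes storing 1024 such matrices feasible.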
These wavelet rotation matrices are independent of the BRDF, lighting, and visibility, so they can be computed once, stored on disk, and loaded into memory when the application starts. Only one BRDF copy per material is needed.
At runtime, the lighting function is dynamically sampled and wavelet-transformed, then multiplied by the precomputed wavelet rotation matrices to obtain rotated versions of the lighting function (more precisely, of its wavelet coefficients).
These rotated lighting coefficients are then used to shade each vertex, that is, to compute dot products with the BRDF coefficients defined in the local frame.
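A sketch of this runtime double product (the no-visibility case), under the assumption of an orthonormal Haar basis: by Parseval's identity, the dot product of coefficient vectors equals the dot product of the spatial-domain samples, and truncating to the largest coefficients gives the usual all-frequency approximation.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal 1D Haar transform matrix (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    return np.vstack([np.kron(h, [1.0, 1.0]),
                      np.kron(np.eye(n // 2), [1.0, -1.0])]) / np.sqrt(2.0)

n = 64
H = haar_matrix(n)
rng = np.random.default_rng(2)
light = rng.random(n)            # rotated lighting, spatial samples
brdf = rng.random(n)             # local-frame BRDF, spatial samples

l, b = H @ light, H @ brdf       # wavelet coefficients

# Parseval: coefficient dot product == spatial dot product.
assert np.allclose(l @ b, light @ brdf)

# In practice only the largest lighting coefficients are kept per frame.
keep = np.argsort(-np.abs(l))[:16]
approx = l[keep] @ b[keep]       # truncated shading estimate
```

The truncation step is where the sparsity pays off: the per-vertex cost is proportional to the number of retained coefficients, not to the full basis size.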
Wang et al. do not take visibility into account, so shading reduces to these dot products. They perform shading entirely on the CPU; SSE instructions are enough to reach near-interactive frame rates.
To incorporate visibility and evaluate the wavelet triple product integral, we have to precompute the visibility function at each vertex, that is, sample visibility in the spatial domain and wavelet-transform it. With OpenGL 4.x features, we might be able to shrink this precomputation time to acceptable levels.
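A toy sketch of the discrete triple product in the Haar basis (a tiny 1D signal stands in for the spherical domain; sizes are illustrative): the tripling coefficients C_ijk = Σ_x Ψ_i(x)Ψ_j(x)Ψ_k(x) contract the lighting, visibility, and BRDF coefficient vectors into the shaded value.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal 1D Haar transform matrix (n a power of two)."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    return np.vstack([np.kron(h, [1.0, 1.0]),
                      np.kron(np.eye(n // 2), [1.0, -1.0])]) / np.sqrt(2.0)

n = 16
H = haar_matrix(n)
rng = np.random.default_rng(3)
L, V, rho = rng.random(n), rng.random(n), rng.random(n)  # light, visibility, BRDF

# Tripling coefficients. In a real implementation these are extremely
# sparse and evaluated on the fly, never stored as a dense n^3 tensor.
C = np.einsum('ix,jx,kx->ijk', H, H, H)

shade = np.einsum('ijk,i,j,k->', C, H @ L, H @ V, H @ rho)
assert np.allclose(shade, np.sum(L * V * rho))   # matches the direct product sum
```

The per-vertex cost is dominated by the visibility transform, which is exactly the part the OpenGL 4.x features below can accelerate.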
Back in the mid-2000s we had to issue six draw calls per vertex position to generate each vertex's visibility cubemap; today, multi-draw indirect greatly reduces the number of expensive API calls the CPU has to make.
Second, layered rendering with a geometry shader makes it possible to render all six cubemap faces in a single draw.
Third, bindless textures reduce the overhead of repeatedly binding and unbinding the visibility textures when performing the wavelet transform on the GPU.
1. Ren Ng, Ravi Ramamoorthi, and Pat Hanrahan. Triple Product Wavelet Integrals for All-Frequency Relighting. ACM SIGGRAPH 2004.
2. Rui Wang, Ren Ng, David Luebke, and Greg Humphreys. Efficient Wavelet Rotation for Environment Map Rendering. Eurographics Symposium on Rendering 2006.