Monday, September 10, 2012

Embree

I'm doing some work on the back end of Aurora to make things go faster. Since so many parts of the engine are a first pass, there's a ton of room for improvement. First up was comparing my core to that of a tested production engine. Now, it's hard - if not impossible - to get a good one-to-one comparison with any complete engine, and a lot of the open source material out there is geared more towards research than production, but there are a few packages that have what I'm after - in particular, Embree seems to be a good one.

"Embree is a collection of high-performance ray tracing kernels, developed at Intel Labs. The kernels are optimized for photo-realistic rendering on the latest Intel® processors with support for the SSE and AVX instruction sets. In addition to the ray tracing kernels, Embree provides an example photo-realistic rendering engine. Embree is designed for Monte Carlo ray tracing algorithms, where the vast majority of rays are incoherent. The specific single-ray traversal kernels in Embree provide the best performance in this scenario and they are very easy to integrate into existing applications."

And they weren't joking about the last part. With no external dependencies and a straightforward interface, it only took about a morning to replace my kd-tree and triangle intersection code with the BVH and intersection kernels in Embree and compare some render times. I was fearing the worst, but it wasn't all bad news.

The acceleration structure build times went from a few seconds for medium-sized scenes (a few hundred thousand polys) and a minute or more for huge ones (several million) to less than a second for every case I could throw at it. I believe that's mostly down to the fact that my builder isn't multithreaded and has a pretty steep algorithmic complexity that doesn't do well with high tree depths. I have a couple of papers on faster kd-tree build algorithms that I'm keen on trying out.

Overall render speed improved by about 2-3x for smaller scenes and up to 4-5x for bigger ones. While a lot of that comes from the lack of SSE in my own code, it also points to either a bad memory layout or room for improvement on the tree traversal side of things. The plan is to get my own code up to speed with the SSE and compiler trickery going on in Embree, but I'm more keen on getting on with other features at the moment, so I'm leaving the Embree kernels in there and will come back to this later.
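For a sense of what the intersection side of that involves, here is a minimal scalar sketch of a ray/triangle test using the well-known Möller-Trumbore method. It is not Aurora's or Embree's actual code; `Vec3`, `intersectTriangle` and the epsilon value are names and choices assumed purely for illustration.

```cpp
#include <cmath>

// Hypothetical minimal vector type, for illustration only.
struct Vec3 { float x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Scalar Moller-Trumbore ray/triangle test.
// Returns true and writes the hit distance to tOut if the ray
// org + t * dir hits triangle (v0, v1, v2) at some t > 0.
bool intersectTriangle(Vec3 org, Vec3 dir,
                       Vec3 v0, Vec3 v1, Vec3 v2, float& tOut) {
    const float eps = 1e-8f;            // assumed tolerance
    const Vec3 e1 = sub(v1, v0);
    const Vec3 e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const float det = dot(e1, p);
    if (std::fabs(det) < eps) return false;   // ray parallel to triangle plane

    const float invDet = 1.0f / det;
    const Vec3 s = sub(org, v0);
    const float u = dot(s, p) * invDet;       // first barycentric coordinate
    if (u < 0.0f || u > 1.0f) return false;

    const Vec3 q = cross(s, e1);
    const float v = dot(dir, q) * invDet;     // second barycentric coordinate
    if (v < 0.0f || u + v > 1.0f) return false;

    const float t = dot(e2, q) * invDet;      // distance along the ray
    if (t <= eps) return false;               // hit is behind the origin
    tOut = t;
    return true;
}
```

A SIMD kernel evaluates the same arithmetic for several triangles (or several rays) per instruction, which is where a good part of the gap between scalar and SSE code tends to come from.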

For now, here are some renders I ran to see what I'm looking at in terms of render times and convergence points for medium sized scenes with different material types.

1024x1024 pixels, 250k polys, 3 area light sources, 10 light bounces and 8k samples per pixel, with Stanford Lucy rendered in a lambert material, a glossy Kelemen material and the new and improved glass material for speed comparison. The lambert material is converging at a pretty reasonable rate, but the caustics from glossy/mirror lobes need a lot more samples, so I definitely need some smarter algorithm for path sampling/integration.
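As a rough guide to why "a lot more samples" gets expensive quickly: for a plain Monte Carlo estimator with per-sample standard deviation \sigma, the standard error after N samples per pixel falls off as one over the square root of N. The symbols here are generic and not measured from these renders.

```latex
\[
  \mathrm{error}(N) \approx \frac{\sigma}{\sqrt{N}}
  \quad\Longrightarrow\quad
  \frac{\mathrm{error}(4N)}{\mathrm{error}(N)} = \frac{1}{2}
\]
```

So halving the noise at 8k samples per pixel would take roughly 32k, which is why reducing the variance with smarter sampling usually pays off more than brute force.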

Before heading down that path, though, I'm seeing some valleys in the CPU load that seem to correlate with my display driver blocking the main thread and making everyone wait, so I'll make sure it plays nice with others and works in parallel like everything else, for what will hopefully be some more speed improvements.
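One common way to keep a display step from stalling render threads is to hand finished tiles to a queue that a dedicated display thread drains. The sketch below illustrates that idea only; it is not how Aurora is structured, and `Tile`, `TileQueue` and `renderWithAsyncDisplay` are hypothetical names. The "display" here just counts tiles where a real one would blit them.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical descriptor for a finished image tile.
struct Tile { int worker; int index; };

// Thread-safe queue: render threads push, the display thread pops.
class TileQueue {
public:
    void push(Tile t) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tiles_.push(t);
        }
        ready_.notify_one();
    }

    // Signals that no more tiles will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a tile is available.
    // Returns false once the queue is closed and fully drained.
    bool pop(Tile& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !tiles_.empty() || closed_; });
        if (tiles_.empty()) return false;
        out = tiles_.front();
        tiles_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Tile> tiles_;
    bool closed_ = false;
};

// Runs `workers` render threads plus one display thread and
// returns the number of tiles the display thread consumed.
std::size_t renderWithAsyncDisplay(int workers, int tilesPerWorker) {
    TileQueue queue;
    std::size_t shown = 0;

    std::thread display([&queue, &shown] {
        Tile t;
        while (queue.pop(t)) ++shown;   // stand-in for drawing the tile
    });

    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&queue, w, tilesPerWorker] {
            for (int i = 0; i < tilesPerWorker; ++i)
                queue.push(Tile{w, i}); // stand-in for rendering a tile
        });
    }

    for (auto& t : pool) t.join();
    queue.close();
    display.join();
    return shown;
}
```

With this shape the render threads only pay for a short lock around `push`, and a slow display step just lets the queue grow instead of making everyone wait.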

-Espen