Friday, July 27, 2012

Faster is better

I got a new laptop, and after a day of downloading gigs of drivers and apps I was ready to see what speed improvements it would give over my outdated, albeit practical, old toy. Expecting four cores to blast through renders in seconds, I loaded up the same Cornell box scene I wrote down some stats for a month ago, hit render and waited. And waited.

Turns out multithreading isn't as easy as I naively thought. Despite a ton of hardware improvements, what used to run at around 90% CPU usage on two cores now ran at 35% on four. On top of that, a round through Instruments showed me that most of that time was spent managing threads. Bummer.

I took the render loop apart and rewrote all of the threading code, and a few hours later it's running at 99%, with negligible time spent managing threads. Far fewer threads are started and killed, each thread does more work, and there are no locks and no dynamic memory allocations inside threads.
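
For anyone curious what that kind of structure looks like, here is a minimal sketch in Python. It is not the renderer's actual code: `split_rows`, `shade` and `render` are made-up names, and `shade` is just a placeholder for tracing a pixel. The idea it illustrates is one long-lived thread per band of rows, with every thread writing only to its own slice of a buffer that was allocated up front, so no locks are needed.

```python
import threading


def split_rows(height, workers):
    """Divide rows [0, height) into contiguous bands, one per worker."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        bands.append((start, end))
        start = end
    return bands


def shade(x, y):
    """Hypothetical stand-in for tracing one pixel; returns a value in 0..255."""
    return (x * 31 + y * 17) % 256


def render(width, height, workers=4):
    # The framebuffer is allocated once, before any thread starts.
    framebuffer = bytearray(width * height)

    def work(row_start, row_end):
        # Each thread touches only its own rows, so no locking is required.
        for y in range(row_start, row_end):
            for x in range(width):
                framebuffer[y * width + x] = shade(x, y)

    # One thread per band, started once and joined once.
    threads = [threading.Thread(target=work, args=band)
               for band in split_rows(height, workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return framebuffer
```

Calling `render(512, 512, workers=4)` fills the whole image with four threads. In CPython the global interpreter lock keeps this from actually running in parallel, so treat it as an illustration of the layout rather than a speed test.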

Some new render times:
512x512 pixels, 70k polygons, 16 light bounces, 1024 samples per pixel:



On the feature end of things, I added per-thread stratification of samples, as per http://graphics.berkeley.edu/papers/Ramamoorthi-ATO-2012-02/index.html
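
As a rough illustration of what stratifying samples per thread can mean, the sketch below generates a plain jittered grid from a random generator owned by a single thread. This is a generic technique and may well differ from the scheme in the linked paper and from the renderer's own code; `stratified_samples`, `make_thread_rng` and the seed value are assumptions made up for the example.

```python
import random


def stratified_samples(n, rng):
    """Return n*n jittered points in the unit square, one per grid cell."""
    inv = 1.0 / n
    return [((i + rng.random()) * inv, (j + rng.random()) * inv)
            for j in range(n) for i in range(n)]


def make_thread_rng(thread_index, base_seed=1234):
    """Give each thread its own generator so threads share no sampler state."""
    return random.Random(base_seed + thread_index)
```

Each thread would call `make_thread_rng` once with its own index and then draw all of its sample sets from that generator, for example `stratified_samples(32, rng)` for 1024 samples per pixel.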

I've also implemented the specular half of Kelemen's take on the Cook/Torrance BRDF and will move on to coupling it with the matte component next. Right now I'm handling multi-lobe materials naively by Fresnel-blending BRDFs based on the geometric normal, which is hacky at best. Kelemen's approach uses a per-sample Fresnel term to optimally importance sample the two lobes, and in addition to converging faster, it simply looks better and more natural.
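
To make the per-sample idea a bit more concrete, here is a simplified sketch of picking a lobe with probability equal to a Fresnel term evaluated for that sample. It uses Schlick's approximation, which is a substitution made for brevity and not necessarily what Kelemen's model uses. `schlick_fresnel` and `choose_lobe` are hypothetical names, and `cos_theta` is assumed to be the cosine of the angle between the view direction and the sampled half vector.

```python
import random


def schlick_fresnel(cos_theta, f0):
    """Schlick's approximation; f0 is the reflectance at normal incidence."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def choose_lobe(cos_theta, f0, rng):
    """Pick a lobe for one sample; return (lobe name, selection probability)."""
    f = schlick_fresnel(cos_theta, f0)
    if rng.random() < f:
        return "specular", f
    return "matte", 1.0 - f
```

The returned probability is there so that the sample's contribution can be divided by it, which keeps the estimate unbiased when only one lobe is evaluated per sample.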

Cook/Torrance specular BRDF, same settings as above, but at 4k samples per pixel. Complete with caustics and fireflies:


-Espen
