Friday, July 27, 2012

Faster is better

I got a new laptop, and after a day of downloading gigs of drivers and apps I was set up to see what speed improvements it would give over my outdated - albeit practical - old toy. Expecting four cores to blast through renders in seconds, I loaded up the same Cornell box scene I wrote down some stats for a month ago, hit render and waited. And waited.

Turns out multithreading isn't as easy as I naively thought. Despite a ton of hardware improvements, what used to run at around 90% CPU utilization on two cores now ran at 35% on four. On top of that, a pass through Instruments showed that most of that time was spent managing threads. Bummer.

I took the render loop apart, rewrote all of the threading code, and a few hours later it's now running at 99%, with negligible time spent managing threads. There are way fewer threads started and killed, more work per thread, no locks, and no dynamic memory allocations from inside threads.
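The shape of the fix, sketched in Python for brevity (the engine itself is C++, and these names are illustrative): coarse tile-sized tasks instead of per-pixel ones, a single long-lived pool instead of spawning and killing threads, and disjoint per-tile buffers so workers never touch shared state or locks.

```python
from concurrent.futures import ThreadPoolExecutor

def render_tile(tile):
    # Each task owns one whole tile and writes into its own buffer,
    # so workers need no locks and no shared allocations. (Dummy
    # shading here; the real loop would trace rays per pixel.)
    x0, y0, x1, y1 = tile
    return [[(x + y) % 256 for x in range(x0, x1)] for y in range(y0, y1)]

def render(width, height, tile_size=64, workers=4):
    # Coarse-grained work: one task per tile, one pool per frame.
    tiles = [(x, y, min(x + tile_size, width), min(y + tile_size, height))
             for y in range(0, height, tile_size)
             for x in range(0, width, tile_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_tile, tiles))
```

The key ratio is work per task versus scheduling overhead: with tiles of a few thousand pixels each, the cost of handing out work becomes negligible next to the tracing itself.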

Some new render times:
512x512 pixels, 70k polygons, 16 light bounces, 1024 samples per pixel:

On the feature end of things, I added per-thread stratification of samples.

I've also implemented the specular half of Kelemen's take on the Cook-Torrance BRDF, and will move on to coupling it with the matte component next. Right now I'm handling multi-lobe materials naively by Fresnel-blending BRDFs based on the geometric normal, which is hacky at best. Kelemen's approach uses a per-sample Fresnel to optimally importance sample the two lobes, and in addition to converging faster it simply looks better and more natural.
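To sketch the difference: the hacky version blends the two lobes with a fixed Fresnel weight from the geometric normal, while the per-sample version uses each sample's own Fresnel reflectance as the lobe-selection probability. A minimal Python illustration, with Schlick's approximation standing in for the full Fresnel term (names are mine, not Aurora's):

```python
def schlick_fresnel(cos_theta, f0):
    # Schlick's approximation of Fresnel reflectance.
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

def pick_lobe(cos_theta, f0, u):
    # Per-sample lobe selection: the Fresnel term itself is the
    # probability of taking the specular lobe, so each lobe gets
    # sampled in proportion to its actual contribution.
    fr = schlick_fresnel(cos_theta, f0)
    if u < fr:
        return "specular", fr       # selection pdf = fr
    return "matte", 1.0 - fr        # selection pdf = 1 - fr
```

At grazing angles the Fresnel term approaches one, so nearly all samples go to the specular lobe - exactly where the fixed geometric-normal blend wastes most of its samples on the matte component.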

Cook-Torrance specular BRDF, same settings as above but at 4k samples per pixel. Complete with caustics and fireflies:


Saturday, July 21, 2012

Breaking things

While struggling my way through various microfacet BRDF papers, I took a "break" to implement the kd-tree from pbrt. After a couple of hours of debugging (debugging anything recursive is a pain, geometric acceleration structures even more so) I got it up and running and could compare it to my uniform grid structure. For 100k polys, the kd-tree was about twice as fast. That isn't much, but it's nice considering the test was a compact, uniform mesh, which is exactly what uniform grids do best. Next I tried duplicating the mesh and lining the copies up side by side, and while the uniform grid started to struggle, the kd-tree wasn't much slower at all.

200k polys, 12 light bounces, 1024x1024 pixels, 1024 samples per pixel. Single area light source, with a white ground plane and walls to bounce light around.

I then went on to do what any kid with a new toy does - see how much of a beating it can take. 1.5M polys, 20 light bounces, one area light source plus an infinite area light outside of the windows. 1024x556 pixels, set off with 1024 samples per pixel, but stopped about halfway through.

The render times are pretty high, but only about 2-3 times slower than the 200k poly ones, which is pretty nice for that much added complexity. With the infinite area light and small windows it suffers quite a bit from the lack of Metropolis sampling or any other way of guiding samples towards the windows, so it's quite noisy for the number of samples.


Sunday, July 15, 2012

Shiny things

Being back on the visual side is a ton of fun. I’ve added a Blinn BRDF, Fresnel, and importance sampling of materials with multiple BxDFs, and I’m working on a more generalized framework for layered physical materials that’ll lend itself to a more intuitive lookdev process. Also coming up in the near future are a couple more BRDFs, and possibly a BTDF, since glass seems to be all the rage for showing off ray tracers, before diving into the world of subdivision because, frankly, I’m getting tired of these hard-edged triangles.
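One common way to importance sample a material with multiple BxDFs - the strategy pbrt uses - is to pick one lobe at random, sample a direction from it, then average the pdfs of all lobes for that direction so the estimator stays unbiased. A toy Python sketch, with a hypothetical cosine lobe to keep it self-contained:

```python
import math
import random

class Lambert:
    # Toy cosine lobe (z-up shading frame), just for the sketch.
    def __init__(self, albedo):
        self.albedo = albedo
    def sample(self, wo):
        # Cosine-weighted hemisphere sample.
        u1, u2 = random.random(), random.random()
        r, phi = math.sqrt(u1), 2.0 * math.pi * u2
        return (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))
    def pdf(self, wo, wi):
        return max(wi[2], 0.0) / math.pi
    def eval(self, wo, wi):
        return self.albedo / math.pi

def sample_material(bxdfs, wo):
    # Choose one lobe uniformly, sample it, then average the pdfs of
    # all lobes for the sampled direction to keep the result unbiased.
    lobe = random.choice(bxdfs)
    wi = lobe.sample(wo)
    pdf = sum(b.pdf(wo, wi) for b in bxdfs) / len(bxdfs)
    f = sum(b.eval(wo, wi) for b in bxdfs)
    return wi, f, pdf
```

Averaging the pdfs (rather than using only the chosen lobe's pdf) is what makes the combined estimator correct regardless of which lobe happened to generate the direction.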

I’m still debating whether to support shading normals or not. I suppose it makes sense for flexibility, but restricting shading to the geometric normal keeps things a bit more rooted in reality. I guess it’ll come down to how restrictive the lack of normal smoothing, bump mapping, and distorted normals for layered cases like car paint turns out to be. I’m not sure I know of any engine that doesn’t allow distorted shading normals, which should probably raise a flag somewhere.

Some renders of the Stanford Lucy model with a square area light off-screen top right and a blueish spherical infinite area light. 100k polys, 1024x1024 resolution, 1024 samples per pixel, 12 light bounces.

80% white diffuse:

Blinn with a medium exponent:

Layered material using a Blinn layer over a red Lambertian diffuse:

Render times for these are still a bit nuts, so at some point I need to put together a better acceleration structure. The uniform grid solution I currently have is practical, but pretty suboptimal.


Sunday, July 8, 2012

Speed vs interactivity

Wherever rendering is going on, there is a lot of focus on speed. Where development of a render engine is going on, even more so. While total render times are a big part of what makes or breaks a production, I’d like to address something I personally find even more important - interactivity.

People are more expensive than computers. It’s been said a thousand times in the context of rendering performance, but it’s a horse worth beating. Most render time is spent creating non-final images - the ratio at most places I’ve worked is substantial. Since rendering is expensive, effort spent lowering the number of times we render bad-looking frames is as well spent as effort lowering the time each of those frames takes to compute in the first place. One of the first things to look at here is the turnaround lookdev and lighting artists have on their work, which oftentimes is frustratingly slow - anywhere from minutes to hours between making a technical change and seeing its result in the form of an image. This landscape is slowly changing with the introduction of production-quality ray tracers in high-end CG pipelines, but it’s still not all fun and games.

Going back to writing a render engine, fast turnarounds can be hard to implement in practice, especially with the amount of setup we do before the first pixel to decrease algorithmic complexity and increase sampling efficiency later on. I’m playing around with a few things for reducing the startup overhead of interactive renders in Aurora.

One of the beauties of unbiased rendering is that you can combine any number of results from various unbiased techniques and still be left with an unbiased result. That means that as soon as you’ve parsed the scene description, you can send one thread off to do all the heavy lifting that’ll make the total render time shorter - building acceleration structures, storing caches for textures and sampling, etc. - while a couple of other threads suboptimally start rendering with the bare minimum of bells and whistles needed to get pixels in front of the user. Unbiased sampling-efficiency knobs like roulette thresholds work the same way, and can start off low and be brought up later, once the user is happy with the initial result and hasn’t cancelled the render.

Combined with an adaptive pixel sampling strategy, even the slowest of engines (*points at own source code*) can produce reasonable approximations of the final image in a very short time compared to a full render, all without introducing bias. Depending on how far you go it could add to the total time it takes to converge to a final image, which is why this only makes sense for interactive renders, where the user is very likely to stop the render and make adjustments before it has fully converged.
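The "combine any number of unbiased results" trick boils down to a sample-count-weighted average. A minimal sketch (a hypothetical helper, not Aurora's actual code):

```python
def combine(passes):
    # Merge any number of unbiased per-pixel estimates, each given as
    # (sample_count, mean). Weighting each mean by its sample count
    # gives the same result as averaging all the raw samples, so the
    # combined estimate is still unbiased.
    total = sum(n for n, _ in passes)
    return sum(n * mean for n, mean in passes) / total
```

This is why the cheap "no bells and whistles" startup pass doesn't have to be thrown away once the fully prepared threads come online - its samples simply fold into the running total.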

Some results, rendering at 512x512 resolution, 20 light bounces, about 70k triangles and 512 samples per pixel:

From looking at the absolute numbers I still have a long way to go before I'm hitting any kind of production quality speed, but the relative times for user feedback are getting reasonable considering I'm running these on an old Macbook Air.

I also finished refactoring the engine this weekend, and was pleasantly surprised to see render times lowered by a factor of three(!). Which says more about my initial attempt than the current one, I’m afraid, since it was all restructuring and no magic features were added. A few hours of work later I have an initial implementation of multithreading, which on my dual-core processor brought the speed up by an additional 80-90%. There’s still a ton of work to be done on performance, but I’m itching to get back on the feature side, so it will have to wait. Next up is a couple more BRDFs so I can get more visually interesting materials than matte surfaces, and some new infinite area light features.


Monday, July 2, 2012

A readable scene description

I’ve been considering what to use as the base for scene description in my engine. I want the format to be intuitive to construct both with code and by hand, easy to read, and ideally not require too much effort on my end to parse or to support in a translator. From trawling the internet, one popular choice seems to be XML, and while it’s both simple and convenient, I personally find it very clunky to read and to write out even simple structures by hand. I also considered the RIB standard, as it already has a wide range of commercial and open source translators from most major 3D packages, as well as being a fun way of comparing performance. However, it adds a whole layer of complexity I don’t particularly want to deal with in terms of parsing and handling unsupported features, as well as being quite restrictive in layout and form.

Enter Python. While technically not a data format, it fits everything else I want in a scene descriptor perfectly. Quietly slip a layer of JSON between it and the C++ engine, and you’ve got yourself a pretty clean and intuitive way to talk to the renderer. As a bonus it’s widely used, and foreshadows potential future user interface endeavours. But let’s not get ahead of ourselves.
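The bridging layer could be as thin as serializing the Python-side scene into plain dicts and handing the resulting JSON string to the engine. A hypothetical sketch - the function name and layout are illustrative, not Aurora's actual API:

```python
import json

def scene_to_json(options, objects):
    # Flatten the Python-side scene into plain dicts; the C++ side
    # only ever sees one self-describing JSON document.
    return json.dumps({"options": options, "objects": objects}, indent=4)
```

Since Python's standard library handles the serialization and the C++ side only needs a JSON parser, neither end has to know anything about the other's object model.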

Here’s a mockup of what a Python layer might look like:

import aurora
import math

scene = aurora.core.Scene()

# Settings
options = {
    "resolution"     : [512, 512],
    "driver"         : "openexr",
    "filename"       : "dragon.exr",
    "pixelsamples"   : 256,
    "pixelsampler"   : "adaptive",
    "accelerator"    : "uniformgrid",
    "minraydepth"    : 3,
    "raytermination" : "roulette",
    "raybias"        : 0.001,
    "loglevel"       : "debug",
}

scene.options = options

# Camera
renderCam = aurora.core.Camera(fov = math.pi/8)
renderCam.translate(278, 273, -800)


# Dragon
dragon = aurora.geometry.ObjMesh("models/stanford/dragon.obj")
dragon.scale(30, 30, 30)
dragon.translate(280, 0, 280)

material = aurora.shaders.MatteSurface(color(1,1,1))
dragon.material = material


# Cornell box

# Light
light = aurora.lights.SquareAreaLight(exposure = 5.75, color = color(1,1,1))
light.scale(60, 1, 60)
light.translate(280, 548.7, 280)



I also named my engine Aurora, for numerous reasons.