Saturday, November 24, 2012

I'm back!

It's been very busy at work, but I'm finally getting some time to sit down and work on Aurora again. I wrapped up my second pass at a shading engine, and like anyone coming back to a personal project after a few weeks off, I'm realizing I need to spend some time on code cleanup before moving on to more features.

The new engine now supports an arbitrary graph of shaders, which MtoAur currently parses from Maya's Hypershade and dumps into the scene description. I don't support that many nodes yet, but now that I have the framework in place that should be the easy part.

In an attempt to break the record for ray tracer cliches, here's the Stanford bunny and dragon in the Sponza model (by Marko Dabrovic):


Saturday, October 13, 2012


Things are super busy, so I haven't had much time for Aurora updates the last couple of weeks, but I spent a morning playing around with relighting the museum scene.

With more than 20 light sources it's taking 16k samples per pixel to get even close to a clean result, while I needed no more than 4k when it was just a couple of lights and much softer lighting, so I need to work on the light sampling logic at some point. From an interactivity point of view it's taking only about a minute to get a reasonable looking preview, though, so it's not all bad. Actually, the slowest part was exporting all the geo from Maya on every update, so I added a "lazy geo" option to the render globals in the Maya translator.

2k film res, 16 bounces and 1.5 mill polys:


Saturday, September 29, 2012

Parsing shaders

I've done some more work on the Maya translator to support the new concept of shaders. 

A "Shader" in Aurora is responsible for prelighting only. Shaders are somewhat atomic pieces of logic that, given shading geometry and some parameters, produce either a color or a float (for now; I'll introduce manifolds, prim vars and other fun concepts later). Together they form a virtual node graph that eventually feeds into the material properties of the "Material". The material interfaces with the integrators, and is responsible for handing over a single BxDF during light transport. So far I've only written a texture2d shader with exr support and a noise3d shader with support for various transforms and spaces, but now that I have a generalized framework for shading they're pretty straightforward and quick to write, so I'll add a bunch more as I get time.
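To make the shader/material split a bit more concrete, here's a minimal sketch of a node graph feeding a material. Every name in it (`Constant`, `Checker`, `Multiply`, `Material`, `diffuse_coefficient`) is hypothetical and chosen for illustration; none of it is taken from Aurora's source, and the checker is just a stand-in for a real texture lookup.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple

Color = Tuple[float, float, float]


class Shader(Protocol):
    """Anything that turns shading geometry into a color (prelighting only)."""

    def evaluate(self, uv: Tuple[float, float]) -> Color: ...


@dataclass
class Constant:
    value: Color

    def evaluate(self, uv):
        return self.value


@dataclass
class Checker:
    """Stand-in for a texture lookup: alternates two colors over UV space."""
    a: Color
    b: Color
    tiles: int = 2

    def evaluate(self, uv):
        u, v = uv
        odd = (int(u * self.tiles) + int(v * self.tiles)) % 2
        return self.b if odd else self.a


@dataclass
class Multiply:
    """A node with two upstream inputs; chaining these forms the graph."""
    lhs: Shader
    rhs: Shader

    def evaluate(self, uv):
        left, right = self.lhs.evaluate(uv), self.rhs.evaluate(uv)
        return tuple(x * y for x, y in zip(left, right))


@dataclass
class Material:
    """Pulls its properties from the graph; a real one would build a BxDF from them."""
    albedo: Shader

    def diffuse_coefficient(self, uv):
        return self.albedo.evaluate(uv)


white, black = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
material = Material(Multiply(Constant((0.5, 0.5, 0.5)), Checker(white, black)))
```

The point of the split is that the integrator only ever talks to `Material`; what sits upstream of `albedo` can be a single constant or an arbitrarily deep graph.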

The Python and Maya material interface is monolithic/ubershader-style for now - I just piggyback on the existing Maya material node, throw a bunch of my own attributes on there and parse these during scene translation. Internally it's all very modular, but while I figure out how to best parse/manage that elegantly it'll do for now. I suspect I can leverage the Hypershade in Maya, but I'll need to flesh out how to best handle it while parsing and also how to write custom nodes in Maya. Testing it out on a bigger scene it's behaving fine, but it has a pretty big impact on render times. It's nice to be able to control things interactively in Maya, though (although I haven't implemented any IPR style rendering yet, rendering with only a single bounce is pretty fast).

The framework puts a clean line between prelighting and lighting, allowing me to adaptively cache the former and stay unbiased. There's no caching mechanism implemented yet, though, so shaders are being re-executed upwards of thousands of times more than needed.

A quick test on a medium sized scene:
1.5 mill polys, 9 light sources, multiple different materials, some with a procedural noise shader. 2048x1024 res, 8k samples per pixel and 16 light bounces. Render time was around 12h on my laptop, but it should drop considerably once I'm done with the shader cache. The lighting and material settings are pretty arbitrary, but I'm thinking I'll try and use this scene as a testing ground for my engine going forward by polishing it up a bit with textures and some proper lookdev and lighting. (Although the thought of UV mapping this beast isn't exactly intriguing..)


Tuesday, September 25, 2012

Bring the noise

Turbulence style world space Perlin noise implementation. I still need to add the interface to this in the python API and Maya, but for now I'm having too much fun with this shading stuff to worry about pesky details like UI control.
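For readers who haven't met the term, "turbulence" usually means summing the absolute value of several octaves of noise, doubling the frequency and halving the amplitude each time. Below is a minimal sketch of that idea on top of a compact gradient noise. It is not Aurora's implementation: the permutation table, the gradient set, the octave count and the function names are all assumptions made for the example, and `p` is simply assumed to be a world space point.

```python
import math
import random

# Assumed setup: a fixed, seeded permutation table (doubled to avoid wrapping).
_rng = random.Random(0)
_perm = list(range(256))
_rng.shuffle(_perm)
_perm += _perm

# The 12 edge-direction gradients used by Perlin's improved noise.
_GRADS = [(1, 1, 0), (-1, 1, 0), (1, -1, 0), (-1, -1, 0),
          (1, 0, 1), (-1, 0, 1), (1, 0, -1), (-1, 0, -1),
          (0, 1, 1), (0, -1, 1), (0, 1, -1), (0, -1, -1)]


def _fade(t):
    """Quintic smoothstep, so the noise has continuous derivatives."""
    return t * t * t * (t * (t * 6 - 15) + 10)


def _lerp(a, b, t):
    return a + t * (b - a)


def _grad(ix, iy, iz, dx, dy, dz):
    """Dot product of the lattice point's gradient with the offset to the sample."""
    h = _perm[_perm[_perm[ix & 255] + (iy & 255)] + (iz & 255)]
    gx, gy, gz = _GRADS[h % 12]
    return gx * dx + gy * dy + gz * dz


def perlin(x, y, z):
    """Gradient noise: zero at integer lattice points, smooth in between."""
    ix, iy, iz = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - ix, y - iy, z - iz
    u, v, w = _fade(fx), _fade(fy), _fade(fz)

    def corner(cx, cy, cz):
        return _grad(ix + cx, iy + cy, iz + cz, fx - cx, fy - cy, fz - cz)

    x00 = _lerp(corner(0, 0, 0), corner(1, 0, 0), u)
    x10 = _lerp(corner(0, 1, 0), corner(1, 1, 0), u)
    x01 = _lerp(corner(0, 0, 1), corner(1, 0, 1), u)
    x11 = _lerp(corner(0, 1, 1), corner(1, 1, 1), u)
    return _lerp(_lerp(x00, x10, v), _lerp(x01, x11, v), w)


def turbulence(p, octaves=4):
    """Sum |noise| over octaves: frequency doubles, amplitude halves."""
    total, freq, amp = 0.0, 1.0, 1.0
    for _ in range(octaves):
        total += amp * abs(perlin(p[0] * freq, p[1] * freq, p[2] * freq))
        freq *= 2.0
        amp *= 0.5
    return total
```

Taking the absolute value is what gives turbulence its creased, billowy look compared to plain fractal noise.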


Monday, September 24, 2012


That's right. Gone are the days of constant colored Stanford models. Textures are the new black, green, pink and orange.

I added a shading engine responsible for feeding bxdfs their coefficients, so I finally have an appropriate environment for things like texture mapping. There are a lot of features to be written, but the framework is there now so this is where the fun begins. I kinda broke my obj parser in the process of adding prim var support, so normals are back to faceted, but that's all temporary. Be prepared for a bunch of updates on the shading side.


Thursday, September 20, 2012

No news is no news

No shiny new features this week, as I'm still fumbling around underneath the hood. Exciting times are close, though.

Here are some renders I ran as a sanity check for my kelemen material and infinite area light. 1024 by 1024, single environment light source, 8 light bounces, 1m polys, forgot to check the pixel samples. Render times were 15-20 min.


Monday, September 17, 2012

Micro facets

In this week's episode I've been cleaning up my material code. I've changed the microfacet distribution of my specular model to the modified Beckmann distribution suggested by Kelemen et al., and made the interface a bit more generic than before, so I can extend it to a bsdf next when I add subsurface scattering - and later plug in a shading engine to support varying parameters through texturing, procedural patterns etc.

For now, here's a pink Buddha. 2048x2048, 4k samples per pixel, 10 light bounces.


Monday, September 10, 2012


I'm doing some work on the back end of Aurora to make things go faster. Since so many parts of the engine are a first pass, there's a ton of room for improvement. First up was comparing my core to that of a tested production engine. Now, it's hard - if not impossible - to get a good one-to-one comparison with any complete engine, and a lot of open source material out there is more geared towards research than production, but there are a few packages out there that have what I'm after - in particular Embree seems to be a good one.

"Embree is a collection of high-performance ray tracing kernels, developed at Intel Labs. The kernels are optimized for photo-realistic rendering on the latest Intel® processors with support for the SSE and AVX instruction sets. In addition to the ray tracing kernels, Embree provides an example photo-realistic rendering engine. Embree is designed for Monte Carlo ray tracing algorithms, where the vast majority of rays are incoherent. The specific single-ray traversal kernels in Embree provide the best performance in this scenario and they are very easy to integrate into existing applications."

And they weren't joking about the last part. With no external dependencies and a straightforward interface it only took about a morning to replace my kd tree and triangle intersection code with the BVH and intersection kernels in Embree and compare some render times. I was fearing the worst, but it wasn't all bad news.

The acceleration structure build times went from a few seconds for medium sized scenes (few hundred thousand polys) and a minute+ for huge ones (several mill) to less than a second for all cases I could throw at it, which I believe is mostly down to the fact that mine isn't multi threaded and has a pretty steep algorithmic complexity that doesn't do well with high tree depths. I have a couple of papers on faster KDtree build algorithms that I'm keen on trying out.

Overall render speed improved by about 2-3x for smaller scenes up to 4-5 times for bigger ones. While a lot of that comes from the lack of SSE in my own code it also speaks of either bad memory layout or room for improvement on the tree traversal side of things. While the plan is to get my own code up to speed with the SSE and compiler trickery going on in Embree, I'm more keen on getting on with other features at the moment, so I'm leaving the embree kernels in there and will come back to this later.

For now, here are some renders I ran to see what I'm looking at in terms of render times and convergence points for medium sized scenes with different material types.

1024x1024 pixels, 250k polys, 3 area light sources, 10 light bounces, 8k samples per pixel and Stanford Lucy with a lambert, a glossy Kelemen material and the new and improved Glass material for speed comparison. The lambert material is converging at a pretty reasonable rate, but the caustics from glossy/mirror lobes need a lot more samples, so I definitely need some smarter algorithm for path sampling/integration.

Before heading down that path, though, I'm seeing some valleys in the CPU load that seem to correlate with my display driver blocking the main thread and making everyone wait, so I'll make sure it plays nice with others and works in parallel like everything else for what hopefully will be some more speed improvements.


Thursday, September 6, 2012

Refracting bunny

Hopefully bug free this time. I fixed the error where paths including a diffuse or glossy material and ending with Transmit->Light (or "direct caustics" I guess) were not contributing energy, and things are looking a lot better.

Next up is performance improvements, and running some contrived tests to ensure things are still unbiased and otherwise visually behaving like expected.

1024x1024, two area lights, glass material with an ior of 1.55, a whole lot of pixel samples.


Monday, September 3, 2012

Bending rays

What's a ray tracer without some good old caustics? With smooth shading normals in place the next logical step was to get specular transmission in there and refract some rays. Compared to the microfacet stuff I've been digging into for specular models, implementing a perfect mirror model was reassuringly straightforward. I added a reflective mirror brdf for good measure and wrapped it all into a glass material. Tinting is currently done at the interface only - so no fancy volumetric absorption along the ray until I have a proper volume pipeline, but it does the trick for now.
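The core of perfect specular transmission is just Snell's law in vector form. Here's a minimal sketch; the function names and the tuple-based vectors are assumptions for the example rather than anything from Aurora. Both `d` (the incoming direction, pointing towards the surface) and `n` (the normal, pointing back against `d`) are assumed to be unit length.

```python
import math


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def reflect(d, n):
    """Mirror the incoming direction d about the normal n."""
    cos_i = -dot(d, n)
    return tuple(di + 2.0 * cos_i * ni for di, ni in zip(d, n))


def refract(d, n, ior_from, ior_to):
    """Bend d across the interface; returns None on total internal reflection."""
    eta = ior_from / ior_to
    cos_i = -dot(d, n)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)  # Snell: sin_t = eta * sin_i
    if sin2_t > 1.0:
        return None  # no transmitted ray exists, all energy is reflected
    cos_t = math.sqrt(1.0 - sin2_t)
    k = eta * cos_i - cos_t
    return tuple(eta * di + k * ni for di, ni in zip(d, n))


# A ray hitting glass (ior 1.5) from air at 45 degrees.
s = math.sqrt(0.5)
bent = refract((s, 0.0, -s), (0.0, 0.0, 1.0), 1.0, 1.5)
```

A glass material then picks between `reflect` and `refract` per sample, typically weighted by the Fresnel term, and always reflects when `refract` returns `None`.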

EDIT2: I was using this scene as a performance test, but figured I could throw a sphere in there to show off the caustics as my other render was broken. 
2048x2048, 16 bounces, 1 mill polys, 15k samples per pixel (naive forward path tracing does not converge caustics particularly fast..):

Here's our hero with an index of refraction of 1.5:

EDIT: This one actually has a pretty hilarious - and rather obvious now that I've found it - bug. There are no caustics from direct light sources here (ie, Eye -> Diffuse -> Transmit -> Transmit -> Light paths), only from indirect lighting (Eye -> Diffuse -> Transmit -> Transmit -> Diffuse/Glossy -> Light), so only bounce from the floor is contributing to the caustics, not the light source itself. I'll leave the render up regardless, and post a correct one once I've wrapped up what I'm currently working on.


Thursday, August 30, 2012

Reel Update

I haven't done one of these in a while, so here goes: A selection of work from the last couple of years. John Carter, Wrath of the Titans and Pirates of the Caribbean were done at Moving Picture Company in London, while Trollhunter, Max Manus and various commercials were done at Storm Studios in Oslo.

Espen Nordahl vfx reel 2012 from Espen Nordahl on Vimeo.


Monday, August 27, 2012

Smooth surfaces, pt1

I wrote a quick implementation of interpolated shading normals. There are still a couple of subtler adjustments I need to make to handle non-geometric normals a bit more gracefully in the surface integration pipeline, but the basics are there and I must admit it's nice to render something (deceptively) smooth for a change:


Sunday, August 26, 2012

The importance of importance

This week has been mostly under-the-hood stuff, so no shiny new features. I've changed the way I'm parsing object transforms as well as the way I manage space transforms during rendering. I also reduced the algorithmic complexity of importance sampling texture maps as per the 2010 Siggraph course on importance sampling, resulting in significant speedups for drawing samples from Infinite Area Lights with large environment maps compared to my previous, naive implementation. It's still pretty memory hungry - as I'm storing a pdf value per pixel - but I believe I've seen a couple of papers using various other means of drawing samples by dividing the map into regions, so I'll be taking a look at those at some point.
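The post doesn't spell out which technique from the course was used, so take this as one common way to cut the cost of drawing a sample rather than a description of Aurora: precompute a cumulative distribution once, then binary search it per sample, which is O(log n) instead of a linear scan. The function names are made up for the example.

```python
import bisect
import itertools


def build_cdf(weights):
    """Normalize non-negative weights and accumulate them into a CDF."""
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating point drift in the last entry
    return cdf


def sample_discrete(cdf, u):
    """Map a uniform u in [0, 1) to (index, probability of that index)."""
    i = min(bisect.bisect_right(cdf, u), len(cdf) - 1)
    previous = cdf[i - 1] if i > 0 else 0.0
    return i, cdf[i] - previous


cdf = build_cdf([1.0, 3.0, 0.0, 4.0])
index, probability = sample_discrete(cdf, 0.3)
```

Using `bisect_right` means an entry with zero weight is never selected, which matters because its probability would be zero and the estimator would divide by it. For a 2D map the same idea is typically applied twice: once to pick a row, once to pick a pixel within that row.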

Render of the week:
-Just over 1 mill polys, two area lights and an infinite area light with a texture map, 1k resolution at about 4k samples per pixel

Thursday, August 16, 2012

To infinity and beyond

I finally got around to implementing image based Infinite Area Lights, which also meant introducing the concept of a world space. It's a relatively naive implementation, but does the job as a first version: Reads a latlong exr from disk, computes a pdf table and works like any other importance sampled light source, but with the added benefit of the visual complexity of an environment map.
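As a rough illustration of what "computes a pdf table" can mean for a latlong map, here's a sketch that turns per-pixel luminance into per-pixel selection probabilities. The sin(theta) weighting is a standard trick to compensate for the way latlong images stretch near the poles; whether Aurora does exactly this isn't stated, so treat that, the function name and the top-row-is-the-pole layout as assumptions.

```python
import math


def latlong_pixel_probabilities(luminance_rows):
    """Per-pixel selection probabilities for a latlong environment map.

    luminance_rows: list of rows of per-pixel luminance, first row at the pole.
    """
    height = len(luminance_rows)
    weighted = []
    for y, row in enumerate(luminance_rows):
        theta = math.pi * (y + 0.5) / height  # polar angle at the row center
        stretch = math.sin(theta)             # pixels near the poles cover less solid angle
        weighted.append([lum * stretch for lum in row])
    total = sum(sum(row) for row in weighted)
    if total <= 0.0:
        raise ValueError("environment map has no energy to sample")
    return [[w / total for w in row] for row in weighted]


# A constant 2x4 map: rows near the equator should get more probability.
table = latlong_pixel_probabilities([[1.0, 1.0]] * 4)
```

Storing one value per pixel like this is also where the memory appetite comes from on large maps.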

The obligatory bunnies, various number of pixel samples (I just let each of them run for a couple of minutes on my old laptop), single infinite area light source:

And just for fun, I let this one run over night: 

360k polys, 2k samples per pixel, grey diffuse shader on everything. Model (courtesy of lighting challenges) is water tight, and the only lightsource is a huge area light outside of the windows, with a boatload of light bounces.

The noisiness even at 2k samples shows I need to decide on what to implement to help indirect-based scenes like this. The next logical step is a bi directional path tracer, which helps in a lot of indirectly lit cases, but for a scene like this one I believe I'd also need metropolis sampling in order to see real improvements, since so many of the light samples would end up just hitting the wall outside without anything to guide them through the window. There's also some aliasing around the window frames due to the massively intense light source, which comes from the fact that I have a pretty naive pixel filter at the moment (box 1x1).

Other than stress testing the image importance sampling I think the final thing on the feature list before diving back in to clean up some code and working on efficiency is shading normals.


Saturday, August 11, 2012

Multiple Importance Sampling

Having a simple Maya integration makes setting up scenes about a million times easier. I enjoy plotting in transformation matrices by hand as much as the other guy, but at some point you just want to create a sphere, attach a shader and hit render. With this new found freedom I set up a couple of test scenes to start implementing MIS, and quickly found I had a couple of subtle bugs in my light transport implementation. Cancelling out terms is all well and good until you need to sample the pdf outside of evaluating the brdf itself..

The result of this week is an implementation of a Kelemen material, with a coupling of a cook torrance spec and matte brdf, plus Multiple Importance Sampling for my forward path tracer.
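For reference, the weighting at the heart of MIS is tiny. The sketch below shows the two usual heuristics for combining a light sample with a brdf sample; the post doesn't say which one Aurora uses, and the function names are assumptions. Each function returns the weight for the strategy whose pdf is passed first.

```python
def balance_heuristic(pdf_this, pdf_other):
    """Weight for a sample drawn from the strategy with density pdf_this."""
    denominator = pdf_this + pdf_other
    return pdf_this / denominator if denominator > 0.0 else 0.0


def power_heuristic(pdf_this, pdf_other, beta=2):
    """Same idea with the pdfs raised to a power, which sharpens the weighting."""
    a, b = pdf_this ** beta, pdf_other ** beta
    return a / (a + b) if (a + b) > 0.0 else 0.0


def mis_contribution(value, pdf_this, pdf_other):
    """One strategy's weighted Monte Carlo contribution: w * f / pdf."""
    if pdf_this <= 0.0:
        return 0.0
    return power_heuristic(pdf_this, pdf_other) * value / pdf_this
```

This is also why the pdf has to be available separately from the brdf evaluation: the weight for a light sample needs the brdf's pdf for that same direction, and vice versa, so terms that conveniently cancel in a single-strategy estimator no longer do.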

The result, courtesy of the somewhat standard contrived setup (area lights at increasing size, planes with spec brdf at increasing roughness), at 64 samples per pixel:


Thursday, August 2, 2012


... is kinda cool. MtoAur 0.0.1


Friday, July 27, 2012

Faster is better

I got a new laptop, and after a day of downloading gigs of drivers and apps I was set up to see what speed improvements it would give over my outdated - albeit practical - old toy. Expecting 4 cores to blast through renders in seconds I loaded up the same cornell box scene I wrote down some stats for a month ago, hit render and waited. And waited.

Turns out multi threading isn't as easy as I naively thought. With a ton of hardware improvements what used to run at around 90% on two cores now ran at 35% on four. On top of that, a round through Instruments showed me most of that time was spent managing threads. Bummer.

I took the render loop apart, rewrote all of the threading code and a few hours later it's now running at 99%, with negligible time spent managing threads. There are way fewer threads started/killed, more work per thread, no locks and no dynamic memory allocations from inside threads.

Some new render times:
512x512 pixels, 70k polygons, 16 light bounces, 1024 samples per pixel:

On the feature end of things I added per thread stratification of samples as per

I've also implemented the specular half of Kelemen's take on the Cook/Torrance BRDF and will move on to coupling it with the matte component next. Right now I'm handling multi lobe materials naively by fresnel blending BRDFs based on the geometric normal, which is hacky at best. Kelemen's approach uses a per sample fresnel to optimally importance sample the two lobes, and in addition to converging faster simply looks better/more natural.

Cook/Torrance specular brdf, same settings as above, but at 4k samples per pixel. Complete with caustics and fireflies:


Saturday, July 21, 2012

Breaking things

While struggling my way through various microfacet brdf papers, I took a "break" to implement the kd tree from pbrt. After a couple of hours of debugging (debugging anything recursive is a pain, geometric acceleration structures even more so) I got it up and running and could compare it to my uniform grid structure. For 100k polys, the kd tree was about twice as fast. Which isn't much, but nice considering it was a compact, uniform mesh, which is what uniform grids do best. Next I tried duplicating the mesh and lining them up side by side, and while the uniform grid started to stagger, the kd tree wasn't much slower at all.

200k polys, 12 light bounces, 1024x1024 pixels, 1024 samples per pixel. Single area light source, with a white ground plane and walls to bounce light around.

I then went on to do what any kid with a new toy does - see how much of a beating it can take. 1.5m polys, 20 light bounces, one area light source + infinite area light outside of the windows. 1024x556 pixels, set off with 1024 samples per pixel, but stopped about half way through.

The render times are pretty high, but only about 2-3 times slower than the 200k poly ones, which is pretty nice for that much added complexity. With the infinite area light and small windows it suffers quite a bit from lack of metropolis or any other way of guiding the samples towards the windows, so it's quite noisy for the amount of samples.


Sunday, July 15, 2012

Shiny things

Being back on the visual side is a ton of fun. I’ve added a Blinn brdf, fresnel, importance sampling of materials with multiple bxdfs and I’m working on a more generalized framework for layered physical materials that’ll lend itself to a more intuitive lookdev process. Also coming up in the near future are a couple more brdfs, and possibly a btdf since glass seems to be all the rage for showing off ray tracers, before diving into the world of subdivision because, frankly, I’m getting tired of these hard edged triangles.

I’m still debating whether to support shading normals or not. I suppose it makes sense for flexibility, but restricting it to the geometric normal keeps things a bit more rooted in reality. I guess it’ll come down to how restrictive the lack of normal smoothing, bump mapping and distorted normals for layered cases like car paint turns out to be. I’m not sure I know of any engines that don’t allow distorted shading normals, which should probably raise a flag somewhere.

Some renders of the Stanford Lucy model with a square area light off screen top right and a blue’ish spherical infinite arealight. 100k polys, 1024x1024 resolution, 1024 samples per pixel, 12 light bounces.

80% white diffuse:

Blinn with a medium exponent:

Layered material using a Blinn layer over a red Lambertian diffuse.

Render times for these are still a bit nuts, so at some point I need to put together a better acceleration structure. The uniform grid solution I currently have is practical, but pretty suboptimal.


Sunday, July 8, 2012

Speed vs interactivity

Wherever rendering is going on, there is a lot of focus on speed. Where development of a render engine is going on, even more so. While total render times are a big part of what makes or breaks a production, I’d like to address something I personally find even more important - interactivity.

People are more expensive than computers. It’s been said a thousand times in the context of rendering performance, but it’s a horse worth beating. Most render time is spent creating non final images. The ratio at most places I’ve worked is actually substantial. Since rendering is expensive, it means that effort spent lowering the number of times we render bad looking frames is as well spent as that lowering the time spent computing each of those frames in the first place. One of the first things one should look at to improve this is the turnaround lookdev and lighting artists have on their work, which often times is frustratingly slow - from minutes upwards of hours from the time you make a technical change until you can see the results of that change in the form of an image. This landscape is slowly changing with the introduction of production quality ray tracers in high end CG pipelines, but it’s still not all fun and games.

Going back to writing a render engine, it can be hard to implement fast turnarounds in practice, especially with the amount of pre-pixel production setup we do to decrease algorithmic complexity and increase sampling efficiency later on. I’m playing around with a few things for reducing the startup overhead of my interactive renders in Aurora. One of the beauties of unbiased rendering is that you can combine any number of results from various unbiased techniques and still be left with an unbiased result. Which means that as soon as you’ve parsed the scene description, you could send one thread off to do all the heavy lifting that’ll make the total render time shorter - building acceleration structures, storing caches for textures and sampling etc - while another couple of threads are suboptimally starting to render with the bare minimum of bells and whistles needed to start getting pixels in front of the user. Unbiased sampling efficiency knobs like roulette thresholds work the same way, and can start off low and be brought up later, once the user is happy with the initial result and hasn’t cancelled the render. Combined with an adaptive pixel sampling strategy even the slowest of engines (*points at own source code*) can produce reasonable approximations of the final image in a very short time compared to a full render, all without introducing bias. Depending on how far you go it could add to the total time it'll take to converge to a final image, which is why this only makes sense for interactive renders where the user is very likely to stop the render and make adjustments before it has fully converged.
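The reason a roulette threshold can be changed mid-render without introducing bias is that Russian roulette divides the surviving path's throughput by its survival probability, so the expected value is the same for any probability you pick. A minimal sketch of that, with hypothetical function names and a toy experiment instead of a real path tracer:

```python
import random


def russian_roulette(throughput, survive_prob, rng):
    """Terminate the path (None) or boost it by 1/p so the expectation is unchanged."""
    if rng.random() >= survive_prob:
        return None
    return throughput / survive_prob


def mean_surviving_throughput(survive_prob, trials, rng):
    """Average throughput over many paths, counting terminated paths as zero."""
    total = 0.0
    for _ in range(trials):
        result = russian_roulette(1.0, survive_prob, rng)
        if result is not None:
            total += result
    return total / trials


rng = random.Random(1)
aggressive = mean_surviving_throughput(0.25, 200000, rng)  # kills most paths early
conservative = mean_surviving_throughput(0.9, 200000, rng)  # keeps most paths alive
```

Both averages land close to 1.0; what changes is the noise. The aggressive setting is cheaper per sample but has higher variance, which is the trade that makes it attractive for a quick first look.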

Some results, rendering at 512x512 resolution, 20 light bounces, about 70k triangles and 512 samples per pixel:

From looking at the absolute numbers I still have a long way to go before I'm hitting any kind of production quality speed, but the relative times for user feedback are getting reasonable considering I'm running these on an old Macbook Air.

I also finished refactoring the engine this weekend, and was pleasantly surprised to see render times lowered by a factor of three(!). Which says more about my initial attempt than the current one, I’m afraid, since it was all restructuring and no magic features were added. A few hours of work later I have an initial implementation of multi threading, which on my dual core processor brought the speed up by an additional 80-90%. There’s still a ton of work that needs to be done on the performance, but I’m itching to get back on the feature side so that will have to wait. Next up are a couple more brdfs so I can get more visually interesting materials than matte surfaces, and some new infinite area light features.


Monday, July 2, 2012

A readable scene description

I’ve been considering what to use as the base for scene description in my engine. I want the format to be intuitive to construct both with code and by hand, easy to read, and ideally not require too much effort on my end to parse or to support in a translator. From trawling the internet, one popular choice seems to be XML, and while it's both simple and convenient I personally find it very clunky to read and to write out even simple structures by hand. I also considered using the rib standard, as it already has a wide range of commercial and open source translators from most major 3d packages, as well as being a fun way of comparing performance. However, it adds a whole layer of complexity I don’t particularly want to deal with in terms of parsing and handling unsupported features, as well as being quite restrictive in layout/form.

Enter python. While technically not a data format, it fits everything else I want in a scene descriptor perfectly. Quietly slip a layer of json between it and the C++ engine, and you got yourself a pretty clean and intuitive way to talk to the renderer. As a bonus it’s widely used, and foreshadows potential future user interface endeavours. But let’s not get ahead of ourselves.

Here’s a mockup of what a python layer might look like:

import aurora
import math

scene = aurora.core.Scene()

# Settings
options = {
    "resolution"     : [512, 512],
    "driver"         : "openexr",
    "filename"       : "dragon.exr",
    "pixelsamples"   : 256,
    "pixelsampler"   : "adaptive",
    "accelerator"    : "uniformgrid",
    "minraydepth"    : 3,
    "raytermination" : "roulette",
    "raybias"        : 0.001,
    "loglevel"       : "debug",
}

scene.options = options

# Camera
renderCam = aurora.core.Camera(fov = math.pi/8)
renderCam.translate(278, 273, -800)


# Dragon
dragon = aurora.geometry.ObjMesh("models/stanford/dragon.obj")
dragon.scale(30, 30, 30)
dragon.translate(280, 0, 280)

material = aurora.shaders.MatteSurface(color(1,1,1))
dragon.material = material


# Cornell box

# Light
light = aurora.lights.SquareAreaLight(exposure = 5.75, color = color(1,1,1))
light.scale(60, 1, 60)
light.translate(280, 548.7, 280)
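The JSON layer between Python and the C++ engine could be as thin as serializing plain dicts. The sketch below only uses Python's standard `json` module; the document layout (`"options"`, `"objects"`, `"type"` and friends) and the helper names are made up for illustration and are not Aurora's actual format.

```python
import json


def describe_mesh(path, scale, translate, material):
    """Flatten one object into plain data that json can serialize."""
    return {
        "type": "objmesh",
        "path": path,
        "scale": list(scale),
        "translate": list(translate),
        "material": material,
    }


def scene_to_json(options, objects):
    """Produce the text that would be handed over to the engine."""
    return json.dumps({"options": options, "objects": objects},
                      indent=2, sort_keys=True)


dragon = describe_mesh(
    "models/stanford/dragon.obj",
    scale=(30, 30, 30),
    translate=(280, 0, 280),
    material={"type": "mattesurface", "color": [1, 1, 1]},
)
document = scene_to_json({"resolution": [512, 512], "pixelsamples": 256}, [dragon])
```

The appeal of this split is that the engine only needs a JSON parser, while all the convenience (loops, helper functions, procedural scene setup) stays on the Python side.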



I also named my engine Aurora, for numerous reasons.


Thursday, June 28, 2012

The bare necessities

I just reached the second big milestone - the first one being rendering anything coherent - for my (still unnamed) ray tracer: A reasonable Cornell box. I modelled (although the metaphor of the analog counterpart kind of breaks when you’re hard coding vertex coordinates and topology) a scene based on the “official” Cornell standard, to see if my engine could produce anything close to the real deal. After fixing up some fireflies (it’s easy to forget lights are geometry too) and a bug where I wasn’t accounting for the solid angle of the area light geometry when evaluating a brdf, I’m actually getting results that are pretty reasonable.

There are still one or two fireflies in my render - those would be rays that are hitting the tiny gap between the area light and the ceiling and getting radiosity values of about 99% of the light source energy (so absurdly high), but without a light source pdf for MIS to kick in and balance it out. Other differences: my ray tracer is color based, while the reference was rendered with a spectral renderer, so the temperature of my light source behaves somewhat differently. My surface albedo values are also slightly off compared to theirs due to eyeballing the wavelength to color conversion, so my side walls are a bit brighter and more saturated, and my white material is probably off by some factor as well. Also, mine is noisy, of course, and doesn’t have as many light bounces, because my engine is still extremely slow.
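For context on the solid angle bug: when a point is picked uniformly on an area light, its probability density is expressed per unit area, but the brdf integral is over solid angle, so the density has to be converted with the squared distance and the cosine at the light. A minimal sketch of that conversion follows; the function name and argument layout are assumptions for the example, not Aurora's code.

```python
import math


def area_pdf_to_solid_angle(pdf_area, shading_point, light_point, light_normal):
    """Convert a per-area density on the light to a per-solid-angle density.

    pdf_w = pdf_A * distance^2 / |cos(theta_light)|
    light_normal is assumed to be unit length.
    """
    to_light = [l - s for l, s in zip(light_point, shading_point)]
    dist2 = sum(c * c for c in to_light)
    if dist2 == 0.0:
        return 0.0  # shading point sits on the sample itself, nothing to convert
    dist = math.sqrt(dist2)
    direction = [c / dist for c in to_light]
    cos_light = abs(sum(-d * n for d, n in zip(direction, light_normal)))
    if cos_light == 0.0:
        return 0.0  # light seen exactly edge-on: treat the sample as unusable
    return pdf_area * dist2 / cos_light


# A 1x1 light (pdf_area = 1) two units straight above the shading point, facing down.
pdf_w = area_pdf_to_solid_angle(1.0, (0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -1.0, 0.0))
```

Leaving this factor out makes distant or steeply angled lights contribute the wrong amount of energy, which tends to show up as lighting that doesn't fall off the way it should.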

Here’s another render, with a white environment light instead of the area light inside the box.

Now that I’ve reached this point it’s time to take a small step back and look at the bigger picture. There are a couple of design choices that I don’t really like, and I ended up taking a few shortcuts towards the end to get the cornell box render out faster (as in less time to write the code, not faster rendering unfortunately), so next week I’ll be refactoring most of the source. I want an interface for setting up a scene by parsing an external scene description in some human-friendly format/language rather than needing to hard code the scene into the engine itself. I also want a better separation between the front end (parsing, primitive management, dicing and building the acceleration structure) and the back end (light transport, ray intersections) so I can have a nice and abstract structure for the first, and an optimized, faster structure for the latter without the two concepts conflicting. The code base is still relatively small, so it shouldn’t be too much work.


Sunday, June 24, 2012

It's about time

I did something incredibly stupid. Again. I’ll probably be paying for this one for a long time. I started writing a ray tracer. From scratch. In c++.

Half to close at least a fraction of the knowledge gap between myself and the guys I work with, half because I’ve been wanting to do this for a while now and half to see if I could. Much like when buying that first box set of Lost season 1 I have no idea what I’m getting myself into. In any case it was about time to raise the bar from writing “Ci = Ln.Nn” and patting myself on the back as the renderer kindly popped a picture out of a black box for me. There’s something immensely satisfying about seeing that first sphere materializing on screen after days of debugging matrix operations, writing squiggly trig problems in my notebook and cursing the compiler for not spoon feeding me answers like I’m used to. It’s sort of like that feeling you get when you realize you’ve gone from being that spoiled kid with a squeaky voice to reaching manhood, except it’s absolutely meaningless and instead of getting attention from girls you have to write about it on the internet.

Stick around for updates on my struggles through the world of light transport and monte carlo optimization.

Here’s a sphere and a bunny.