I did some prototyping of two different shading implementations which allow vastly more flexible shading than Imagine's previous baked-BSDF approach. The two different methods I tested really only varied in how the memory for the dynamic BSDF components was allocated and used - both methods built the BSDF components after each geometry intersection (for non-shadow rays - in my final implementation it's also performed when Transparent shadows are enabled), either from constant values or textures. The first test method was using a memory arena to allocate the samples, and the second was allocating the memory on the stack within the integrator loops.
Fairly comprehensive benchmarking - using a worst-case scenario of allocating lots of different BSDFs, all driven from image textures and all controlled by quite convoluted and expensive branching logic - showed practically no difference in speed between the two methods. However, there was (as I expected) an overhead compared to the baked BSDF approach - generally around 4-9% of total render time. I'm putting the upper end of this down to image texture evaluations (all images in the tests were memory-resident, so with texture paging and displacement the overhead could be even higher), and the lower end is probably the additional branching for controlling the BSDF creation, which is now run for every ray bounce instead of just once. Because of this, and the fact that there didn't seem to be any overhead in just having the stack-based dynamic BSDF components in the integrator loops if I didn't use them, I decided to use the second approach, which still allows me to use the baked BSDF approach if the material definition is simple enough to allow it. This gives great flexibility at the cost of some code complexity, which I think is a worthwhile trade-off.
So now any float/Col3f material parameter can be driven by textures, and a decision is made per-material at pre-render time whether complex shading is needed. If not, the material pre-bakes the BSDF as before, and within the integrators this baked BSDF is returned by the material's shade() function and used.
If complex shading is needed, then the material can make use of the pointer to the stack-allocated BSDF memory which is passed in to the shade() function, allocate BSDF components as required using this memory and then return this pointer to the integrators. The base infrastructure is now in place for node-based shading networks - the GUI side of things for that is the main work required to complete this.
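As a rough illustration of the two paths through shade(), here is a minimal sketch. All type and member names (BSDFStackMemory, HitInfo, m_needsComplexShading, evalTexture and so on) are hypothetical and not Imagine's actual API; the "texture" is a stand-in that just returns the UV as a colour.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// All names below are hypothetical illustrations, not Imagine's real classes.
struct Col3f { float r, g, b; };

struct BSDF {
    Col3f diffuse;
    float roughness;
};

// Fixed-size scratch buffer intended to live on the integrator's stack frame.
struct BSDFStackMemory {
    alignas(BSDF) unsigned char bytes[4 * sizeof(BSDF)];
    std::size_t used = 0;

    // Placement-new a BSDF into the next free slot of the buffer.
    BSDF* allocate() {
        assert(used + sizeof(BSDF) <= sizeof(bytes));
        BSDF* p = new (bytes + used) BSDF{};
        used += sizeof(BSDF);
        return p;
    }
};

struct HitInfo { float u, v; };

class Material {
public:
    Material(const Col3f& colour, float roughness, bool textured)
        : m_baked{colour, roughness}, m_needsComplexShading(textured) {}

    // Returns either the pre-baked BSDF or one built in the caller's stack memory.
    const BSDF* shade(const HitInfo& hit, BSDFStackMemory& mem) const {
        if (!m_needsComplexShading)
            return &m_baked;                 // simple material: no per-hit work

        BSDF* bsdf = mem.allocate();         // complex material: build per hit
        bsdf->diffuse = evalTexture(hit);
        bsdf->roughness = m_baked.roughness;
        return bsdf;
    }

private:
    // Stand-in for an image texture lookup (assumption: returns UV as colour).
    static Col3f evalTexture(const HitInfo& hit) { return {hit.u, hit.v, 0.0f}; }

    BSDF m_baked;                // built once at pre-render time
    bool m_needsComplexShading;  // decided once at pre-render time
};
```

In an integrator loop, the BSDFStackMemory would be declared as a local variable per intersection, so the returned pointer stays valid for as long as that stack frame is alive and no heap allocation is involved.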
Based on this new functionality, I implemented a MixMaterial, which can mix or binary-switch between materials based on a texture.
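The difference between the two modes can be sketched on plain colours. This is only an illustration under assumptions: the MixMode enum, the mixColour function and the 0.5 switch threshold are all hypothetical, and a real material mix would combine BSDF components rather than colours.

```cpp
#include <cassert>

// Hypothetical names throughout; colours stand in for full materials here.
struct Col3f { float r, g, b; };

enum class MixMode { Blend, BinarySwitch };

// Linear blend of two colours: t = 0 gives a, t = 1 gives b.
inline Col3f lerp(const Col3f& a, const Col3f& b, float t) {
    const float s = 1.0f - t;
    return {a.r * s + b.r * t, a.g * s + b.g * t, a.b * s + b.b * t};
}

// mask is the controlling texture's value at the hit point, expected in [0, 1].
inline Col3f mixColour(const Col3f& a, const Col3f& b, float mask, MixMode mode) {
    assert(mask >= 0.0f && mask <= 1.0f);
    if (mode == MixMode::BinarySwitch)
        return mask < 0.5f ? a : b;   // assumed threshold: pick one side outright
    return lerp(a, b, mask);          // smooth blend weighted by the texture
}
```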
I also added UDIM texture atlas support with lazy on-demand reading of textures based on the UV coordinates.
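For reference, the usual UDIM convention numbers tiles as 1001 + floor(u) + 10 * floor(v), with ten tiles per row. The sketch below maps a UV coordinate to a tile and only "loads" that tile the first time it's touched. The UdimAtlas class, the `<UDIM>` filename token and the Tile placeholder are assumptions for illustration, not Imagine's implementation, and the actual image reading is left out.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>
#include <utility>

// UDIM tile number for a UV coordinate, with u in [0, 10) and v >= 0.
inline int udimTile(float u, float v) {
    assert(u >= 0.0f && u < 10.0f && v >= 0.0f);
    return 1001 + static_cast<int>(std::floor(u))
                + 10 * static_cast<int>(std::floor(v));
}

// Placeholder for loaded image data; only remembers which file it came from.
struct Tile { std::string path; };

// Hypothetical lazy atlas: a tile is created the first time a lookup lands in it.
// Not thread-safe; a renderer would need locking around the map.
class UdimAtlas {
public:
    explicit UdimAtlas(std::string pattern) : m_pattern(std::move(pattern)) {}

    const Tile& lookup(float u, float v) {
        const int tile = udimTile(u, v);
        auto it = m_tiles.find(tile);
        if (it == m_tiles.end()) {
            ++m_loads;  // a real implementation would read the image file here
            it = m_tiles.emplace(tile, Tile{pathFor(tile)}).first;
        }
        return it->second;
    }

    int loadCount() const { return m_loads; }

private:
    // Replace the (assumed) "<UDIM>" token in the pattern with the tile number.
    std::string pathFor(int tile) const {
        std::string path = m_pattern;
        const std::string token = "<UDIM>";
        const std::size_t pos = path.find(token);
        if (pos != std::string::npos)
            path.replace(pos, token.size(), std::to_string(tile));
        return path;
    }

    std::string m_pattern;
    std::map<int, Tile> m_tiles;
    int m_loads = 0;
};
```

Tiles that no ray ever hits are never read, which is the point of doing it on demand.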
Monday, 26 May 2014
Thursday, 1 May 2014
Since the last update, I've thoroughly refactored Imagine's image texture and file-reading infrastructure to support image textures at native bit depths (except for 16-bit uint) instead of always converting them to floats as I did previously. I've basically reversed the way image loading and texture creation work. Previously, image textures were always full float and the image classes were filled by the file readers. Now, the file readers create image classes at the most suitable or requested bit depth and interpretation (for single channel images), and suitable image texture classes are created based on these resultant images, which then do any relevant conversion in the texture lookup to return full float values for the integrators / BSDFs. For 8-bit uchar textures, a LUT is used to convert and linearise the values, so speed is not an issue, but for half float, casts have to be done, and unfortunately there's a small (but noticeable) overhead here, despite twice as many values fitting into each cache line. I've tried doing any filtering / interpolation for the lookup at half precision and then converting only the single resulting half to a float afterwards, and this helps, but it isn't possible for HDR images used for environment lighting, as the values are often quite high and can be near the limit of half, meaning you can't average them at that precision. Regardless, this change brings a huge reduction in memory allocation for textures, and it'd now be fairly easy to add texture paging to my texture caching infrastructure.
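The 8-bit LUT idea can be sketched as follows. This assumes the 8-bit data is sRGB-encoded and uses the standard sRGB decode curve; the function names are hypothetical and the post doesn't say which transfer function Imagine actually uses.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdint>

// Standard sRGB decode for an encoded value in [0, 1] (assumed encoding).
inline float srgbToLinear(float c) {
    assert(c >= 0.0f && c <= 1.0f);
    return c <= 0.04045f ? c / 12.92f
                         : std::pow((c + 0.055f) / 1.055f, 2.4f);
}

// Precompute the linear float value for each of the 256 possible 8-bit texels.
inline std::array<float, 256> buildLinearLut() {
    std::array<float, 256> lut{};
    for (int i = 0; i < 256; ++i)
        lut[i] = srgbToLinear(static_cast<float>(i) / 255.0f);
    return lut;
}

// Per-lookup cost is a single table read: no pow() and no division.
inline float lookupLinear(std::uint8_t texel) {
    static const std::array<float, 256> lut = buildLinearLut();
    return lut[texel];
}
```

The table is only 1 KB of floats, so it stays cache-resident, which is why the conversion is effectively free compared to the per-texel casts needed for half float.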
I've also finally made my DistributedPath integrator usable. I've been trying to duplicate how Arnold splits diffuse and glossy ray bounces for over a year now, and going by some diagrams in its documentation, it looked like it branches at every bounce, which was how I wrote my integrator. Doing this, however, resulted in a stupid number of final rays, which was ridiculously slow, and also made generating decent samples very difficult and expensive (pre-generating and re-using them might have been an option). After playing with Arnold over the last six months and benchmarking it with various sample and depth settings, I'm now very certain that it only splits rays on the first bounce. With this modification, my DistributedPath integrator is now very usable, and for scenes where geometry aliasing isn't an issue and no depth-of-field or motion blur is required, it can speed up rendering to a particular noise level fairly significantly compared to pure path tracing. It's particularly helpful when there's lots of indirect illumination, where the diffuse split multiplier can really help to reduce noise. However, you generally need to increase the number of light samples as well, to compensate for the reduced number of camera samples sent out.
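To put rough numbers on why branching at every bounce blows up, here is a simplified count of secondary rays per camera sample. It assumes a single combined split count per branching point (e.g. diffuse plus glossy splits) and ignores light samples and early path termination; the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Secondary rays per camera sample if every hit spawns `splits` new rays,
// down to maxDepth bounces: splits + splits^2 + ... + splits^maxDepth.
inline std::uint64_t raysSplitEveryBounce(unsigned splits, unsigned maxDepth) {
    assert(splits >= 1);
    std::uint64_t total = 0;
    std::uint64_t atDepth = 1;
    for (unsigned d = 0; d < maxDepth; ++d) {
        atDepth *= splits;   // each ray at the previous depth branches again
        total += atDepth;
    }
    return total;
}

// Secondary rays per camera sample if only the first hit splits, and each of
// those paths then continues as a single ray per bounce.
inline std::uint64_t raysSplitFirstBounceOnly(unsigned splits, unsigned maxDepth) {
    assert(splits >= 1);
    return static_cast<std::uint64_t>(splits) * maxDepth;
}
```

With 6 splits and a depth of 4, the first function gives 6 + 36 + 216 + 1296 = 1554 rays per camera sample, against 24 for the second.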
I'm currently in the middle of implementing a new and much more flexible shading system. Imagine's current one is pretty limited: it just bakes down BSDF components to a container BSDF at render start, which is then always fixed for that material. This works very well (in terms of shading speed) for simple shading and non-varying mixes of materials, but makes more complex mixes and blends which are controlled by textures very difficult to code, sample and control, and also makes medium IOR transitions very complicated. I'm prototyping two different methods here to see what the overheads / limitations of each are.
In terms of future work, the task after the shading change is to seriously reduce Imagine's geometry memory footprint, in order to make it more competitive with other renderers. Thanks to Imagine's origin as a sandbox for learning OpenGL and 3D programming, its native GeometryInstance representation is very inefficient for source geometry, and the baked geometry representation (the tessellated version of the source geometry) is also pretty inefficient: because OpenGL only lets you access vertex attributes uniformly, triangle points, normals and uvs pretty much need to be gathered, leading to up to three times as many points, normals and uvs as the source geometry has. I've had a TriangleGeometryInstance for a while, which stores pure triangles very efficiently (and doesn't do any OpenGL drawing), and which I used in order to be able to load the Stanford Lucy model on my laptop, but I need to support polygons and Sub-ds correctly and efficiently, so quite a bit of work is needed. I'd also like to look into changing the indexing size for geometry, so that meshes with fewer than 65,535 vertices can be indexed with ushorts instead of wasting space on uints - for low LOD representations of geometry, this might be quite useful.
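The index-size idea amounts to picking the index width per mesh from its vertex count. A minimal sketch, with hypothetical names and using the threshold mentioned above (strictly, 16-bit indices can address 65,536 vertices, so this is slightly conservative):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names; not part of Imagine.
enum class IndexWidth { U16, U32 };

// Threshold taken from the text above; 16 bits could address one vertex more.
constexpr std::size_t kU16VertexLimit = 65535;

inline IndexWidth chooseIndexWidth(std::size_t vertexCount) {
    return vertexCount < kU16VertexLimit ? IndexWidth::U16 : IndexWidth::U32;
}

// Bytes needed for the index buffer of a pure-triangle mesh (3 indices each).
inline std::size_t indexBufferBytes(std::size_t triangleCount, IndexWidth width) {
    const std::size_t bytesPerIndex =
        (width == IndexWidth::U16) ? sizeof(std::uint16_t) : sizeof(std::uint32_t);
    return triangleCount * 3 * bytesPerIndex;
}
```

For a low LOD mesh of 1,000 triangles, that's 6,000 bytes of indices with ushorts against 12,000 with uints, so the index buffer halves for every mesh under the limit.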