Since the last update, I've thoroughly refactored Imagine's image texture and file-reading infrastructure to support image textures at native bit depths (16-bit uint being the exception) instead of always converting them to floats as I did previously. I've essentially reversed the way image loading and texture creation work: previously, image textures were always full float and the image classes were filled by the file readers. Now, the file readers create image classes at the most suitable or requested bit depth and interpretation (for single-channel images), and suitable image texture classes are created based on the resulting images. These then do any relevant conversion in the texture lookup to return full float values for the integrators / BSDFs. For 8-bit uchar textures, a LUT is used to convert and linearise the values, so speed is not an issue, but for half float, casts have to be done, and unfortunately there's a small (but noticeable) overhead here, despite twice as many values fitting into each cache line. I've tried doing any filtering / interpolation for the lookup at half precision first, and then converting only the single resulting half to a float afterwards. This helps, but it isn't possible for HDR images used for environment lighting, as the values are often quite high and can be near the limit of half, meaning you can't average them at that precision. Regardless, this change brings a huge reduction in memory allocation for textures, and it'd now be fairly easy to add texture paging to my texture caching infrastructure.
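To illustrate the 8-bit LUT idea, here's a minimal sketch rather than Imagine's actual code. It assumes the uchar values are sRGB-encoded and uses the standard sRGB decoding curve; the names buildSRGBToLinearLUT and lookupLinear are hypothetical. The table is built once, so each texel lookup costs a single array read instead of a divide and a pow().

```cpp
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical helper: builds a 256-entry table that maps an 8-bit,
// sRGB-encoded value (an assumption about the encoding) to a linear float.
std::array<float, 256> buildSRGBToLinearLUT()
{
    std::array<float, 256> lut{};
    for (int i = 0; i < 256; i++)
    {
        // normalise the uchar value to the 0.0 - 1.0 range
        const float v = static_cast<float>(i) / 255.0f;
        // standard piecewise sRGB decoding: linear toe, then a power curve
        lut[i] = (v <= 0.04045f) ? v / 12.92f
                                 : std::pow((v + 0.055f) / 1.055f, 2.4f);
    }
    return lut;
}

// Hypothetical lookup: converting and linearising a texel is just an array read.
inline float lookupLinear(const std::array<float, 256>& lut, uint8_t texel)
{
    return lut[texel];
}
```

The table only needs 1 KB, so it should stay in cache during texture lookups. No equivalent table is practical for half float, which is why the casts are needed there.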
I've also finally made my DistributedPath integrator usable. I've been trying to duplicate how Arnold splits diffuse and glossy ray bounces for over a year now, and going by some diagrams in its documentation, it looked like it branches at every bounce, which was how I originally wrote my integrator. Doing this, however, resulted in a stupid number of final rays, which was ridiculously slow, and it also made generating decent samples very difficult and expensive (pre-generating and re-using them might have been an option). After playing with Arnold over the last six months and benchmarking it with various sample and depth settings, I'm now very certain that it only splits rays on the first bounce. With this modification, my DistributedPath integrator is now very usable, and for scenes where geometry aliasing isn't an issue and no depth-of-field or motion blur is required, it can speed up rendering to a particular noise level fairly significantly compared to pure path tracing. It's especially helpful when there's lots of indirect illumination, where the diffuse split multiplier can really help to reduce noise. However, you generally need to increase the number of light samples as well, to compensate for the reduced number of camera samples sent out.
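To give a rough idea of why branching at every bounce gets out of hand, here's a small sketch that counts the secondary rays traced per camera ray under both schemes. It's a deliberately simplified model and not the integrator itself: it assumes a single split multiplier for all bounce types, no Russian roulette or early termination, and it ignores light sample rays. Both function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical: secondary rays per camera ray when every hit point spawns
// `split` new rays, down to `maxDepth` bounces: split + split^2 + ... + split^maxDepth.
uint64_t secondaryRaysBranchingEveryBounce(uint32_t split, uint32_t maxDepth)
{
    uint64_t total = 0;
    uint64_t raysAtThisDepth = 1; // starts as the single camera ray
    for (uint32_t depth = 0; depth < maxDepth; depth++)
    {
        raysAtThisDepth *= split; // every ray from the previous level splits again
        total += raysAtThisDepth;
    }
    return total;
}

// Hypothetical: secondary rays per camera ray when only the first hit splits,
// and each of those `split` rays then continues as a single non-branching path.
uint64_t secondaryRaysSplittingFirstBounceOnly(uint32_t split, uint32_t maxDepth)
{
    return static_cast<uint64_t>(split) * maxDepth;
}
```

With a split multiplier of 4 and a depth of 4, this model gives 340 secondary rays per camera ray when branching at every bounce, versus 16 when splitting only on the first bounce.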
I'm currently in the middle of implementing a new and much more flexible shading system. Imagine's current one is pretty limited and basic: it just bakes down BSDF components to a container BSDF at render start, which is then fixed for that material. This works very well (in terms of shading speed) for simple shading and non-varying mixes of materials, but it makes more complex mixes and blends which are controlled by textures very difficult to code, sample and control, and it also makes medium IOR transitions very complicated. I'm prototyping two different methods here to see what the overheads / limitations of each are.
In terms of future work, the task after the shading change is to seriously reduce Imagine's geometry memory footprint, in order to make it more competitive with other renderers. Thanks to Imagine's origin as a sandbox for learning OpenGL and 3D programming, its native GeometryInstance representation is very inefficient for source geometry. The baked geometry representation (the tessellated version of the source geometry) is also pretty inefficient, due to OpenGL's requirement that vertex attributes can only be accessed uniformly: triangle points, normals and uvs pretty much need to be gathered, leading to up to three times as many points, normals and uvs as the source geometry has. I've had a TriangleGeometryInstance for a while, which stores pure triangles very efficiently (and doesn't do any OpenGL drawing), and which I used in order to be able to load the Stanford Lucy model on my laptop, but I need to support polygons and Sub-ds correctly and efficiently, so quite a bit of work is needed. I'd also like to look into changing the indexing size for geometry, so that meshes with fewer than 65,535 vertices can be indexed with ushorts instead of wasting space on uints. For low LOD representations of geometry, this might be quite useful.
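The index size idea could look something like the sketch below. This is only an illustration under my own assumptions, not a design Imagine has settled on: the CompactIndices struct and packIndices function are hypothetical. The threshold used here is the largest one a 16-bit index allows (indices 0 to 65,535, so up to 65,536 vertices); a more conservative cut-off would work just as well.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical container: holds indices as 16-bit or 32-bit values,
// depending on how many vertices the mesh has. Only one vector is filled.
struct CompactIndices
{
    std::vector<uint16_t> indices16;
    std::vector<uint32_t> indices32;
    bool use16 = false;

    // memory used by the index data itself
    size_t sizeInBytes() const
    {
        return use16 ? indices16.size() * sizeof(uint16_t)
                     : indices32.size() * sizeof(uint32_t);
    }

    // uniform access, whichever width is stored
    uint32_t at(size_t i) const
    {
        return use16 ? static_cast<uint32_t>(indices16[i]) : indices32[i];
    }
};

// Hypothetical packing function: narrows to 16-bit when every possible
// vertex index (0 .. vertexCount - 1) fits in a uint16_t.
CompactIndices packIndices(const std::vector<uint32_t>& source, size_t vertexCount)
{
    CompactIndices result;
    const size_t maxVertices16 =
        static_cast<size_t>(std::numeric_limits<uint16_t>::max()) + 1;
    result.use16 = vertexCount <= maxVertices16;

    if (result.use16)
    {
        result.indices16.reserve(source.size());
        for (uint32_t index : source)
            result.indices16.push_back(static_cast<uint16_t>(index));
    }
    else
    {
        result.indices32 = source;
    }
    return result;
}
```

For a mesh that qualifies, this halves the memory used by the index buffer, at the cost of a branch on each access (which templating the mesh type on the index width would avoid).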