Numerical Computing for the Generation of Photorealistic Images
Maurizio Tomasi maurizio.tomasi@unimi.it
The challenges of Releasing the Moana Island Scene (Tamstorf & Pritchett, EGSR 2019)
The scenes seen in the previous slides are formed by the combination of many simple shapes.
Keeping a list of simple shapes in memory requires a series of non-trivial measures.
Today we will discuss triangular meshes, in which the elementary shape is precisely a planar triangle. (The same discussion can be made for quadrilateral meshes, but for simplicity we will focus on triangles).
We have seen how to implement the code to calculate the intersection between a ray and a triangle in the general case where the triangle is encoded by its three vertices A, B, and C.
We did not follow the approach used for spheres and planes of choosing a “standard” shape (e.g., a triangle on the xy plane), because this would have required storing a 4×4 transformation and its inverse, for a total of 32 floating-point numbers (128 bytes at single precision).
Storing the three vertices of a triangle directly requires only 3×3×4 = 36 bytes (single precision)…
…but we can do better!
In a triangle mesh, the vertices are stored in an ordered list P_k, with k = 1\ldots N.
Each triangle is represented by a triple of integer indices (i_1, i_2, i_3), giving the positions of its vertices P_{i_1}, P_{i_2}, P_{i_3} in the ordered list.
If 32-bit integers are used to store the indices, each triangle requires 3×4 = 12 bytes.
This is advantageous if a vertex is shared by multiple triangles, which is generally true.
Model: 44,000 vertices, 80,000 triangles.
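A back-of-the-envelope estimate (not code from the slides) makes the saving concrete, using the model quoted above:

```python
# Memory estimate for a model with 44,000 vertices and 80,000 triangles,
# stored at single precision (4-byte floats) with 32-bit indices.

n_vertices = 44_000
n_triangles = 80_000

bytes_per_vertex = 3 * 4    # three float32 coordinates
bytes_per_triangle = 3 * 4  # three int32 indices into the vertex list

# Indexed mesh: every vertex stored once, plus one index triple per triangle
mem_indexed = n_vertices * bytes_per_vertex + n_triangles * bytes_per_triangle

# Naive storage: each triangle keeps its own three vertices (36 bytes each)
mem_flat = n_triangles * 3 * bytes_per_vertex

# mem_indexed is about 1.5 MB, mem_flat about 2.9 MB: sharing vertices
# roughly halves the memory footprint for this model
```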
A triangle is a planar surface, and therefore every point on its surface has the same normal \hat n.
In the case of triangle meshes, the barycentric coordinates of the triangle can be used to simulate a smooth surface: this is especially useful when the mesh is obtained from the discretization of a smooth surface.
When approximating a smooth surface, it is necessary to calculate both the vertices of the triangles and the normals at the vertices.
At the point P defined by \alpha, \beta, \gamma, the normal is assigned as
\hat n_P = \alpha \hat n_1 + \beta \hat n_2 + \gamma \hat n_3.
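A possible implementation of this interpolation (function names are illustrative; real code would use the ray tracer's own vector types) also re-normalizes the result, since a convex combination of unit vectors is in general shorter than 1:

```python
import math

def normalize(v):
    """Rescale a 3D vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def interpolated_normal(n1, n2, n3, alpha, beta, gamma):
    """Blend the three vertex normals with the barycentric coordinates
    (alpha + beta + gamma == 1) of the hit point, then re-normalize."""
    blended = tuple(alpha * a + beta * b + gamma * c
                    for a, b, c in zip(n1, n2, n3))
    return normalize(blended)
```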
In the case of a mesh, there are infinitely many possible ways to create a (u, v) mapping on the surface.
In meshes, each element is made to cover a specific portion of the square [0, 1] \times [0, 1].
3D modeling programs like Blender allow you to modify the (u, v) mapping of each element.
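The same barycentric blending used for normals also yields the (u, v) coordinates of a point inside an element, given the coordinates assigned to its three vertices; a minimal sketch with illustrative names:

```python
def interpolated_uv(uv1, uv2, uv3, alpha, beta, gamma):
    """Barycentric interpolation of the (u, v) coordinates assigned to
    the three vertices of a mesh element (alpha + beta + gamma == 1)."""
    u = alpha * uv1[0] + beta * uv2[0] + gamma * uv3[0]
    v = alpha * uv1[1] + beta * uv2[1] + gamma * uv3[1]
    return (u, v)
```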
The Wavefront OBJ format is a very simple text format used to store meshes (not just triangular ones), and it is easy to load.
Example (beginning of the minicooper.obj model):
# Vertexes
v 20.851225 -39.649834 32.571609
v 20.720263 -39.659435 32.675613
v 20.589304 -39.649834 32.571609
…
# Normals
vn -0.000006 38.811405 3.583478
vn -0.000006 38.811405 3.583478
vn -0.000006 38.811405 3.583478
…
# Triangles («faces»)
f 3//3 2//2 1//1
f 4//4 3//3 1//1
f 5//5 4//4 1//1
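A minimal reader for the subset of the format shown above (`v`, `vn`, and `f i//k` records) can be sketched as follows; this is illustrative only, since the real OBJ format has many more record types:

```python
import io

def load_obj(stream):
    """Minimal, illustrative reader for the OBJ subset shown above:
    `v` (vertex), `vn` (normal), and `f i//k j//l m//n` (face) records.
    OBJ indices are 1-based, so they are shifted to 0-based here."""
    vertices, normals, faces = [], [], []
    for line in stream:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "vn":
            normals.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            face = []
            for token in parts[1:4]:
                # "3//3" -> vertex index 3, normal index 3 (1-based)
                v_idx, _, n_idx = token.split("/")
                face.append((int(v_idx) - 1, int(n_idx) - 1))
            faces.append(face)
    return vertices, normals, faces

# Tiny usage example with the `i//k` face syntax used above
demo = io.StringIO(
    "v 0 0 0\nv 1 0 0\nv 0 1 0\n"
    "vn 0 0 1\nvn 0 0 1\nvn 0 0 1\n"
    "f 1//1 2//2 3//3\n"
)
verts, norms, tris = load_obj(demo)
```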
The easiest way to view them is to use Blender, of course! Under Linux you can also use openctm-tools, which is more lightweight (the command ctmviewer FILENAME displays an OBJ file in an interactive window).
The website of J. Burkardt contains many freely downloadable OBJ files (I took the Mini Cooper model from there).
Calculating the intersection between a mesh and a ray is not difficult to implement, but doing it efficiently is another matter.
The problem is that much of the time required to calculate the solution to the rendering equation is spent on ray-shape intersections.
As the number of shapes increases, the computation time necessarily increases as well.
An effective optimization is the use of bounding boxes, which rely on the concept of an “axis-aligned box”. Let’s start with the latter.
The cubic shape is not very interesting in itself, but it lends itself to some optimizations relevant for polygonal meshes.
Due to its particular purpose, we will treat the case of cubes using different conventions than those made for spheres and planes:
These assumptions are referred to in the literature as an axis-aligned box (AAB).
A parallelepiped with edges aligned along the xyz axes can be defined by the following quantities:
Equivalently, you can store two opposite vertices of the parallelepiped, P_m (minimum values of x, y and z) and P_M (maximum values).
Let’s write the ray as r: O + t \vec d.
The calculation for the 3D case is very similar to the 2D case, so let’s consider the latter:
Let F_i be a generic point on a face of the box perpendicular to the i-th direction (six planes in total), which will have coordinates
F_0 = (f_0^{\text{min}/\text{max}}, \cdot, \cdot), \quad F_1 = (\cdot, f_1^{\text{min}/\text{max}}, \cdot), \quad F_2 = (\cdot, \cdot, f_2^{\text{min}/\text{max}}).
In general, there are two ray-plane intersections along the i-th coordinate, if we consider the whole planes instead of the actual faces:
\bigl(O + t_i \vec d\bigr)_i = f^{\text{min}/\text{max}}_i\quad\Rightarrow\quad t_i^{\text{min}/\text{max}} = \frac{f_i^{\text{min}/\text{max}} - O_i}{d_i}.
Each direction produces two intersections, so in total there are six potential intersections (one for each face of the cube).
But not all intersections are correct: they are calculated for the entire infinite plane on which the face of the cube lies.
It is therefore necessary to verify for each value of t if the corresponding point P actually lies on one of the faces of the cube.
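An equivalent formulation, commonly called the slab method, avoids the explicit point-on-face check: for each axis it keeps the interval of t between the two plane intersections, and the ray hits the box iff the three intervals overlap. A sketch, assuming the box is stored as the two corners P_m and P_M:

```python
def ray_box_intersection(origin, d, p_min, p_max):
    """Slab method: intersect the ray O + t*d with the AABB given by its
    two opposite corners p_min and p_max. Returns (t_enter, t_exit) for
    a hit with t >= 0, or None if the ray misses the box."""
    t_enter, t_exit = 0.0, float("inf")
    for i in range(3):
        if d[i] == 0.0:
            # Ray parallel to this pair of faces: it must lie between them
            if not (p_min[i] <= origin[i] <= p_max[i]):
                return None
            continue
        t1 = (p_min[i] - origin[i]) / d[i]
        t2 = (p_max[i] - origin[i]) / d[i]
        if t1 > t2:
            t1, t2 = t2, t1  # ensure t1 is the entry, t2 the exit
        t_enter = max(t_enter, t1)
        t_exit = min(t_exit, t2)
        if t_enter > t_exit:
            return None  # the three intervals do not overlap
    return t_enter, t_exit
```

Keeping the largest entry time and the smallest exit time per axis automatically discards the intersections that fall outside the actual faces.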
Last week we implemented the World type, which contains a list of objects.
When calculating an intersection with an object, World.ray_intersection must iterate over all the Shapes in World.
If you increase the number of Shapes tenfold, the time required to produce an image also increases tenfold…
…but even solving the rendering equation in simple cases can take hours!
This image contains three geometric shapes (two planes and a sphere), and was calculated in ~156 seconds.
With our implementation of World, the time required to compute an image is roughly proportional to the number of ray-shape intersections.
But realistic scenes contain many shapes!
The Moana island scene is composed of ~15 billion basic shapes: its rendering time would be on the order of 25,000 years!
Now that we have described axis-aligned boxes, let’s move to the concept of axis-aligned bounding boxes, which provides a simple but effective optimization.
Axis-aligned bounding boxes (AABBs) are axis-aligned boxes that delimit the volume occupied by objects.
They are widely used in computer graphics as an optimization mechanism.
The principle is as follows:
AABBs are useful only for scenes made up of many objects. For simple scenes, they can slow down the rendering.
They are however very useful with triangle meshes and complex CSG objects.
If you want to support AABBs in your ray-tracer, you should add an aabb member to the Shape type, to be used inside Shape.rayIntersection:
AABBs are perfect for applying to meshes. (In this case they obviously do not apply to the individual elements, but to the mesh as a whole).
When loading a mesh, its AABB can be calculated by calculating the minimum and maximum values of the coordinates of all vertices.
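The computation is just a componentwise minimum and maximum over the vertex list; for instance:

```python
def mesh_aabb(vertices):
    """AABB of a mesh: componentwise minimum and maximum over all its
    vertices, returned as the two opposite corners P_m and P_M."""
    p_min = tuple(min(v[i] for v in vertices) for i in range(3))
    p_max = tuple(max(v[i] for v in vertices) for i in range(3))
    return p_min, p_max
```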
In the case of the tree in the Moana island scene, the intersection between a ray and its 18 million elements would only be computed for those rays actually aimed at the tree.
However, using AABBs is not always enough to make rendering a mesh efficient.
Scenes are often almost completely occupied by a few complex objects, and in this case AABBs do not provide any advantage (as in the previous image).
However, there are several sophisticated optimizations that transform the problem of looking for ray-triangle hits from O(N) to O(\log_2 N). The most used ones employ KD-trees and Bounding Volume Hierarchies. They are both explained in the book by Pharr, Jakob & Humphreys; we will quickly explain the former.
KD-trees are a specific application of a broader family of algorithms called Binary Space Partitions (BSP).
BSP algorithms are used for search problems in spatial domains; in our case, the problem is to find which triangles in the mesh, if any, a given ray might intersect.
BSP methods are iterative, and at each iteration, they halve the volume of the space to be searched.
Let’s recall the bisection method used to find the zeros of a function, which is explained in the TNDS course (II year of the Bachelor’s degree).
Given a continuous function f: [a, b] \rightarrow \mathbb{R} such that f(a) \cdot f(b) \leq 0, the intermediate value theorem guarantees that \exists x \in [a, b]: f(x) = 0.
The bisection method consists of dividing the interval [a, b] into two parts [a, c] and [c, b], with c = (a + b)/2, and applying the method to the sub-interval where the intermediate value theorem still holds.
It can be shown that to achieve a precision \epsilon in estimating the zero, N = \log_2 ((b - a)/\epsilon) steps are needed: the cost grows only logarithmically with the required precision, so the method is very efficient!
If the zero x_0 is initially known with precision \pm 1, just 20 steps are sufficient to reach a precision of \pm 2^{-20} \approx \pm 10^{-6}.
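A minimal implementation of the bisection method, where the stopping criterion is the width of the bracketing interval:

```python
import math

def bisection(f, a, b, eps=1e-6):
    """Find a zero of a continuous f in [a, b], assuming f(a)*f(b) <= 0.
    Each step halves the bracketing interval, so reaching precision eps
    takes about log2((b - a) / eps) iterations."""
    assert f(a) * f(b) <= 0
    while b - a > eps:
        c = 0.5 * (a + b)
        if f(a) * f(c) <= 0:
            b = c  # the zero is in [a, c]
        else:
            a = c  # the zero is in [c, b]
    return 0.5 * (a + b)
```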
BSP methods enclose all the shapes of a world within a bounding box, then divide it into two regions, partitioning the shapes into one half or the other (or both, if they lie along the division).
This subdivision is repeated recursively up to a certain depth: ideally, until the bounding boxes contain a certain (small) number of objects.
KD-trees are a type of BSP where the bounding boxes are the well-known AABBs.
KD-trees are explained and implemented in section 4.4 of Physically Based Rendering (Pharr, Jakob, Humphreys, 3rd ed.)
Figure 4.14 from Physically Based Rendering (Pharr, Jakob, Humphreys, 3rd ed.)
This is the procedure to build a KD-tree in memory:
This procedure needs to be done only once, before solving the rendering equation.
A KD-tree can be stored in a tree structure built when loading the mesh.
To represent the splits, a KdTreeSplit type can be defined:
The generic node of the tree is represented like this:
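For instance, in Python (a hedged sketch; names and fields are illustrative, and the actual definitions used in the course may differ):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class KdTreeSplit(Enum):
    """Axis perpendicular to the plane that splits a node's AABB."""
    X = 0
    Y = 1
    Z = 2

@dataclass
class KdTreeNode:
    """A node of the KD-tree: a leaf stores the indices of the triangles
    contained in its box; an internal node stores the split axis and
    position, plus its two children."""
    triangle_indices: Optional[List[int]] = None
    split_axis: Optional[KdTreeSplit] = None
    split_pos: float = 0.0
    left: Optional["KdTreeNode"] = None
    right: Optional["KdTreeNode"] = None

    def is_leaf(self) -> bool:
        return self.triangle_indices is not None
```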
To determine if a ray intersects a mesh optimized with a KD-tree, follow this procedure:
For the Moana tree, in the case of a perfectly balanced KD-tree (50%–50%), fewer than 25 comparisons are needed to determine the intersection with a ray.
Figure 4.17 from Physically Based Rendering (Pharr, Jakob, Humphreys, 3rd ed.)
To build a KD-tree, some questions need to be answered:
Answering these questions is not trivial, but finding an efficient solution is important!
To build an efficient KD-tree, the computational cost of the tree needs to be evaluated, which is given by
C(t) = C_\text{trav} + P_L \cdot C(L) + P_R \cdot C(R),
where
These assumptions can be made:
A robust algorithm tries various tree splits, calculating the cost of each, and chooses the split that leads to the lowest cost.
The speed benefits can range from a factor of 10 to a factor of 100 compared to a KD-tree built with simple assumptions.
Last week you fixed your first bug, which concerned the incorrect orientation of the images saved by your code.
In general, a bug is a problem in the program that makes it work differently than expected.
It is very important to have a scientific approach to bug management! In the next slides I will give you some general guidelines.
Zeller’s excellent book Why Programs Fail: A Guide to Systematic Debugging explains the discovery of a bug as the combination of three factors:
The bug lies in the initial fault, but if there is no infection or no failure, it is difficult to notice it!
In your second year, you implemented code that calculates the value of
\int_0^\pi \sin x\,\mathrm{d}x
using Simpson’s rule:
\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}3 \left[f(a) + 4\sum_{i=1}^{n/2} f\bigl(x_{2i-1}\bigr) + 2\sum_{i=1}^{n/2 - 1} f\bigl(x_{2i}\bigr) + f(b)\right].
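For reference, a straightforward implementation of the rule (a sketch; n must be even):

```python
import math

def simpson(f, a, b, n):
    """Simpson's rule with n (even) sub-intervals of width h = (b - a)/n:
    interior points with odd index get weight 4, even index weight 2."""
    assert n % 2 == 0
    h = (b - a) / n
    acc = f(a) + f(b)
    for i in range(1, n):
        acc += (4 if i % 2 == 1 else 2) * f(a + i * h)
    return acc * h / 3
```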
Often the implementation is wrong even though the result is correct (\int = 2)!
\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}3 \left[f(a)+ {\color{red}{4}}\sum_{i=1}^{n/2} f\bigl(x_{2i-1}\bigr) + {\color{red}{2}}\sum_{i=1}^{n/2 - 1} f\bigl(x_{2i}\bigr) + f(b)\right].
Often students swap the 4 with the 2.
This leads to an infection: the value of the expression in square brackets is wrong.
This leads to a failure: the result of the integral is wrong.
Of all the cases, this is the simplest: it is immediately obvious what the problem is!
\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}3 \left[f(a)+ 4\sum_{i=1}^{\color{red}{n/2}} f\bigl(x_{2i-1}\bigr) + 2\sum_{i=1}^{\color{red}{n/2 - 1}} f\bigl(x_{2i}\bigr) + f(b)\right].
Sometimes students terminate one of the two summations too early (they forget the last term) or too late (they add an extra term).
In the case of \int_0^\pi \sin x\,\mathrm{d}x, the last term of the summation is for x \approx \pi, so it is very small: there is an infection, but if you print only a few significant digits on the screen, the result may be rounded to the correct value, and therefore there is no failure.
\int_a^b f(x)\,\mathrm{d}x \approx \frac{h}3 \left[{\color{red}{f(a)}}+ 4\sum_{i=1}^{n/2} f\bigl(x_{2i-1}\bigr) + 2\sum_{i=1}^{n/2 - 1} f\bigl(x_{2i}\bigr) + {\color{red}{f(b)}}\right].
Sometimes students forget to add f(a) and/or f(b), or they multiply them by 2 or 4.
However, in the case of \int_0^\pi \sin x\,\mathrm{d}x, the value of the expression in square brackets is still correct because f(0) = f(\pi) = 0.
In this case there is a fault but there is no infection nor a failure: this is the most difficult case to identify!
It is very common for the same fault to lead to different failures: this depends on the input data, the type of action being performed with the program, etc.
It is therefore very common for users to open different issues that are however caused by the same fault.
Example: a crash in Julia is caused by the combination of two previously reported issues, which at first glance did not seem related.
GitHub allows you to assign the Duplicated label to issues:
When you observe a failure and want to open an issue, you must indicate:
This is because the developer must be able to reproduce the failure in order to identify the fault that caused it.
If a user reports an issue to you without some of these things being clear, do not hesitate to ask for more details.
GitHub allows you to configure an issue template.
| Type | Examples |
|---|---|
| Symbolic debuggers | GDB, LLDB |
| Memory checkers | Memcheck |
| Dynamic analysis | Valgrind |
| Record-and-replay | rr, UDB |
| Fuzzers | AFL++, libFuzzer |