| This is fair - there are a decent number of details omitted here. However, a lot of the details are hidden inside the proof that an IFS has a unique fixed point. At least to me, this "proof" is nice because it converts a non-obvious procedure (taking midpoints) into a "standard" procedure (the IFS construction). A sketch of the argument in the IFS goes like this: consider the set of maps {f1, f2, f3} as acting on compact subsets of R^2 by the action F -> f1(F) U f2(F) U f3(F). Since the maps fi are continuous, the image is a compact set. But one can also prove that this action is a contraction mapping on the space of compact subsets of R^2 equipped with Hausdorff metric [1], and appealing to the Banach fixed point theorem [2], (1) there is a unique compact set K fixed by this action, and (2) the images of F converge "geometrically fast" to the set K In other words, there is some constant 0 < r < 1 such that after k steps of this algorithm (not the random process), the distance from the k^th approximation to the Sierpinsky triangle is at most r^k Actual convergence rates of the corresponding random process are more complicated. One of my colleagues recently wrote a paper on convergence rates of the "chaos game" (essentially what this is doing), and one can get really sharp information on how fast the random process converges [3]. However, this uses some non-trivial covering time theory for Markov processes. It's not too complex to follow, but definitely way more detail than I can write up in this note here. [1] https://en.wikipedia.org/wiki/Hausdorff_distance [2] https://en.wikipedia.org/wiki/Banach_fixed-point_theorem [3] https://arxiv.org/abs/2007.11517 Edit: To see that the points converge to the Sierpinsky triangle (with no quantitative information) is a "bit less challenging" (i.e. within the realm of "standard" material - not necessarily easier!) - you can essentially reduce this problem to an application of the Birkhoff ergodic theorem [4]. Firstly, we can consider this process as a random map pi: {1,2,3}^{\mathbb{N}} -> R^2 which is defined by taking finite subsequences (i1, i2, ..., ik) and mapping them to the images fi1(fi2( ... (fik(x_0))), and taking the limit in k, for some fixed starting point x0. Then the "apply map fi" can be interpreted as looking at pi(sigma(i1,i2,...)) where sigma is the left-shift map sigma(i1, i2, ...) = (i2, i3, ...). But the shift map plays very nicely with the measure on the sequence space (the Bernoulli measure), in the sense that the Birkhoff ergodic theorem can be applied. This gives that for "typical" sequences of random function applications, this process will converge to the Sierpinsky triangle. Here's another way to see it: let's say I apply maps fi1, fi2, ..., fik to my starting point x0. Then if we code the "level k" approximations to the Sierpinsky triangle in the same way [triangle (i1, ..., ik) = fi1(...(fik(initial triangle)))], the image of the point x0 will lie in triangle (i1, ..., ik). But these triangles, for large k, approximate the Sierpinsky triangle. Some more argumentation is required to deal with the case when the starting point does not sit in the Sierpinsky triangle, but this isn't a problem because of the fast convergence result explained above (if you wait a long time, all the points will be "very close"). The above ergodic theory argument is just showing that every possible finite subsequence will show up for "typical" random function applications. [4] https://en.wikipedia.org/wiki/Ergodic_theory |