Hacker News new | ask | show | jobs
by arutar 1632 days ago
Here's a quick proof as to how it works. Suppose the triangle has endpoints (0,0), (1,0), and (1/2, sqrt(3)/2). Given a point (x,y), the three transformations then become

f1(x,y) = (x/2, y/2)

f2(x,y) = (x/2+1/2, y/2)

f3(x,y) = (x/2 + 1/4, y/2 + sqrt(3)/4)

which are precisely the maps in the Iterated Function System [1] which generate the Sierpinsky triangle!

It's a very nice observation that the three maps have a concise representation in terms of taking the midpoint with the given point and the vertices of the triangle.

[1] https://en.wikipedia.org/wiki/Iterated_function_system

3 comments

I'm sorry, but that gives me "draw the rest of the fucking owl" vibes. You're basically restating the algorithm's definition -- it's the rest of the proof that contains the interesting steps.

Edit: In my view, to show that this draws the Sierpinsky triangle, one would need to show that (1) we only draw points that are in the Sierpinski triangle and (2) we draw all the points of the Sierpinski triangle.

(1) is clearly false (we start with a random point), but the claim is of course only that we draw an “approximation”. What that means exactly would need to be defined. I assume a rigorous argument would involve 2D probability densities. Given that, (2) isn’t obvious to me as well, what’s the argument that all the parts of the Sierpinski shape correspond to areas with high probability density?

I really don't see how any proof is necessary.

Taking midpoint between a point and tringle vertex is a transformation that scales everything by 1/2 with this vertex as pivot.

By choosing one of such three transformations randomly you are scaling the whole triangle into three smaller copies of itself that it cosists of.

If you start with a point belonging to Sierpinski trinagle, you are adding more and more points belonging to that triangle.

Fun thing is that you can use the same algorithm with different number and kind of transformations to get other fractals with such random walk. For exaple a fractal tree or fern leaf or Sierpinski carpet.

Another interesting thing is you can start with a point not belonging to a fractal and it pretty quickly coverges. It's because fractals are attractors.

But you don’t start with a point belonging to the triangle, that’s the whole idea!
There are probably many better explanations out there, but I'll try one:

It's important t note that if you pick any point p in the Sierpinsky triangle and you calculate f1(p), f2(p) and f3(p) you get tree points that are also in the Sierpinsky triangle. [f1, f2, f3 as defined in the top comment]

---

You pick any point in the plane, let's call it q, and you draw it in the screen.

You pick any point in the Sierpinsky triangle, let's call it p, p is a secret point.

The important number is the distance between p and q. You could have pick a better point in the Sierpinsky triangle but anyone you have choose is fine.

Now you select one of the functions f1, f2, f3, let's call it f. And you calculate p'=f(p) and q'=f(q) (the same functions for both).

Now you have two new points p' and q'. p' is a again a point in the Sierpinsky triangle as I said at the beginning. q' is a point that you draw in the screen.

Again, the important number is the distance between p' and q'. What happens if you compare the distance between p' and q' with the initial distance between p an q? The new distance is one half of the initial distance, so q' is closer to p'. You could have pick a better initial point p in the Sierpinsky triangle or pick now a better point p*in the Sierpinsky triangle but don't worry, p' is fine.

And now you repeat the process with cut&paste...

You select one of the functions f1, f2, f3, let's call it f. It may be the same functions as before or another one. You calculate p''=f(p') and q''=f(q') (the same functions for both).

Now you have two new points p'' and q''. p'' is a again a point in the Sierpinsky triangle as I said at the beginning. q'' is a point that you draw in the screen.

Again, the important number is the distance between p'' and q''. What happens if you compare the distance between p'' and q'' with the previos distance between p' and q' and the initial distance between p an q? The new distance is one half of the previous distance and one quarter of the initial distance, so q'' is closer to p''. You could have pick a better initial point p in the Sierpinsky triangle or pick now a better point p**in the Sierpinsky triangle but don't worry, p'' is fine.

And now you repeat the process with cut&paste...

...

After enough halvings, the distance between p'''''...'''' and q'''''...'''' is smaller that any level of precision you select initially, like 1E-15. If you need more precision because you are using a high zoom level, just add more steps and more halvings.

Now you have a sequence of points q, q', q'', q''', q'''', ... that are drawn in the screen and p, p', p'', p''', p'''', ... that are only in your imagination.

Except a few of the initial points, all the other points q'''''...'''' are very close to the Sierpinsky triangle. One trick is to not draw the initial 1000 points, so you don't get some points in weird locations. You should prove that 1000 is enough, or use a bigger number or just cross your fingers and hope nobody notices them.

Note that this method actually doesn't draw the Sierpinsky triangle. It just draw a cloud of points that very close to the Sierpinsky triangle. (Unless you are very lucky, and the initial point q is also in the Sierpinsky triangle or something like that.)

Yeah, it's easier if you don't start with a random point in a triangle (it doesn't even have to be in a triangle) but with one of the vertices.

To understand why you can start with any point you'd have to understand why fractals are attractors and that's some serious math I presume, with epsilons and quantifiers.

Tell us some more serious math! You’re doing great so far and that’s where the “huh?” ends up answered.
The point here is that you'll almost certainly end up inside the Sierpinski triangle within a logarithmic number of steps.

Let's do the 1-dimensional version, the Cantor set.

  init_x = (anything)
  while 1:
    draw(x)
    if flip_coin() == HEADS:
       x = (x+1)/2
    else:
       x = x/2
Supposing we start with a large negative number N, and our coin is secretly TAILS on both sides, our sequence will be

  N, N/2, N/4, N/8, N/16...
So we'll rapidly converge on the point 0, but never quite make it into the interval [0, 1] because N is negative. But, if we draw our point with a diameter of say, 1/128, it will rapidly become visually indistinguishable from zero.

If the coin is fair, then something really cool happens -- if we hit a HEADS and x was already in the interval [-1, 0] then our result will be in the interval [0, 1/2] -- which is inside the interval [0, 1] where the problem is easier to think about. Approaching the problem in this way requires a bunch of case analysis, but the gist is that an attractor will quickly suck any point into the fractal -- if you get really unlucky, it will only get very close to the fractal in question.

I'd be happy to tell you if I actually knew it. I just know it exists. Sorry if I gave the wrong impression.

If I were to look for it I think I'd search for proof that Iterated Function System has a fixed point or attractor https://en.m.wikipedia.org/wiki/Iterated_function_system

the algorithm ensures that the point is approximately in the triangle. using the midpoint with one of the vertices, excludes the triangle in the middle...
Here's a sketch of a proof using tools from Markov chain monte carlo. Writing on mobile so forgive any typos.

Suppose you start by drawing a point uniformly at random from the triangle. We'll call this step 0. With probability 1 this point is not in the sierpinski triangle, but if we consider finite approximations of the sierpinsky triangle it is in the level 0 sierpinski triangle (see this random illustration I found on Google [1]). In particular, since the Level 0 sierpinski triangle is just the whole triangle, the initial point is uniformly distributed in the level 0 sierpinski triangle.

This is a base case. Then we can do an induction step: If we have a point which is uniformly distributed in the level k sierpinski triangle, applying the transformation (pick one vertex of the triangle at random and move halfway towards it) results in a point which is uniformly distributed in the level k+1 sierpinski triangle. This is because each corner of the level k+1 sierpinski triangle is a scaled copy of the level k sierpinski triangle.

Putting the base case and induction step together, we deduce that when we repeatedly apply the transformation of picking a vertex and moving halfway towards it, at each step k we have a point which is uniformly distributed in the level k sierpinski triangle.

This is enough to show that if we start with a sample of N points drawn uniformly from the triangle, after k steps we'll have N points drawn uniformly from the level k sierpinski triangle (i.e. an arbitrarily good approximation of the sierpinski triangle, if we make k large enough). It's not quite enough to show that if we start with a single point and record the location after each step we'll end up with something that looks like the sierpinski triangle. For that we need to establish ergodicity of the Markov chain. The details are technical but not difficult to show; basically the Markov chain needs to not get caught in fixed-length loops (aperiodicity) and never get stuck unable to reach part of the triangle (irreducibility). Both these conditions are satisfied, so the Markov chain is ergodic and starting from a single point will approximate a sierpinski triangle.

I've glossed over a bunch of details but hopefully that shows the basic idea.

[1] https://www.researchgate.net/profile/Ali-Bashir-4/publicatio...

The rest of the proof is in the Wikipedia page linked by the GP. It even has a description of the method used by the OP:

> The most common algorithm to compute IFS fractals is called the "chaos game". It consists of picking a random point in the plane, then iteratively applying one of the functions chosen at random from the function system to transform the point to get a next point. An alternative algorithm is to generate each possible sequence of functions up to a given maximum length, and then to plot the results of applying each of these sequences of functions to an initial point or shape.

Are we looking at a different version of the page? I see descriptions and definitions, but no proof.
You're reaching a classic physics/mathematical limit that arrises from asking "why" questions. To understand this concept further you need to understand concepts from chaos theory, specifically attractors, but it may be difficult to explore further if you are not then satisfied.
This is fair - there are a decent number of details omitted here. However, a lot of the details are hidden inside the proof that an IFS has a unique fixed point. At least to me, this "proof" is nice because it converts a non-obvious procedure (taking midpoints) into a "standard" procedure (the IFS construction).

A sketch of the argument in the IFS goes like this: consider the set of maps {f1, f2, f3} as acting on compact subsets of R^2 by the action F -> f1(F) U f2(F) U f3(F). Since the maps fi are continuous, the image is a compact set. But one can also prove that this action is a contraction mapping on the space of compact subsets of R^2 equipped with Hausdorff metric [1], and appealing to the Banach fixed point theorem [2],

(1) there is a unique compact set K fixed by this action, and

(2) the images of F converge "geometrically fast" to the set K

In other words, there is some constant 0 < r < 1 such that after k steps of this algorithm (not the random process), the distance from the k^th approximation to the Sierpinsky triangle is at most r^k

Actual convergence rates of the corresponding random process are more complicated. One of my colleagues recently wrote a paper on convergence rates of the "chaos game" (essentially what this is doing), and one can get really sharp information on how fast the random process converges [3]. However, this uses some non-trivial covering time theory for Markov processes. It's not too complex to follow, but definitely way more detail than I can write up in this note here.

[1] https://en.wikipedia.org/wiki/Hausdorff_distance

[2] https://en.wikipedia.org/wiki/Banach_fixed-point_theorem

[3] https://arxiv.org/abs/2007.11517

Edit: To see that the points converge to the Sierpinsky triangle (with no quantitative information) is a "bit less challenging" (i.e. within the realm of "standard" material - not necessarily easier!) - you can essentially reduce this problem to an application of the Birkhoff ergodic theorem [4]. Firstly, we can consider this process as a random map

pi: {1,2,3}^{\mathbb{N}} -> R^2

which is defined by taking finite subsequences (i1, i2, ..., ik) and mapping them to the images fi1(fi2( ... (fik(x_0))), and taking the limit in k, for some fixed starting point x0.

Then the "apply map fi" can be interpreted as looking at pi(sigma(i1,i2,...)) where sigma is the left-shift map sigma(i1, i2, ...) = (i2, i3, ...). But the shift map plays very nicely with the measure on the sequence space (the Bernoulli measure), in the sense that the Birkhoff ergodic theorem can be applied. This gives that for "typical" sequences of random function applications, this process will converge to the Sierpinsky triangle.

Here's another way to see it: let's say I apply maps fi1, fi2, ..., fik to my starting point x0. Then if we code the "level k" approximations to the Sierpinsky triangle in the same way [triangle (i1, ..., ik) = fi1(...(fik(initial triangle)))], the image of the point x0 will lie in triangle (i1, ..., ik). But these triangles, for large k, approximate the Sierpinsky triangle. Some more argumentation is required to deal with the case when the starting point does not sit in the Sierpinsky triangle, but this isn't a problem because of the fast convergence result explained above (if you wait a long time, all the points will be "very close"). The above ergodic theory argument is just showing that every possible finite subsequence will show up for "typical" random function applications.

[4] https://en.wikipedia.org/wiki/Ergodic_theory

Michael Barnsley has two books that discuss this in greater detail.

https://maths.anu.edu.au/people/academics/michael-barnsley

and

http://superfractals.com/wpfiles/

Like it. The theory of the iterated function systems beautifully utilizes the Banach fixed-point theorem.