

by jarvic 4409 days ago

I haven't seen anything specific about how they are doing it, but I can give you a few guesses as to what common approaches would be.

The most straightforward way to model the shapes themselves is to place points at corresponding locations on every sample of a letter. Every sample of a given letter would need the same number of points, and each point would need to be in roughly the same spatial location on the letter (so, for a letter S, the first point in every sample would be at one end, the last at the other, and the ones in between at some kind of identifying landmarks).
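As a rough illustration of getting the same number of points onto every sample, here is a minimal sketch that resamples a traced outline to a fixed number of points spaced evenly along its length. This is an assumption on my part: even arc-length spacing is only a crude stand-in for real landmarks, and `resample_curve` is a hypothetical helper name, not anything from the project being discussed.

```python
import numpy as np

def resample_curve(points, n_points):
    """Resample a 2-D polyline to n_points spaced evenly by arc length.

    points: sequence of (x, y) pairs tracing the letter from one end to the other.
    """
    points = np.asarray(points, dtype=float)
    # Length of each segment, then cumulative distance along the curve.
    seg_len = np.linalg.norm(np.diff(points, axis=0), axis=1)
    dist = np.concatenate([[0.0], np.cumsum(seg_len)])
    # Target distances: first point at one end, last point at the other.
    targets = np.linspace(0.0, dist[-1], n_points)
    x = np.interp(targets, dist, points[:, 0])
    y = np.interp(targets, dist, points[:, 1])
    return np.column_stack([x, y])
```

Running every sample of a letter through this with the same `n_points` gives arrays of identical shape, which is what the statistics need. It will not guarantee that point 7 sits on the same feature in every sample; for that you would still want hand-placed or detected landmarks.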

Given these collections of sampled curves, the simplest thing to do is to compute the Euclidean mean, treating each sample (all of its point coordinates concatenated) as a single point in a high-dimensional space. You could go further and do PCA, giving you not only a mean but also modes of variation. Using these you could examine the most common ways in which each letter varies across the population, which can be an interesting thing to study.
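A minimal sketch of the mean-plus-PCA idea with NumPy, to make it concrete. It assumes the samples are already aligned (same position, scale and rotation) and already have corresponding points; `shape_model` and its parameters are hypothetical names for illustration, and this is not a claim about what they actually did.

```python
import numpy as np

def shape_model(shapes, n_modes=2):
    """Mean shape and principal modes of variation.

    shapes: array-like of shape (n_samples, n_points, 2), already aligned.
    Returns (mean_shape, modes, variances).
    """
    X = np.asarray(shapes, dtype=float)
    n_samples = X.shape[0]
    # Flatten each sample into one long vector: a point in 2*n_points dimensions.
    flat = X.reshape(n_samples, -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # PCA via SVD: rows of Vt are unit-length directions of variation.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    variances = S ** 2 / (n_samples - 1)
    return mean.reshape(X.shape[1:]), Vt[:n_modes], variances[:n_modes]
```

To see what a mode "means", you can synthesize a shape by moving along it from the mean, for example `mean + 2 * np.sqrt(variances[0]) * modes[0].reshape(mean.shape)`, and compare it with the same expression using `-2`.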

What I've described is building a statistical shape model from a boundary point distribution model (PDM) of an object. This is typically done when you want to fit your model to new instances of the object, not just find a mean; models used that way are known as Active Shape Models. You can check it out on Wikipedia (en.wikipedia.org/wiki/Active_shape_model). Here is one reference (among many possible choices) describing generally how this is done:

http://tinyurl.com/nxjwygn

There are other techniques for representing the shapes or computing statistics that can produce better models, from both theoretical and practical points of view, but this is generally the most common approach and would be my first choice if I were going to do something like this. Of course, they could also be doing something much simpler, like averaging the images directly (assuming the letters are all in roughly the same place).
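For comparison, the image-averaging version is only a few lines. This sketch assumes every sample is a grayscale image of the same size with the letter in roughly the same place; `average_images` is a hypothetical name.

```python
import numpy as np

def average_images(images):
    """Pixel-wise mean of equally sized grayscale images."""
    stack = np.stack([np.asarray(im, dtype=float) for im in images])
    return stack.mean(axis=0)
```

The result tends to be a blurry letter, since any misalignment between samples smears the strokes, which is one reason to prefer the point-based model.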