Hacker News new | ask | show | jobs
by mr_luc 1706 days ago
So if I'm understanding this right -- if I have a stream of numbers coming in forming a squiggly line, and I have a bucket of these wavelet shapes, I can pick up wavelets and stretch and squeeze and resize them and overlay them on my line, and ... use them to characterize that squiggly line? Working as feature recognition and also serving as a way to compress it? So instead of 20k data points, I have a sequence like 'mexican hat, mexican hat', and maybe elements in the sequence are different sizes and overlap?

(a) if my intuition is super wrong I'd love for HN to correct it, heh. (b) long shot, but anyone have links to code? It's HN after all -- maybe some commenters in this thread are aware of cool/idiomatic/simple/etc uses of wavelet code in open source software.

(I know of a bunch of cool uses they've been put to, from JPEGs to the spacex fluid dynamics presentation about using wavelet compression on gpu, but I've personally never used them as a tool for anything, and it'd be fun to learn about them with code!)

4 comments

Wavelets are used for pattern recognition in many iris recognition systems. First, the position of the iris in the input image is determined, then any eyelash and eyelid occlusions are removed and the iris is extracted by converting it to a polar coordinated image. The resulting image signal is convolved with wavelets of different shapes and sizes. The resulting signal is encoded using phase demodulation to produce an IrisCode. Two iris codes can be checked for a match by computing their hamming distance. [1] is the paper which describes the original system invented by Daugman. [2] is an open source implementation of that method.

[1] https://www.robots.ox.ac.uk/~az/lectures/est/iris.pdf [2] http://iris.giannaros.org/

That paper by Daugman was the basis for my undergrad thesis, so when I saw the title, that's what my mind immediately jumped to! Lo and behold, I open the comments and find a link to the paper. Not only that, but it's still stored on my old professor's website (~az = Andrew Zisserman)!
This is quality content. Thank you
Yes.

A mathematician explained it to me as an alternative to a fourier transform: instead of describing the function as a sum of sine waves, its a sum of mexican hats (or whatever basis function). And it turns out that’s a simpler representation in the case of function with sharp discontinuities. It’s also an alternative to a Taylor series, replacing sums of derivatives with the sum of the scaled basis function. Seemed like a pretty elegant explanation to me.

yes, this is how i understand them.

it has been a long time, so i'm probably going to say something really stupid... but it's fun to try and hopefully someone will correct me if i'm wrong,

my understanding is that one place that they're useful is in terms of time-frequency uncertainty that occurs with regular fourier transforms. for example, if you're doing spectral analysis and looking for low frequencies you need a wider analysis window (a full cycle has a long period for low frequencies), but that wide analysis window now covers a lot of time, so if you pick up a low frequency component, you don't know where it is in that big wide window. wavelets instead fit these families of arbitrary basis functions that can be more compact, so you can use a narrower window and therefore have less time uncertainty for the things you find within it.

...most applications sort of go from there, where you want to warp the equivalent of the x-axis in the frequency domain in various ways and perhaps be able to either use a narrower analysis window for better time resolution or say, a smaller number of basis functions for reconstruction in lossy compression where some bands are less important than others and resolution in those bands can be discarded.

I think about it in terms of trading in time-frequency space. The total amount of time-frequency uncertainty is conserved: it's just like the Heisenberg uncertainty principle from physics, and is in fact deeply connected to it.

A classic Fourier spectrogram makes one choice about time uncertainty vs. frequency uncertainty and uses it for all frequencies. You can think of it as dividing time-frequency space into little squares that work okay for most quantities of interest.

Wavelets alter that tradeoff. At low frequencies, we often care about small differences in frequency (e.g., 4 vs 5 Hz), while the precise location of the peaks (e.g., 10 vs 10.1 s) matters less because the signal is changing slowly anyway. The situation is reversed at high frequencies: 1 kHz vs 1.001 kHz is often physically irrelevant, but the timing matters because they're so short). We therefore divide the time-frequency space up so that the windows are long in time, but narrow in frequency at low frequencies, square in the middle, and short in time but wide in frequency at higher ones.

On top of that, wavelets recognize that sine waves are a mathematically convenient basis, but may not reflect your data, so you can also swap in a "mother wavelet" that better matches whatever phenomena you're studying.

One of the critically useful properties of wavelets, as I understand it, is that they are localized in both the original domain and the frequency domain.

A FT on an audio file, for example, will tell you how much of each frequency is present in that file. Each component is localized in frequency. A wavelet, however, is localized in both frequency and time (due to this decaying “hat” shape). When you scan this wavelet across the audio file and record the outputs you therefore learn not only which frequencies are present, but also their locations in the file.

This is the stuff I want to know more about, because I understand my analogies/mental model, but I have only crude/naive 'code mental model' of how these kinds of things are implemented

When you say:

    When you scan this wavelet across the audio file
Is that literal in code?

Apologies for the naive code analogies -- your comment makes me think of something like an XLA 'reduceWindow' operation (edit: or an OpenCL kernel), where a wavelet-sized 'window' is slid over the input, the difference is computed for each data point -- and then (waves hands: probably something smarter than just a) 'cumulative sum' to see how close/different is is.

(a) is that on the right track/what you meant by 'scanning the wavelet across?

(b) what about transforming the wavelet -- to different scales, etc?

This seems like the kind of thing that's actually like a search problem, heh, where you'd need to slide it over the input many many times, and where it could conceivably benefit from (waves hands) machine learning techniques as a result, to come up with something that does the search for a good wavelet encoding more efficiently based on the input.

... which, I guess, means I should just look around github for "wavelet compression" and see what code pops up, since that search for good encodings will be done in any of those.

--- Edit: this one seems simple enough to learn from, it's minimal, MIT licensed, audio-focused, written in Julia, and references the course material that it's based on:

https://github.com/nicholaskl97/wavelets

It, uh ... like lots of scientific code, it has a lot of single-letter variables. :D But if one was to google up a few acronyms ('DWT'), install Julia, get it to run, and then re-implement it in another language, my feeling is that you might understand wavelets pretty good, no? Or be at least be well-equipped to understand more interesting uses of wavelets?

(I'm posting this because I might use some 'shop time' doing approximately that, so I would definitely appreciate any HN comments telling me 'consider learning from $some_other_example instead, for $some_reason.')

I think one of the major advantages of wavelets is the time-frequency representation of the signal, which means being able to see the different spectra of the signal change over time (at different scales) which is not easily attainable with Fourier transform. So you could detect transient interesting patterns (spatial frequencies) in the data that would otherwise go unnoticed. Same you could enhance or remove certain spatial frequencies of the signal depending on the task.