Title has a misleading domain name (gwern.net). Link is to a PhD thesis titled "Scaling Laws for Deep Learning" by Jonathan Rosenfeld. Not sure why it wasn't linked more directly.
Gwern, have you considered hosting your archived docs on a different subdomain (e.g., doc.gwern.net) to make it clearer that they are not something you have authored yourself? Not sure what the best subdomain would be though.
I don't think that would make it any clearer. Why would 'doc.gwern.net' be more obviously just a random document than 'gwern.net/doc/www/'?
Regardless, I am puzzled how OP got this URL in the first place. He wasn't supposed to, he was supposed to get the canonical Arxiv PDF link. Because this is one of the cache mirrors/local archives†, rather than a regular hosted document. We block everything in /doc/www/ in robots.txt & HTTP no-archive/crawl/mirror/etc headers, and we use JS to swap out the local URL for the original URL whenever the reader clicks or mouse-overs or interacts with a link to the URL in a web page (and that is the only place they should be publicly listed or accessible). If OP read it on gwern.net by seeing a link to it, and he wanted to copy the URL elsewhere, he should have just gotten the canonical "https://arxiv.org/pdf/2108.07686#page=85"... But somehow he didn't.
OP, do you remember how exactly you grabbed this URL? Is this an old link from before our URL swapping was implemented, or did you deliberately work around it, or did you find some place we forgot to swap, or what?
(If anyone is wondering why I mirror Arxiv PDFs like this in the first place: it's for the PDF preview feature in the popups. Because Arxiv blocks itself from being loaded in iframes, we need local mirrors for PDF preview to work at all; local mirrors save a new domain lookup and speed up the PDF preview a lot, because we compress the PDF more thoroughly and Arxiv servers are always overloaded; and because readers can potentially pop up many Arxiv PDFs easily, it saves Arxiv a lot of bandwidth and avoids burdening their servers further, so it's just the responsible thing to do.)
Yes, without the swapping JS, you wouldn't get the canonical URL. But browsing Gwern.net these days without JS is pretty painful. And in this particular case, there is only one place on Gwern.net that the link exists where you could see it without JS; in the other 5 or 6 links, you could only get there via JS and thus the swapping should've happened. So it is not a safe assumption that OP simply browsed with NoScript.
Hi Gwern, I'm honestly not sure. I have some firefox extension that skips trackers and other redirects. I have like 100 firefox extensions, actually. I'm not sure how most of them work nor what they do exactly, I just trust that they make my browser more "secure" and I tend to download things at random -- especially if I see ads or want certain features in my client (i.e. a browser that auto-rejects cookies).
Happy to try and help you figure this out but when I revisit this specific hyperlink I'm still getting the gwern url & not arxiv
> Why would 'doc.gwern.net' be more obviously just a random document than 'gwern.net/doc/www/'?
HN only shows the domain next to the title. So now when browsing the front page we only see gwern.net as the source of the doc and initially assume it's some work from you.
I don't think HN shows third-level domains, so the point is moot. There may be exceptions for web services that lend out subdomains like Github[1], but doc.gwern.net would probably still show as gwern.net[2]. If you're willing to see the URL in the browser statusbar or addressbar, then the URL path makes very clear that the actual source is arxiv.org.
I think the basic premise of this paper is wrong. Very few natural signals are bandlimited - if images were, there would be no need to store them in high resolution, you could just upsample. Natural spectra tend to be pink (decaying ~3dB/octave), which can be explained by the fractal nature of our world (zoom in on details and you find more detail).
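A quick sanity check of the ~3dB/octave figure, assuming "pink" means a power spectrum proportional to 1/f (the function name and the `exponent` parameter are just for illustration):

```python
import math

def db_drop_per_octave(exponent=1.0):
    """Power drop in dB when frequency doubles, for power proportional to 1/f**exponent."""
    # Doubling f divides the power by 2**exponent; convert that ratio to decibels.
    return 10.0 * math.log10(2.0 ** exponent)

print(round(db_drop_per_octave(1.0), 2))  # pink (1/f): 3.01
```

With `exponent=0` (white) the drop is 0 dB, so the spectrum never rolls off to a hard band limit in either case.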
Of course that says that our eyes (& more generally our sensory organs) are bandlimited, which is what lossy signal compression algorithms exploit (similar to how MP3 throws away acoustic signals we can't hear, or how even "lossless" is still only recorded at 44 kHz). And indeed any sensor has this problem, and it's a physical limitation: there's only so much resolving power an optical sensor of a certain size can have for an object a certain distance away, which is why we can't see microscopic things. This is a limit from the physics of optics.
It says nothing about the underlying signal in nature. But of course we're building LLMs to interact with humans rather than to learn about signals in the true natural world that we might miss.
That applies to individual samples. The eye gets around this by saccading (rapid movements) to get multiple samples. Also, you interact with your environment rather than passively sampling it, so if you want to look closer at something you can just do that.
Images aren't truly bandlimited because they contain sharp edges; if they were bandlimited you'd be happy to see an image upscaled with a Gaussian kernel, but instead it's obviously super blurry.
When we see an edge in a smaller image we "know" it's actually infinitely sharp. Another way to say this is that a single image of two people is fundamentally two "things", but we treat it as one unified "thing" mathematically. If all images came with segmentation data then we could do something smarter.
"In optics, any optical instrument or system – a microscope, telescope, or camera – has a principal limit to its resolution due to the physics of diffraction." This might be what wbl is referring to.
You've misunderstood something about Nyquist. A sample rate of, say, 44KHz, will capture ALL information below 22KHz and recreate it perfectly.
There are of course implementation details to consider, for example you probably want to have a steep filter so you don't wind up with aliasing artifacts from content above 22KHz. However it's important to understand: Nyquist isn't an approximation. If your signal is below one half the sample rate, it will be recreated with no signal lost.
I don't recall seeing Nyquist described with those requirements before. I think it is evident that in the real world, there are many practical signals which do not exactly meet those requirements, but which still yield nearly-exact reproduction.
I wonder, what are some examples of signals that fail to reproduce after sampling in a way that is "nearly Nyquist"?
If you look at the Wikipedia entry on the Nyquist Sampling Theorem, you should note that the summations to reconstruct the original signal go from negative infinity to positive infinity. In other words, that sum requires an infinite number of samples.
There are many signals of practical interest that can be approximately reconstructed with a finite truncation of the series. Note, however, that any signal that has only a finite length, e.g. has a uniformly zero amplitude after some time t_final, does not have a finite bandwidth, and cannot be exactly reconstructed by any sampling scheme. This is the case whenever you stop sampling a signal, i.e. it is always the case whenever you step outside the mathematical abstraction and start running real code on a real computer. So any signal reconstructed from samples is always approximate, except for some relatively trivial special cases.
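To make the truncation point concrete, here is a minimal sketch of the reconstruction sum cut off at a finite number of samples. The test tone (0.2 cycles per sample), the unit sample period, and the function names are my own assumptions for illustration, not anything from the paper:

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def truncated_reconstruction(signal, t, n_terms):
    """Whittaker-Shannon sum with unit sample period, truncated to n = -n_terms..n_terms."""
    return sum(signal(n) * sinc(t - n) for n in range(-n_terms, n_terms + 1))

def tone(t):
    # Illustrative bandlimited signal: 0.2 cycles per sample, below the 0.5 limit.
    return math.cos(2 * math.pi * 0.2 * t)

t = 0.5  # halfway between two sample instants
for n_terms in (5, 50, 2000):
    error = abs(truncated_reconstruction(tone, t, n_terms) - tone(t))
    print(n_terms, error)
```

For this tone the error is small but nonzero at every finite `n_terms`; it only vanishes in the limit of the infinite sum.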
Hm, yes, a function cannot have bounded support in both the time domain and the frequency domain…
What if you take a function that has bounded support in the time domain, and then turn it into a periodic function? Might the resulting function have bounded support in the frequency domain even though the original function did not?
I suppose doing this would force the Fourier transform to have discrete support? But under what conditions would it have bounded support?…
I guess technically a low-pass filter applied to a signal with finite support in the time domain, would result in a function which has infinite support in the time domain.
I suppose sinc(f t + c) doesn’t have bounded support, and it is unsurprising that a non-trivial linear combination of finitely many terms of this form would also not have finite support.
Still, such a linear combination could decay rather quickly, I imagine. (Idk if asymptotically faster than (1/t), but (1/(f t)) is still pretty fast I think, for large f.)
Soon enough the decay should be enough that the amplitude should be smaller than the smallest that the speaker hardware is capable of producing, I suppose.
I think it is you who have misunderstood the Nyquist-Shannon theorem. Aliasing and noise are real concerns. Tim Wescott explains it very well [0] (Figures 3, 10 and 11). If your signal is below one half the sample rate but the noise isn't, you'll lose information about the signal. If your signal phase is shifted wrt. the sampling, you'll lose information. If your sampling period isn't representative, you'll lose information. These are not implementation details.
Anything close to N/2 is going to have varying magnitude that requires filtering and likely oversampling to remove.
How close to the Nyquist bandwidth you can get depends upon the quality of your filtering.
44.1KHz is a reasonable compromise for a 20KHz passband. 48KHz is arguably better now that bits are cheap-- get a sliver more than 20KHz and be less demanding on your filter. Garbage has to be way up above 28KHz before it starts to fold over into the audible region, too.
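As I read the "varying magnitude" remark, the raw samples of a constant-amplitude tone just under half the sample rate show a slow beat pattern. A small sketch, assuming a unit-amplitude sine at 0.49 of the sample rate (the ratio and the window positions are arbitrary choices for illustration):

```python
import math

def peak_in_window(freq_ratio, start, length):
    """Largest |sample| of sin(2*pi*freq_ratio*n) for n in [start, start + length)."""
    return max(abs(math.sin(2 * math.pi * freq_ratio * n))
               for n in range(start, start + length))

# Same tone, same amplitude; only the position of the window differs.
early = peak_in_window(0.49, 0, 5)   # samples land near the zero crossings
later = peak_in_window(0.49, 23, 5)  # samples land near the peaks
print(early, later)
```

The information is still in the samples, but recovering the true amplitude takes a reconstruction filter that spans many samples, which is where filter quality and oversampling come in.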
> Garbage has to be way up above 28KHz before it starts to fold over into the audible region, too.
You brick-wall everything at 20 kHz (with an analogue filter) before you sample it; that's part of the CD standard, and generally also what all other digital CD-quality audio assumes. This ensures there simply is no 28 kHz garbage to fold. The stuff between 20 and 28 in your reconstructed signal then is a known-silent guard band, where your filter is free to do whatever it wants, which in turn means that you can design it only for maximum flatness (and ideally, zero phase) below 20 kHz and maximum attenuation above 28 kHz (where you will be seeing the start of your signal's mirror image after digital-to-analogue conversion), not worrying about the 20–28 kHz region.
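The 28 kHz figure is just 48 kHz minus 20 kHz. A minimal sketch of the folding arithmetic for an ideal sampler and a real-valued tone (the helper name is made up for illustration):

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency in [0, sample_rate/2] where a real tone at freq_hz lands after sampling."""
    folded = freq_hz % sample_rate_hz
    # Anything above half the sample rate mirrors back down around it.
    return min(folded, sample_rate_hz - folded)

print(alias_frequency(28_000, 48_000))  # 20000: lowest frequency that folds onto the passband edge
print(alias_frequency(24_100, 44_100))  # 20000: same edge, but reached much sooner at 44.1 kHz
print(alias_frequency(19_000, 48_000))  # 19000: in-band content is unchanged
```

The same mirror relation gives the position of the image on the reconstruction side, so the guard band is 8 kHz wide at 48 kHz versus 4.1 kHz at 44.1 kHz.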
No, 44 kHz is because you want to reconstruct the (20 kHz) bandlimited signal and it's (much) easier to realize such a filter if you have a bit of a transition band.
> You've misunderstood something about Nyquist. A sample rate of, say, 44KHz, will capture ALL information below 22KHz and recreate it perfectly.
Let's do a thought experiment. Imagine a digital image where the pixels are the exact minimum size that you can see.
If a line is exactly 1-pixel-wide, it'll display perfectly when it aligns perfectly with the pixels.
But, if the 1-pixel-wide line doesn't align with the pixels, what happens?
You can see this in practice when you have a large screen TV, and watch lower-resolution video. Smooth gradients look fine, but narrow lines have artifacts. E.g., I recently saw a 1024p movie in the theater and saw pixels occasionally.
The same thing happens in sound, but because a lot of us have trouble hearing high frequencies, we don't miss it as much.
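A toy 1-D version of the thought experiment, assuming each pixel simply records how much of its area the line covers (box sampling; real cameras and renderers filter differently, and the function name is hypothetical):

```python
def render_line(start, width, n_pixels):
    """Area-sample a bar covering [start, start + width) onto unit pixels [i, i + 1)."""
    end = start + width
    coverage = []
    for i in range(n_pixels):
        left, right = float(i), float(i) + 1.0
        # Overlap length between the bar and this pixel, clamped at zero.
        coverage.append(max(0.0, min(right, end) - max(left, start)))
    return coverage

print(render_line(2.0, 1.0, 5))  # aligned:        [0.0, 0.0, 1.0, 0.0, 0.0]
print(render_line(2.5, 1.0, 5))  # half pixel off: [0.0, 0.0, 0.5, 0.5, 0.0]
```

The misaligned line comes out twice as wide at half the intensity. Note that a 1-pixel-wide line with hard edges is not bandlimited to begin with, so this case is outside what the sampling theorem promises.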
Wasn't there a paper on band limiting generative CNNs that fixed texture pinning? Basically by blurring the results of the kernel with neighbors, you get rid of all this aliasing?
https://arxiv.org/abs/2108.07686
https://arxiv.org/pdf/2108.07686#page=85