| HN Mirror



	by golol 640 days ago
This is a prompt I gave to o1-mini a while ago:

"My instructions follow now. The scripts which I provided you work perfectly fine. I want you to perform a change though. The image_data.pkl and faiss_index.bin are two databases consisting of rows, one for each image, in the end, right? My problem is that there are many duplicates: images with different names but the same content. I want you to write a script which for each row, i.e. each image, opens the image in python and computes the average expected color and the average variation of color, for each of the colors red, green and blue, and over "random" over all the pixels. Make sure that this procedure is normalized with respect to the resolution. Then once this list of "defining features" is obtained, we can compute the pairwise difference. If two images have less than 1% variation in both expectation and variation, then we consider them to be identical. In this case, delete those rows/images, except for one of course, from the .pkl and the .bin I mentioned in the beginning. Write a log file at the end which lists the filenames of identical images."

It wrote the script, I ran it and it worked. I had it write another script which displays the found duplicate groups so I could see at a glance that the script had indeed worked.

And for you this does not constitute any understanding? Yes, it is assembling pieces of code or algorithmic procedures which it has memorized. But in this way it creates a script tailored to my wishes. The key is that it has to understand my intent.
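For readers who want to see what the prompt is asking for, here is a minimal sketch of the duplicate-detection step. It is not the script o1-mini produced (that script is not shown in this thread). The function names are hypothetical, images are assumed to be already decoded into lists of (r, g, b) tuples, the "1%" threshold is read as an absolute difference of 0.01 on a 0-1 scale, and the pickle/FAISS row deletion and the log file are left out. Taking the mean and standard deviation over pixels is what makes the features independent of resolution.

```python
from statistics import fmean, pstdev


def color_features(pixels):
    """Return (mean_r, std_r, mean_g, std_g, mean_b, std_b) on a 0-1 scale.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Averaging over pixels makes the result independent of image size.
    """
    pixels = list(pixels)
    feats = []
    for channel in range(3):
        values = [p[channel] / 255.0 for p in pixels]
        feats.append(fmean(values))   # average color of this channel
        feats.append(pstdev(values))  # variation of this channel
    return tuple(feats)


def is_duplicate(a, b, tol=0.01):
    """Assumption: 'less than 1%' means every feature differs by at most tol."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


def group_duplicates(images, tol=0.01):
    """Group image names whose features match; return groups with 2+ members.

    `images` maps a filename to its pixel list. Each image is compared with
    the first member of every existing group (a greedy simplification of a
    full pairwise comparison).
    """
    feats = {name: color_features(px) for name, px in images.items()}
    groups = []
    for name, f in feats.items():
        for group in groups:
            if is_duplicate(feats[group[0]], f, tol):
                group.append(name)
                break
        else:
            groups.append([name])
    return [g for g in groups if len(g) > 1]


if __name__ == "__main__":
    small = [(255, 0, 0), (0, 0, 0)]
    images = {
        "a.png": small,
        "a_copy_larger.png": small * 2,  # same content, more pixels
        "green.png": [(0, 255, 0)] * 2,
    }
    print(group_duplicates(images))
```

Note that two different images can share the same channel statistics, so a feature match like this is a heuristic, which is presumably why the second script for eyeballing the groups was useful.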

2 comments

globnomulous 638 days ago

Does "it understands" just mean "it gave me what I wanted?" If so, I think it's clear that that just isn't understanding.

Understanding is something a being has or does. And understanding isn't always correct. I'm capable of understanding. My calculator isn't. When my calculator returns a correct answer, we don't say it understood me -- or that it understands anything. And when we say I'm wrong, we mean something different from what we mean when we say a calculator is wrong.

When I say LLMs can't understand, I'm saying they're no different, in this respect, from a calculator, WinZip when it unzips an archive, or a binary search algorithm when you invoke a binary-search function. The LLM, the device, the program, and the function boil down (or can) to the same primitives and the same instruction set. So if LLMs have understanding, then necessarily so do a calculator, WinZip, and a binary-search algorithm. But they don't. Or rather we have no reason to suppose they do.

If "it understands" is just shorthand for "the statistical model and program were designed and tuned in such a way that my input produced the desired output," then "understand" is, again, just unarguably the wrong word, even as shorthand. And this kind of shorthand is dangerous, because over and over I see that it stops being shorthand and becomes literal.

LLMs are basically autocorrect on steroids. We have no reason to think they understand you or your intent any more than your cell phone keyboard does when it guesses the next character or word.


globnomulous 638 days ago

When I look at an image of a dog on my computer screen, I don't think that there's an actual dog anywhere in my computer. Saying that these models "understand" because we like their output is, to me, no different from saying that there is, in fact, a real, actual dog.

"It looks like understanding" just isn't sufficient for us to conclude "it understands."


marco_craveiro 640 days ago

I think the problem is our traditional notions of "understanding" and "intelligence" fail us. I don't think we understand what we mean by "understanding". Whatever the LLM is doing inside, it's far removed from what a human would do. But on the face of it, from an external perspective, it has many of the same useful properties as if done by a human. And the LLM's outputs seem to be converging closer and closer to what a human would do, even though there is still a large gap. I suggest the focus here shouldn't be so much on what the LLM can't do but the speed at which it is becoming better at doing things.


golol 640 days ago

I think there is only one thing we should focus on: measurable capability on tasks. Understanding, memorization, reasoning etc. are all just shorthands we use to quickly convey an idea of a capability on a kind of task. Beyond measuring capability, one can also attempt to describe mechanistically how the model works, but that is very difficult. This is where you would try to describe your sense of "understanding" rigorously. To keep it simple, for example: I think when you say that the LLM does not understand, what you must really mean is that you reckon its performance will quickly decay as the task gets more difficult in various dimensions (depth/complexity, verifiability of the result, length/duration/context size), to a degree where it is still far from being able to act as a labor-delivering agent.
