by eig 762 days ago
A few months ago there were articles going around about how Samsung galaxy phones were upscaling images of the Moon using AI [0]. Essentially, the model was artificially adding landmarks and details based on its training set when the real image quality was too poor to make out details.

Needless to say, AI upscaling as described in this article would be a nightmare for radiologists. 90% of radiology is confirming the absence of disease when image quality is high, and asking for complementary studies when image quality is low. With AI enhanced images that look "normal", how can the radiologist ever say "I can confirm there is no brain bleed" when the computer might be incorrectly adding "normal" details when compensating for poor image quality?

[0] - https://news.ycombinator.com/item?id=35136167

6 comments

The Samsung phone wasn’t a technological advancement, it was sheer fraud.

A camera is supposed to take pictures of what it sees.

Imagine going to a restaurant, ordering French onion soup, and getting a bowl of brown food coloring in water.

> “A camera is supposed to take pictures of what it sees.”

Feels like that’s just a matter of expectations.

A phone used to be a device for voice communications. It’s right there in the Greek etymology, “phonē” for sound. But 95% of what people do today on devices called phones is something other than voice.

Similarly, if people start using cameras more to produce images of things they want rather than what exists in front of the lens, then that’s eventually what a camera will mean. Snapchat thinks of themselves as a camera company, but the images captured within their apps are increasingly synthesized.

(The etymology of “camera” already points to a journey of transformation. A photographic camera isn’t a literal room, as the camera obscura once was.)

Taking this thought to its logical conclusion: https://bjoernkarmann.dk/project/paragraphica
Some of us want a record of what was, not a hallucination of what might have or could have been.

Courts, for example. Forensic science was revolutionized by widespread adoption of photography leading to a reduction of the importance given to witnesses. Who also hallucinate what might have happened.

So when I took an 8 second exposure of the aurora on Friday and then used Capture One to process the raw to make it more vivid than it was in real life - is that a record of what was?

Don’t get me wrong, I’m not super keen on AI type stuff in cameras as a whole. The line is muddy though. A smartphone camera straight up can’t capture the moon well, or at all. If it then looks more like it did in real life after processing is that better or worse than my above example?

How often do you capture auroras or other beauty shots, vs. readouts on your electricity meter, stickers on the back of your furnace, receipts, and a hundred other displays and documents you need to send someone? I definitely do plenty of the latter, and in such cases, I'd really appreciate the AI to not spice things up with details it thinks should be there.

I'm by no means against the feature. Hell, I shoot 90% of my family photos in Portrait mode on my Galaxy phone, which does some creative blurring and probably some other magic[0]. I just really appreciate being able to turn the magic on or off myself. That, and knowing exactly what the magic is[1].

--

[0] - I don't know what exactly it does, but switching from normal to Portrait mode is all it takes for my photos to suddenly look amazing, as judged by my wife, vs. plain sucking.

[1] - See e.g. "scene optimizer" in Galaxy phones. It's a toggle on normal photo mode. I have no first clue what it does, I can't see any obvious immediate difference between shots taken with vs. without that feature.

When I'm taking a picture of a receipt or sticker behind a machine, I don't actually want a literal photograph of the entire scene but just a reproduction of the text content.

Any environmental lighting, color and texture of the desk, and all other visual detail are only a distraction.

So if the camera would recognize this intent and just give me the receipt looking like it came from a scanner, that would in fact be a great improvement. So I think your example is in fact a point in favor of having AI meddle with most photos that people shoot.

My job the last 10 years was a photographer, and I still take a lot of photos of my kids, dog, wife, etc.
I recently took a picture of a lizard on granite with large grains. When I zoomed in to identify the type of lizard, I saw that all the grains and some leaves on a tree had been simplified with some type of swirl. I find it unlikely those swirls were artifacts of the sensor itself. My assumption is that the effect is related to compression, given how often it repeated, but I'm not sure.
> Some of us want a record of what was, not a hallucination of what might have or could have been

Yes, but that doesn’t imply “A camera is supposed to take pictures of what it sees”, only “cameras sometimes are supposed to take pictures of what they see”.

Some of us prefer a nice picture over a more exact record of what was; some of us will even argue that such manipulated pictures are better at capturing what was precisely because they sacrifice some of the physical reality for the non-physical essence of a moment: one’s memory of such a moment.

That moon photo is a nice example. Smartphone cameras aren’t very good at capturing what the full moon looks like in our memory.

Pretty much all modern digital cameras are using heuristics and algorithms to construct the image you see - it's not just a sensor grid and a bitmap file and it hasn't been for a long time.
There are differences of kind here.

The important property is how the pixels are correlated with the physical reality being imaged, because the goal is to reason and learn about the depicted subject through information in the photo. Heuristics and algorithms for demosaicing, white balance, auto-brightness/dynamic range, lens correction, removing motion blur, etc. improve the correlation or improve our ability to see it. This is fine, though you need to be aware at times which properties of the image are to be treated as relative vs. absolute.

This is also a far cry from having your camera think, "I'm a consumer camera! Normies often shoot pictures of the Moon, so this fuzzy circle must be it; let me paste a high-resolution photo of the Moon there", or "gee, normies often shoot sportsball, so this green thing must be astroturf, and this grey blob is probably the ball", etc.

Big difference between a fancy interpolation algorithm that compiles to 500 bytes and another that takes many orders of magnitude more space because it also contains data used to add details from what it thinks other similar photographs have.
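To make the "small interpolation algorithm" end of that contrast concrete, here is a minimal sketch of plain bilinear upscaling. The name `bilinear_upscale` and the list-of-rows image format are assumptions for illustration, not taken from any real camera pipeline. Every output value is a weighted average of at most four captured pixels, so the code carries no stored imagery it could paste in.

```python
def bilinear_upscale(img, factor):
    """Upscale a 2D grayscale image (list of rows) by an integer factor.

    Each output value is a weighted average of at most four input pixels,
    so it always lies within the range of the captured values.
    """
    h, w = len(img), len(img[0])
    out = []
    for oy in range(h * factor):
        # Map the output row back to a fractional source coordinate,
        # clamped so we never read past the last captured row.
        sy = min(oy / factor, h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(w * factor):
            sx = min(ox / factor, w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Blend horizontally on the two source rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    small = [[0, 10], [20, 30]]
    for line in bilinear_upscale(small, 2):
        print(line)
```

A learned upscaler differs in kind because its weights encode statistics of other photographs, which is where the extra orders of magnitude of storage go.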
Fine, I expect an MRI to take pictures of what's actually going on in my body rather than inventing MRI-like images that can fool the radiologist into thinking I'm healthy when I'm not. Not sure why this is controversial.
small correction “phonē” means voice not sound :)
In ancient and koinē Greek it meant both voice and sound (including the sound of instruments, which carries in modern instrument names like "saxophone" and so on).

They would say "ὀργάνων φωναί", "φωνὴ βροντῆς", "φωνὴ ὑδάτων" and so on for example.

> Imagine going to a restaurant, ordering French onion soup, and getting a bowl of brown food coloring in water.

Welcome to England!

I still remember the zoom and enhance joke they played on red dwarf. Parody has become reality.
Immortalized in Super Troopers (2001).

https://youtu.be/KiqkclCJsZs

Also "Space Force" has an "Enhance!" scene:

https://www.youtube.com/watch?v=FVOydVwOO4M

Now that you mention it. I recently picked up a bottle of Red Vinegar with large pictures of red grapes on it. Naturally I assumed this was grape vinegar. How shocking it was to discover that this Chinese company was selling acetic acid mixed with food colors.
Where do you draw the line? RAW, HDR, photo stitching, blur removal?
This is an excellent point, and I don't know exactly where to draw the line ("I know it when I see it"). I personally use "auto" (probably heuristic, maybe soon-ish AI-powered) features to adjust levels, color balance etc. Using AI to add things that are _not at all present_ in the original crosses the line from photography into digital art for me.
I draw the line where the original pixel values are still part of the input. As long as you’re manipulating something that the camera captured, it’s still photography, even if the math isn’t the same for all pixels, or is AI powered.

But IMO it’s a point worth bringing up, most people have no idea how digital photography works and how difficult it is to measure, quantify and interpret the analog signal that comes from a camera sensor to even resemble an image.

There was the small complication of the fact that the moon texture that Samsung got caught putting onto moon-shaped objects in photos is, of course, the same side of the same moon.
> the moon texture that Samsung got caught putting onto moon-shaped objects in photos is, of course, the same side of the same moon.

Probably not exactly the same side and orientation. https://en.wikipedia.org/wiki/Libration#Lunar_libration: “over time, slightly more than half (about 59% in total) of the Moon's surface is seen from Earth due to libration”

Sort of, kind of, but not shot at the same time, and not at the same location.

I would object slightly less if they made a model (3D or AI) that captures the whole side of the Moon in high detail, and used that, combined with precise location and date/time, to guide resolving the blob in camera input into a high-resolution rendering *that matches, with high accuracy and precision, what the camera would actually see if it had better optics and sensor*. It still feels like faking things, but at least the goal would be to match reality as close as possible.

I draw the line at adding something that was not there
None of those are adding data, assuming normal definitions of 'blur removal' and not the AI kind. So with those the line is very easy to draw.
I wouldn't go so far as to call it fraud, unless you call literally every phone-with-camera manufacturer these days a fraud. Then I agree, as my trusty old Nikon full-frame always catches only what's there, including noise and instability that modern phones handle easily.

People were commenting in that thread about how an Apple phone, for example, mirrored only the bunny within a bigger picture of a bunny in the grass (a rather hilarious 'bug'), and we all know how Apple consistently removes all moles and wrinkles and completely changes skin tone and overall tonality, so that every single picture looks like it was taken in the golden sunset hour. In other words, that nasty Samsung is much more truthful when it comes to this, including the latest flagships.

That's outright lying too, and IMHO much worse. The Moon is tidally locked, so it has been showing the same side with the same features for millions of years; Samsung was adding details that are there, just impossible to see with a non-stabilized tiny plastic lens and sensor combo at night.

Making somebody 20 years younger and much prettier, changing the overall look of the most important feature we humans have, and doing it by default without any real option to turn it off, does a lot of long-term body-perception damage in young folks.

>Imagine going to a restaurant, ordering French onion soup, and getting a bowl of brown food coloring in water

Isn't that like 80% of the mass food industry and 99% of the fast food industry.

It's kinda like the classic Ebay scam where you buy a picture of the item instead of the item.
Yes, or the increasingly common Amazon one, where you get an AI-generated summary of the book, instead of the actual book.
Wait this is a thing?
> A camera is supposed to take pictures of what it sees.

You wouldn't like the picture of what it sees. The lens is just not big enough. Even the pro raw and other features that phones introduced apply processing.

An MRI machine is a fancy 3D camera. Is this "3D Deep-DSP Model" so different from the processing Samsung did on their phones?
Samsung would replace a white circle with an image of the moon. Even calling it AI was a stretch.
Yeah but that's too hard, and you can just use "AI" to make cool photos instead. Who wants an actual camera when you can have something "like" a camera at half the price?
> A camera is supposed to take pictures of what it sees.

If people wanted cameras to actually take what it sees, then we wouldn't have autofocus, photoshop or instagram filters.

The goal of a cell phone camera is to capture what you are experiencing, not to literally record what light strikes the cmos chip.

> If people wanted cameras to actually take what it sees, then we wouldn't have autofocus,

Bad example. Autofocus makes changes to the light that goes into the camera, not just the data that comes out.

> photoshop or instagram filters

Bad examples. Those both give the user a before-and-after comparison so the user can decide what kind of alterations are reasonable or desirable.

> Bad example. Autofocus makes changes to the light that goes into the camera, not just the data that comes out.

Arguably so do the AI things (i.e. they take multiple shots at different exposures and composite them).

The point is, we ceded manual control of photography to automated systems a long time ago. Most people are happy with that choice, as phone cameras serve a cultural function and are not scientific instruments.

When a cell phone camera auto-focuses, it's still literally recording what light strikes the cmos chip.

Do you think it's recording some other light?

A) I think that is not true if focus stacking is in use.

B) The light has been modified based on an algorithm, causing the camera to capture different things. It's not just the light as it would be by itself.

In terms of the output of the camera the effect of this is much more significant than the AI stuff being complained about.

A camera takes a picture of what it sees. What comes next is a different thing altogether.
> A camera takes a picture of what it sees.

All images taken with digital cameras have been filtered by a pipeline of advanced algorithms. Nobody ever looks at "what the camera sees". What kind of savage would look at an image before demosaicing the Bayer pattern? (Except for the people who work in demosaicing, of course.)
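For anyone who hasn't seen it, a crude sketch of what demosaicing involves, assuming an RGGB Bayer layout. The function names are made up for illustration, and real pipelines use far more sophisticated, edge-aware methods. Each sensor site measures only one colour; the other two are estimated by averaging the nearest sites that did measure them.

```python
def bayer_channel(y, x):
    """Colour measured by the sensor site at (y, x) in an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def demosaic(mosaic):
    """Turn a single-channel RGGB mosaic (list of rows) into RGB tuples."""
    h, w = len(mosaic), len(mosaic[0])
    if h < 2 or w < 2:
        raise ValueError("need at least a 2x2 mosaic")
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            own = bayer_channel(y, x)
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            # Collect every sample in the 3x3 neighbourhood, clipped at the edges.
            for ny in range(max(0, y - 1), min(h, y + 2)):
                for nx in range(max(0, x - 1), min(w, x + 2)):
                    c = bayer_channel(ny, nx)
                    sums[c] += mosaic[ny][nx]
                    counts[c] += 1
            # Keep the measured channel as-is; estimate the other two by averaging.
            row.append(tuple(
                mosaic[y][x] if c == own else sums[c] / counts[c]
                for c in "RGB"
            ))
        out.append(row)
    return out


if __name__ == "__main__":
    # A uniformly red scene: only the R sites record any light.
    red_scene = [[200 if bayer_channel(y, x) == "R" else 0 for x in range(4)]
                 for y in range(4)]
    print(demosaic(red_scene)[1][1])
```

Two thirds of the colour values in the output are estimates, but every estimate is computed from samples captured in that same exposure.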

What a goofy point to be raised as many times as it has been in this thread. All of that stuff serves the purpose of more faithfully emulating actual human vision.
Actual human vision works a lot more like the AI stuff than you think. Human vision is famous for filling in details that aren't there based on what you expect to see.
This is one aspect about machine learning models I keep discussing with non-technical passengers of the AI-hype-train: They are (in their current form) unsuitable for applications where correctness is absolutely critical.
I don’t know enough to make absolute statements here, but deep learning models can beat out human experts at discerning between signal and noise. Using that to guess at data and then hand it off to humans gives you the worst of both worlds. Two error probabilities multiplied together. But to simply render a verdict on whether a condition exists I’d trust a proven algorithm.
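A rough way to see how two stages compound, assuming the model's mistakes and the human's mistakes are independent (a simplifying assumption; real errors are probably correlated): the chances of each stage being right multiply, so the combined error ends up larger than either one alone. The function name and the 10% figures are hypothetical.

```python
def combined_error(e_model, e_human):
    """Probability that at least one of two independent stages gets it wrong."""
    # The pipeline is right only if both stages are right.
    p_both_right = (1 - e_model) * (1 - e_human)
    return 1 - p_both_right


if __name__ == "__main__":
    # Hypothetical numbers: each stage is wrong 10% of the time.
    print(round(combined_error(0.10, 0.10), 4))
```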
Yes, pattern recognition is one of the applications ML shines at. Now the question was about using ML to extrapolate between sparse pixels and how much humans can rely on the added detail.

The goal would be to find a way to make ML extrapolate only pixels that describe features that are really present, and never imagine detail that wasn't there in the first place. Now, I am no expert in the matter, but from what I know of deep learning models, they are really good at the latter, as they basically make statistical guesses about what would be plausible.

Getting a plausible guess on what looks like a convincing answer works really well for answering a question. But the problem at hand is more like predicting the words someone said based on the first and last word in a sentence. Imagine a criminal case where the evidence is fragmented like that: I am pretty sure an LLM could give a convincing prediction here, but I am not sure how much you could rely on that prediction being reflective of what was actually said. I certainly wouldn't feel comfortable with a conviction that was the result of that prediction, even if it was reflective of the ground truth 90% of the time.

There are a lot of models that are simply good at that without hallucinating nonsense. LLMs are a specific thing with their own tradeoffs and goals. If you have an ML model that says how much this microscope photo looks like an anomaly in this person's blood on a scale from 0-100, it can certainly do better than a human.
As long as AI makes things better on average, it's useful. It doesn't have to be 100% correct.
So if an AI fantasized your face into the extrapolated pixels of the evidence for a documented murder case you would be happy with the conviction, because on average it might be somewhat correct?

I don't wanna hurt anybody's feelings by stating that AI isn't a magical wand that makes everything better, but every technology has use cases at which it excels (e.g. pattern recognition) and use cases for which it is fundamentally unsuitable. If you try to screw on a nut using a hammer, that doesn't mean hammers suck, it means the user has a wrong idea of what a hammer is capable of.

The point is: Don't be that person if you can avoid it.

There are applications - such as finding out whether you have a tumor or not - when "improving on average while ignoring outliers" is not acceptable.
This is not true, but it is a major challenge. See https://www.pathai.com/
That is why I said "in its current form".
The state of the art MRI stuff uses "compressed sensing" -- essentially image completion in some domain or another. Presumably, carefully designed to not hallucinate details or one would hope.

There isn't necessarily a particularly neutral choice here: the MRI scan isn't in the pixel domain, artifacts are going to be 'weird' looking-- e.g. edges that move during the scan ringing across the whole image.

Compressed sensing is far more mathematically rigorous.
I don't think we know what's in the black box here. It could be an equivalent relatively unopinionated regularizer ("the pixel domain will be locally smooth, to the extent it has edges they're spatially contiguous") or it could be "just look up the most similar image from a library and present that instead" or anywhere in between. :)
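To illustrate the "unopinionated regularizer" end of that spectrum, here is a toy 1D sketch. It is not compressed sensing proper and certainly not whatever the vendor does. It fills in unmeasured samples by making neighbouring values differ as little as possible, using repeated neighbour averaging. The name `fill_smooth` and the fixed sweep count are assumptions for illustration; long gaps would need more sweeps.

```python
def fill_smooth(samples, sweeps=500):
    """Fill None entries so that neighbouring values differ as little as possible.

    Measured samples are never changed; only the gaps are estimated.
    """
    known = [v for v in samples if v is not None]
    if not known:
        raise ValueError("need at least one measured sample")
    start = sum(known) / len(known)
    values = [start if v is None else float(v) for v in samples]
    missing = [i for i, v in enumerate(samples) if v is None]
    for _ in range(sweeps):
        for i in missing:
            # Replace each unmeasured value by the mean of its neighbours.
            neighbours = []
            if i > 0:
                neighbours.append(values[i - 1])
            if i < len(values) - 1:
                neighbours.append(values[i + 1])
            if neighbours:
                values[i] = sum(neighbours) / len(neighbours)
    return values


if __name__ == "__main__":
    print(fill_smooth([0, None, None, 3]))
```

The gaps come out as straight lines between the measurements. That is a plausible fill, but anything that lay entirely inside a gap is simply absent from the output, and nothing in the output marks which values were measured and which were inferred.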
They specifically said they use deep learning which implies a sizeable neural network.
But they're using it to eliminate stray or environmental EMI from the RF signals. That might not create fake stuff at the voxel level. Depends on the specifics.
They've already made this mistake. There was one model that detected skin cancer because there was always a ruler in the images.
So we’re not that much further than the neural networks urban legend from the early 90s.

https://gwern.net/tank

the future is already here, GE has put this into production. if you remove the onerous constraint of being correct you can make some really crispy images! 9/10 radiologists. that was literally what the FDA approval process was, surveyed a bunch of radiologists to see what they preferred. no adults in the room.

heads should roll etc

Do you have a link or anything? I'm highly interested but unable to find more on this
The interesting, indeed concerning, thing is that this problem applies not only to medical machines and mobile phones but to zillions of everyday wearable devices such as smart watches, brain EEG headbands (e.g. Muse), and others, without warning users that what they see (e.g. HRV) can't easily be interpreted by a computer program.

Not saying that we humans are always better, but saying that we are believing in numbers and conclusions from apps created as-is.