| HN Mirror



	by jpsouth 872 days ago
	This is so cool. I’d ask how it works, but I feel like I wouldn’t understand it at a fundamental level, even if I read through your codebase. Interpreting an image in the concept of a machine baffles me, it doesn’t have eyes. It surely can’t sense light like humans can. It can’t possibly understand depth (the sofa is in the far left background?!). It can’t know what a goatee is, based on some pixels that are mildly different colours than the skin or background. These are all assumptions I’ve made coming into this, and I am relatively sure I’m wrong at this stage. If you’d like to post a brief explanation, though, I’m sure a lot of HN denizens would appreciate it. I’ll just post this, stand on the sidelines, watch the commentary, and try it myself with a small group.

3 comments

blindgeek 872 days ago

To be completely honest, I don't really know what I'm doing. The IRC bot I wrote isn't complicated at all; it basically just acts as a bridge between IRC and a program that has an HTTP API. FWIW I've never written an IRC bot before, so this is "baby's first bot". I also wrote it in Go, even though I'm not a Go programmer. Probably all of that shines through in the code.

The real magic happens in [ollama](https://ollama.ai/), which lets you run LMMs locally.
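For anyone curious what the "bridge" half looks like, here is a rough sketch in Go of sending an image to a local ollama server and reading back the description. It is not the bot's actual code. The model name `llava`, the default address `http://localhost:11434`, and the helper names (`encodeRequest`, `decodeResponse`, `describeImage`) are assumptions made for illustration; the request shape follows ollama's `/api/generate` endpoint as I understand it, so check the ollama API docs before relying on it.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// generateRequest is the JSON body sent to /api/generate.
type generateRequest struct {
	Model  string   `json:"model"`
	Prompt string   `json:"prompt"`
	Images []string `json:"images,omitempty"` // base64-encoded image bytes
	Stream bool     `json:"stream"`           // false: one JSON reply instead of a stream
}

// generateResponse keeps only the field this sketch needs from the reply.
type generateResponse struct {
	Response string `json:"response"`
}

// encodeRequest builds the JSON payload for a prompt plus an optional image.
func encodeRequest(model, prompt string, image []byte) ([]byte, error) {
	req := generateRequest{Model: model, Prompt: prompt, Stream: false}
	if len(image) > 0 {
		req.Images = []string{base64.StdEncoding.EncodeToString(image)}
	}
	return json.Marshal(req)
}

// decodeResponse pulls the generated text out of a non-streaming reply.
func decodeResponse(body []byte) (string, error) {
	var r generateResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	return r.Response, nil
}

// describeImage posts the image to the server at baseURL and returns the model's text.
func describeImage(baseURL, model, prompt string, image []byte) (string, error) {
	payload, err := encodeRequest(model, prompt, image)
	if err != nil {
		return "", err
	}
	resp, err := http.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("ollama returned %s: %s", resp.Status, body)
	}
	return decodeResponse(body)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: describe <image-file>")
		return
	}
	image, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Assumed defaults: a local ollama server and a multimodal model named "llava".
	text, err := describeImage("http://localhost:11434", "llava", "Describe this image.", image)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(text)
}
```

An IRC bot would call something like `describeImage` when a user posts an image and relay the returned text to the channel; the IRC side is left out here.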


justsomehnguy 871 days ago

> Interpreting an image in the concept of a machine baffles me, it doesn’t have eyes

Your mistake here is thinking that the machine understands anything. It doesn't. But if you know how human learning works, and what compression and lossy compression are, then it is quite easy to understand.

The machine is fed tons of images, each paired with words describing what is in the image. It then finds what is similar across images of similar objects, i.e. it works much like a compression algorithm, except that it doesn't store exact matches but relationships between markers it finds in the images. That's why it doesn't understand, and doesn't need to understand, where the sofa is or what a sofa is: it just has a relationship between something tied to the word 'sofa' and something tied to what we humans describe as 'position'.
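A toy sketch of that idea in Go, under heavy assumptions: each image is pretended to be already reduced to a few numeric "markers", the program keeps only the average marker vector per word rather than the images themselves, and a new image gets the word whose average it most resembles. The function names, the labels, and the numbers are all made up for illustration. Real vision models learn their features and are vastly larger, so treat this as a cartoon of the "relationships, not exact matches" point and nothing more.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// train averages the marker vectors seen for each word. Only the averages are
// kept, which is the lossy "compression" step. All vectors are assumed to have
// the same length.
func train(examples map[string][][]float64) map[string][]float64 {
	centroids := make(map[string][]float64)
	for label, vecs := range examples {
		if len(vecs) == 0 {
			continue
		}
		mean := make([]float64, len(vecs[0]))
		for _, v := range vecs {
			for i, x := range v {
				mean[i] += x
			}
		}
		for i := range mean {
			mean[i] /= float64(len(vecs))
		}
		centroids[label] = mean
	}
	return centroids
}

// classify returns the word whose average markers are most similar to features.
func classify(centroids map[string][]float64, features []float64) string {
	best, bestScore := "", math.Inf(-1)
	for label, c := range centroids {
		if s := cosine(c, features); s > bestScore {
			best, bestScore = label, s
		}
	}
	return best
}

func main() {
	// Made-up marker vectors standing in for labelled training images.
	examples := map[string][][]float64{
		"sofa":   {{0.9, 0.1, 0.0}, {0.8, 0.2, 0.1}},
		"goatee": {{0.1, 0.9, 0.2}, {0.0, 0.8, 0.3}},
	}
	model := train(examples)
	fmt.Println(classify(model, []float64{0.85, 0.15, 0.05})) // prints "sofa"
}
```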


carom 871 days ago

Have you tried ChatGPT yet? It can describe images quite well.


rolltrunhert 871 days ago

It doesn't quite fit the bill of running on their own hardware.
