


	by TheEaterOfSouls 942 days ago
	I'm blind, and I use a 13B LLaVA model locally. I haven't checked with a sighted person how accurate the image descriptions are, but it generally seems to do okay: it described some recent vacation photos pretty well, although it sometimes listed objects that I'm pretty sure aren't actually in the images. I haven't tried GPT via Be My Eyes because I prefer using my laptop over my phone, but I imagine it'd be a lot better. For now I regularly use the local model through a shell alias when I want something described; the other day it even solved a captcha that I couldn't OCR. So yeah, this is one application of ML I'm really excited about. The other is Whisper (speech transcription): I have profound hearing loss, so I can use it to transcribe things I can't hear in the audio.