HN Mirror


	by burcs 959 days ago
	This is amazing. I feel like these vision models are going to make everything so much more accessible. Between the Be My Eyes app integration and now this, I'm really excited about how this will transform the web.

ctoth 959 days ago

I agree, and I think we're a year or two away from a full end-to-end trained screen reader. The ground truth from existing systems would provide great training material.

As a technical blind person, my only concern is the inherent loss of privacy while sharing stuff with the big models.


supriyo-biswas 959 days ago

There are open-source models, such as https://github.com/THUDM/CogVLM and https://github.com/haotian-liu/LLaVA.
