by petercooper
872 days ago
It's the multimodal input capability that seems to be of value here; see the transcript at https://2mb.codes/~cmb/ollama-bot/#chat-transcript. Namely, being able to interrogate images verbally, so that someone without sight (or perhaps even someone who just doesn't want to see an image) can get an appreciation for their contents.

The next thing we want to do is obtain some glasses with cameras and Wi-Fi and send images from them to ollama for real-time description. The benefits are obvious, especially for mobility purposes.
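For anyone curious what "send images to ollama" could look like in practice, here is a rough sketch, not the bot's actual code. It assumes a local Ollama server on its default port, a multimodal model such as `llava` already pulled (the comment doesn't say which model is used), and that the glasses can hand over a JPEG or PNG as bytes. The helper names `build_payload` and `describe_image` are made up for illustration.

```python
import base64
import json
import sys
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: a multimodal model such as llava has been pulled.
MODEL = "llava"


def build_payload(image_bytes, prompt="Describe this image for someone who cannot see it.", model=MODEL):
    """Build the JSON body for Ollama's generate endpoint (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON reply instead of a token stream.
        "stream": False,
    }


def describe_image(image_bytes, url=OLLAMA_URL, timeout=60):
    """POST one image to Ollama and return the model's text description."""
    body = json.dumps(build_payload(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    # Usage: python describe.py photo.jpg
    with open(sys.argv[1], "rb") as image_file:
        print(describe_image(image_file.read()))
```

For the glasses idea, the camera side would presumably call something like `describe_image` on each captured frame and pass the returned text to a speech synthesizer. Whether that is fast enough to count as real-time would depend on the model and hardware.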